Data – big or small – has tremendous potential for use (and misuse).  For example, using mobile apps to track one’s own physical activity or caloric intake may empower individuals to improve their health.  Should other parties (e.g., the app’s developer, a physician, an employer, an insurance company, online friends) be able to access the same information, and if so, under what conditions? As another example, expressing one’s feelings and preferences on a social media platform may strengthen bonds within a professional community or a family group, expedite academic collaborations, and improve an individual’s sense of belonging.  But may those same messages, freely expressed in a public forum, be repurposed for a study of mental health trends or for marketing strategies? If so, when, how, and by whom; if not, why not? Questions like these touch on a host of ethical and legal issues that have only recently begun to be explored in depth, even as new norms of individual behavior, human interaction, and treatment of data continue to evolve.

“Pervasive” data – a term data and social scientists use to describe the traces left by users of social platforms and smart devices – is being utilized by a variety of stakeholders, yet without any consensus on how, or whether, to seek users’ consent for such usage.  Some of the data are generated by human users intentionally (e.g., online comments), while other data are produced as a by-product of human activity (e.g., browser search history).  So how should these data be treated? Is capturing pervasive data a type of observational research conducted in a public space? Are users who generate pervasive data human subjects? Is the use of pervasive data akin to a secondary analysis of established data? Is analyzing word choices in a work of literature different from analyzing online commentary?  Fundamental questions like these, among many others, are what researchers at the PERVADE project are examining.  The principal investigators (PIs) of PERVADE work across disciplines, come from several academic institutions, and are uniquely positioned to pursue these questions in the social-science space.  The project started in the fall of 2017 with funding from the National Science Foundation, in response to a need for ethical clarity perceived by the growing community of social-science researchers relying on pervasive data.

According to PI Katie Shilton (Associate Professor, University of Maryland, College Park), PERVADE’s various subprojects and empirical findings could yield a fresh set of ethical guidelines or educational materials that would clarify, for data scientists and others utilizing pervasive data, how to ethically study people’s “digital footprints.” A “consensus code of ethics” could in turn spur legal and regulatory activity, as was the case with the Common Rule.

To accomplish its goals, PERVADE set out to explore several areas or subprojects, including:

  • New data forms and new types of data use in scientific research.
  • Users’ attitudes towards their online and smart-device data being used in research.
  • Current regulatory structures and the extent to which they may apply to “digital research”.
  • Existing and emerging approaches to handling ethical questions in academia and industry (e.g., online platform providers and smart-device manufacturers).
  • Current corporate practices of collecting and using personal data.
  • The challenges of translating the legal complexities of online spaces for lay users.

Some existing ethical principles and legal and regulatory guidelines touch on the challenges that pervasive data creates, but they largely fail to address it directly.  For example, in an interview with DBRonData, Nadia N. Sawicki (Professor of Law, Loyola University Chicago School of Law, not affiliated with PERVADE) pointed out that in the US, the Health Insurance Portability and Accountability Act (HIPAA) for health-related data, Institutional Review Boards (IRBs), and the more general federal Common Rule may offer some models for social-science researchers, but they are neither broad enough in the types of data they cover nor detailed enough with respect to pervasive data.  Conversely, the forthcoming European General Data Protection Regulation (GDPR), which will apply to the data of European users, may be overly broad and will need to be interpreted for specific practical cases. Sawicki further commented that users’ own opinions on the use of pervasive data are a rich, original source of information that has seemingly gone untapped.

In a separate interview, Jordan Paradise (Professor of Law, Loyola University Chicago School of Law, not affiliated with PERVADE) expressed her view that the PERVADE project is “very timely as much of the ethical/legal/societal questions have gone unexamined”, even though questions of ethics have been addressed in the past within narrower fields, such as the Ethical, Legal and Social Implications (ELSI) program of the Human Genome Project, the Neuroethics Division of the BRAIN Initiative, and former federal bioethics commissions.

With the rise of pervasive data, the ethics of using “big data” from digital health applications has become the focus of the more recent Connected and Open Research Ethics (CORE) network, hosted by the University of California San Diego.  There are also academic groups examining the ethics of using “small data” – i.e., data unintentionally left by individual users, and not limited to health data – such as the Small Data Lab @ Cornell Tech.

All of the researchers interviewed by DBRonData stressed the importance of broad-based collaboration in exploring these ethical issues, and saw much value in PERVADE and other communal efforts to develop a well-thought-out framework for the ethical use of digital information.