MSR Faculty Summit 2014 Ethics Panel Recap

[Cross-posted to the Social Media Collective]

When the Facebook Emotions Study first made international news, I felt strongly (still do) that researchers, from those honing algorithms to people like me studying the social impact of media and technologies, need to come together. There are no easy answers or obvious courses of action. But we all have a stake in understanding the ethical implications of studying social media as equal parts data analysis and human subjects research. And we need common ground.

At the end of the day, researchers are also well-positioned to change things for two simple reasons: 1) individual researchers design and execute research and data analysis for both corporations and universities. If we change how we do things, our institutions will follow suit. 2) Today’s social media researchers and corporate data scientists will mentor and train the next generation of data researchers. Our students will continue and advance the exploration of social media data at jobs based in industry and university settings. The ethical principles that they learn from us will define not only the future of this field but the general public’s relationship to it. But it’s not easy to bring together such a wide range of researchers. Social media researchers and data scientists are rarely all in the same place.

As luck would have it, Microsoft Research’s Faculty Summit, held annually on the MSR Redmond campus in the great state of Washington, USA, gathers just such a mixed scholarly audience. It was scheduled for July 14-15, a mere two weeks into the public fallout over the Study. Through the support of Microsoft Research and MSR’s Faculty Summit organizers, we organized an ad-hoc session for July 14, 2014, 11:30a-12:30p PT, entitled “When Data Science & Human Subject Research Collide: Ethics, Implications, Responsibilities.” Jeff Hancock, co-author of the Facebook Emotions Study, generously agreed to participate in the discussion. I scoured the list of Faculty Summit attendees and found three other participants to round out the conversation: Jeffrey Bigham, Amy Bruckman, and Christian Sandvig. These scholars (their bios are below) offer the expertise and range of perspectives we need to think through what to do next.

Below, you will find a transcript of the brief panel presentations and a long, long list of excellent questions generated by the more than 100 attendees. I have anonymized the sources of the questions, but if you contact me and would like your name attached to your comment or question, please let me know and I’ll edit this document.

I asked that the session not be recorded for public circulation because I wanted all those present to feel completely free to speak their minds. I encouraged everyone to “think before they tweet,” which did not bar social media reports from the event (though I was delighted to see how many of us focused on each other rather than our screens). We agreed early on that the best contribution we could, collectively, make was to generate questions rather than presume anyone had the answers. I hope that you find this document helpful as you work through your own thoughts on these issues. My thanks to MSR and the Faculty Summit organizers (particularly Jaya, who was so patient with the ever-changing details), to the panelists for their participation, and to the audience for their collegiality and kindness, with a special shout-out to Liz Lawley for sharing her notes with me.

Sincerely,

Mary L. Gray

Session title: When Data Science & Human Subject Research Collide: Ethics, Implications, Responsibilities

Chair: Mary L. Gray, Microsoft Research

Abstract: Join us for a conversation to reflect on the ethics, implications, and responsibilities of social media research, in the wake of the Facebook emotion study. What obligations must researchers consider when studying human interaction online? When does data science become human subjects research? What can we learn as a collective from the public’s reaction to Facebook’s recent research as well as reflection on our own work? Mary L. Gray (Microsoft Research) and Jeff Hancock (Cornell University and co-author of the Facebook emotion study), will facilitate a panel discussion among researchers based at Microsoft Research and across academia from the fields of data science, computational social science, qualitative social science, and computer science.

Panel expertise:

– anthropology

– communication studies

– data science

– experimental research design

– HCI

– human computation

– information sciences

– social psychology

– usability studies

Each panelist had 5 minutes to reflect on:

What can we learn?
Where do we go from here?
What is one BURNING QUESTION we should address together?

House rules:

think B4 you tweet
not a “gotcha!” session
step up/step back (if you tend to talk a lot, let someone else take the mic first)

BIOs:

Christian Sandvig—Speaker 1 (able to speak from an Information Sciences perspective)

Associate Professor of Information, School of Information, Faculty Associate, Center for Political Studies, ISR and Associate Professor of Communication, College of Literature, Science, and the Arts. Sandvig is a faculty member at the School of Information specializing in the design and implications of Internet infrastructure and social computing. He is also a Faculty Associate at the Berkman Center for Internet & Society at Harvard University. Before moving to Michigan, Sandvig taught at the University of Illinois at Urbana-Champaign and Oxford University. Sandvig’s research has appeared in The Economist, The New York Times, The Associated Press, National Public Radio, CBS News, and The Huffington Post. His work has been funded by the National Science Foundation, the MacArthur Foundation, and the Social Science Research Council. He has consulted for Intel, Microsoft, and the San Francisco Public Library. Sandvig received his Ph.D. in Communication Research from Stanford University in 2002. https://www.si.umich.edu/people/christian-sandvig

Jeffrey P. Bigham—Speaker 2 (able to speak from a computer science/accessible technologies perspective)

Associate Professor of the Human-Computer Interaction Institute and Language Technologies Institute in the School of Computer Science at Carnegie Mellon University. Jeffrey’s work sits at the intersection of human-computer interaction, human computation, and artificial intelligence, with a focus on developing innovative technology that serves people with disabilities in their everyday lives. Jeffrey received his B.S.E degree in Computer Science from Princeton University in 2003. He received his M.Sc. degree in 2005 and his Ph.D. in 2009, both in Computer Science and Engineering from the University of Washington. http://www.cs.cmu.edu/~jbigham/

Amy Bruckman—Speaker 3 (able to speak from the builder/designer perspective)

Professor in the School of Interactive Computing in the College of Computing at Georgia Tech, and a member of the Graphics, Visualization, and Usability (GVU) Center. She received her PhD from the Epistemology and Learning Group at the MIT Media Lab in 1997, and a BA in physics from Harvard University in 1987. She does research on online communities and education, and is the founder of the Electronic Learning Communities (ELC) research group. Bruckman studies how people collaborate to create content online, with a focus on how the Internet can support constructionist, project-based learning. Her newer work focuses on the products of online collaboration as ends in themselves. How do we support people in this creative process, and what new kinds of collaborations might be possible? How do interaction patterns shape the final product? How do software features shape interaction patterns? How does Wikipedia really work, and why do people contribute to it? http://www.cc.gatech.edu/fac/Amy.Bruckman/

Jeff Hancock—Speaker 4 (co-author of the Facebook Emotions Study)

Dr. Jeffrey T. Hancock is a Professor in the Communications and Information Science departments at Cornell University and is the Co-Chair of the Information Science department. He is interested in social interactions mediated by information and communication technology, with an emphasis on how people produce and understand language in these contexts. His research has focused on two types of language, verbal irony and deception, and on a number of cognitive and social psychological factors affected by online communication. https://communication.cals.cornell.edu/people/jeffrey-hancock

Opening remarks (Mary L. Gray):

I asked each of our speakers to introduce themselves, tell us a little bit about the perspective they’re coming from. The goal of the panel was to bring together as many different disciplinary perspectives as possible among people who are studying what is perhaps best understood as a shared object: social media. We came together to think about the implications and ramifications of the public response to the Facebook study. I gave a special shoutout and thanks to Jeff Hancock for being willing to attend Faculty Summit at the very last minute. I want to publicly say how impressed I am by his collegiality and his willingness to engage. I think we are so lucky that this is the case that became the opportunity for us to talk about this. I think all of us researching social media can imagine really bad cases that could have come to light and instantly eroded public trust in our efforts to understand social media. So I’m really very happy that this opportunity to talk about how to move forward in our research was prompted by the work of a scholar who I really respect and admire. So with that, I handed it off to our first speaker, Christian Sandvig. Each person spoke for a little bit and then we had a chance for them to pose one burning question.

Panelist statements:

CHRISTIAN SANDVIG:

Thanks, Mary. Mary asked us to say a little bit where we might relate to this topic. I’m a Professor at the School of Information and the Communications Studies Department at the University of Michigan. I’m interested in information and public policy. I’m interested in this particular controversy because I have a forthcoming book about studying human behavior online. I’ve taught about applied ethics and research methods. I have a graduate class called Unorthodox Research Methods, about new research methods and the controversies they provoke. And I’m a former member of an Institutional Review Board. So that’s my background. I want to use my very brief time to mention a study that often comes up in historical reviews of psychology. It’s the Middlemist “bathroom study” (http://www2.uncp.edu/home/marson/Powerpoints/3610Bathroom1.pdf). It’s sometimes called the micturition study, if you have a preference for scientific terminology. To be clear: I’m not trying to say that the Facebook experiment is like the bathroom experiment. But there are some interesting parallels. So I’ll just give you a quick rundown of those parallels. This is a research study conducted by psychologists in a men’s restroom at a large Midwestern university. Basically the researchers built a small periscope-like device that allowed a professor sitting in a toilet stall to observe patrons at the urinals from a side angle. The reason that the researchers did this is that they had a hypothesis about physiologic excitation and personal space. So they designed an experimental design where a confederate, a student on the research team, would stand near or at a distance from an individual who came into the bathroom to use the urinal. They did this without consent and they didn’t have a debriefing process. They timed, with a stopwatch, the urination to help them draw conclusions about physiologic excitation and physical proximity of strangers.

The reason that the Middlemist “bathroom study” is a useful parallel to today’s uproar over the Facebook Emotions Study is that public criticism of the research did not focus on physical harms to human subjects but, rather, the perceived indignity and disregard for individual privacy that the study suggested. The researchers defended themselves and used reasonably sound logic, arguing that going to the bathroom is an everyday experience. They studied a public bathroom after all. The worst that could happen is that a subject feels a little weird that someone’s watching them in a public bathroom. And, in fact, they argued debriefing would have produced the harm in this study. If they’d told men that they’ve been watched in a public bathroom, it may then make them uncomfortable. So in fact, telling subjects about the study produces the only harm that could happen. So, they reasoned, we shouldn’t debrief subjects about the study. The debate about this study is extensive. But one of the conclusions that followed from it is that researchers in this case focused on the wrong harms. They argued that individuals in this study probably couldn’t be harmed because it’s only mildly embarrassing or creepy to be watched in a public bathroom. But the harm that the researchers should have addressed or considered was the potential harm to the image of the profession or all of science. Some research subjects were actually very upset about the study and felt it violated human decency and their individual dignity. They were not harmed individually, but found this study creepy and invasive. Avoiding telling people that you’re doing this kind of research because telling them would upset them doesn’t help at all. Researchers simply delay the harm that will follow when the public eventually find out how the study was conducted. 
Such delays only leave the public more angry that researchers didn’t tell subjects, at some point in the study, because it suggests that the researchers are hiding something. So the question I have for the panel and the audience is: Is it possible for us to anticipate this kind of harm? Is it possible for us as researchers to design research and say this is something that’s going to cause controversy because people are going to think it’s very creepy, versus this is something that no one’s going to have a problem with. That’s actually a difficult question to answer.

Some people have argued, well, you know, Facebook’s already done a variety of other studies that changed users’ information without their knowledge, so why does this one produce the controversy? I would argue that there are research cases and topics where there are foreseeable harms because we know that people feel differently about certain areas of their lives. People feel differently about whether there’s an intervention or not. People feel differently about the valence of the intervention. For example, people will feel differently about whether an intervention or research experiment is done for science or for a corporation. But, really, the only way that we’re going to be able to predict whether the “creepiness factor” will register as a problem is to involve research participants in the research design at some level. Participants’ involvement could help researchers figure out the level of threat before we execute our research. Fundamentally, researchers aren’t the ones who decide what is threatening or crossing the line for the public. If participants feel our research methods are creepy and they hate it, we don’t want to be in the business of doing that research. We’re not going to be able to argue participants out of their feelings and say “no, it’s all right; people look at you in the bathroom all the time.” We’re not going to be able to do that. We need a different approach and understanding of “harm” to conduct social research.

JEFF BIGHAM:

I’m Jeff Bigham from Carnegie Mellon University. I approached this research area a little bit differently. I work on building systems to support people with disabilities, often using human computation. Mary asked us to think about what skin we have in this game. So the skin I have in this game is that social media are the primary way that we recruit the people that power the systems we build for people with disabilities, via friend sourcing, community sourcing, citizen science, traditional crowdsourcing. It’s also the resource we have to understand those people using our systems. As Mary said in her talk earlier this morning, “crowds are people,” and it is really important for us to make these systems work well–make them sustainable and make them scalable.

We’re increasingly moving away from, say, Amazon Mechanical Turk, to services like Facebook, to power our systems for people with disabilities. Ultimately, we need users to trust the platforms on which we are recruiting workers. So if they don’t trust Facebook, for instance, they may not use it or they may move to closed systems that don’t allow us the kind of access or the ability to incorporate human work into our systems. I’ve tried bootstrapping sociotechnical systems on my own, and it’s actually really hard without piggybacking on existing platforms. So it’s really important that we have continued access to the general public using commercial platforms. I think that we can all agree this is about a lot more than one study or one research article. And so my fear is that, as a result of this experience, we will be more likely to miss out on the upsides and rewards that could come from engaging with users of these services in interesting ways. My hope is that we can find a way to preserve the utility of these sites and our ability to do important research and innovate on social media platforms. I also hope that researchers can continue partnering with industry while addressing the very real concerns of users. So my question is what practical steps should researchers take right now, while public opinion and corporate policies are still being sorted out, to help ensure our long-term ability to work with companies who are running these very interesting platforms?

AMY BRUCKMAN:

Thank you, Mary, so much for organizing this. It’s really timely. I launched an online, programmable virtual world for children in 1995. I got interested in Internet research ethics because I asked people what is the ethical way to do this and nobody knew. So I had to think ethically and invent the ethical things to do. In the 1990s, I was part of three different working groups focused on developing ethical policies for Internet research: one for the Association of Internet Researchers; another for AAAS; and a third one for the APA. The APA group, led by Bob Kraut, resulted in a paper which you may find useful and is available on my website, along with a long list of other papers on research ethics. I think it may be time for us to have another round of working groups. It’s been a long time since the ’90s. There are some new issues emerging and we could use some updated statements of what the ethical issues are here and how to handle them. Several of my papers on research ethics have dealt with the issue of disguising subjects’ online identities.

I argue that, in many cases, contrary to the traditional approach of always disguising research subjects, if they are doing creative work on the Internet, for which they deserve credit, we are ethically obligated to ask them: “Do you want me to use your real name?” It would be unethical to hide their names without their consent. I want to be a little bit deliberately provocative here: I have done research on Internet users without their consent, and I would do it again. According to U.S. law, you can do work without consent. You can get a full waiver of consent if the research can’t be practicably done without a waiver, if the benefits outweigh the risk and if the risk is low. I have a post on my blog at nextbison.wordpress.com about a study that I did in 2003 where we walked into IRC chat rooms and recorded chat room participants’ reactions. Actually, we were really studying whether we would get kicked out of the chat room. We had four conditions: A control, where we walked in and didn’t say anything; a treatment where we walked in and said “Hi. I’m recording this for a study of language online;” an opt-in treatment; and an opt-out treatment. I know this gets very meta. And a little circular. But we found that people really didn’t want us to be in their IRC chat rooms. Almost no one opted in. And no one opted out. We have a colorful collection of the boot messages we received as we were kicked out of these chat rooms. My favorite is “Yo mama’s so ugly she turned Medusa to stone.” So ironically, despite the fact that our research documents that we made people angry, I still think the study itself was ethical. It’s certainly not something that we did lightly. But the level of disturbance we created was relatively small. I think what we learned from it was beneficial to people and to science in general. The original papers are available on my blog. And if you’re interested in more details, I’d be happy to discuss it with you. 
But my point in referencing this study is to argue that it is possible to do research that upsets people and we should be careful about overreactions to our work.

I want to say that the reaction to the Facebook study was out of proportion. And I hope that Jeff knows that we, his colleagues, are behind him. The reaction to Facebook, the company, also was excessive. I love a lot of the research that Facebook does. I’m not saying it’s perfect. There’s a lot that all of us have to learn about researching social media. And I will say there’s a lot we can learn from this incident. I’m glad it started this series of conversations. A couple of questions that I have for the future are: Should companies be required to have something more like a real IRB? That’s a tough one. It has a lot of complications. Distinguishing social science research from how companies do their business and make their sites usable is almost impossible. My other burning question, that I hope we can discuss, is should conferences and journals that do peer review also review the ethics of a study?

A while ago I reviewed an Internet-based study submitted to the conference, CHI. I objected to the ethics of the study, and objected violently. I was really offended by this study. I put my objections in my CHI review and I gave the paper a 1. I never give 1s; I’m nice. I got back a response from the program committee that year that the researchers had their study approved by their campus Institutional Review Board (IRB) and they proceeded in good faith; so, we declare this study to be ethical. Therefore, it’s not the reviewer’s place to question the ethics of the study. I’m not sure that’s how we should be handling things. I think we need to think about our ethics review as an incredibly complicated socio technical system, with tools and rules and divisions of labor and different activity systems run by different IRBs that come to different solutions. Somehow, there has to be some error correction when we come together to share our work. On the other hand, the practical question of how we do this without causing tremendous practical problems and unfairness in the meta review is difficult, too. So I don’t think it’s easy. But I don’t think the hand waving, “oh, it was approved, it’s not our business,” is the right answer, either. So I’m looking forward to more conversations from here. Thanks.

JEFF HANCOCK:

Thank you everybody for coming in today. Thank you Mary for organizing this. And for the fellow panelists for being part of this on pretty short notice. And thank you all for this morning. I’ve seen many colleagues and friends. It’s been great to feel supported and people reaching out to make sure I’m doing okay. It was my first experience with global worldwide Internet heat wrath, and it was very difficult. I will admit. My family paid a price for it. I paid a price, but I feel much better being amongst colleagues. Mostly because this is a really important conversation, and I feel now a privilege and a responsibility to be a part of that. I thought I would take a different approach from the rest of the panelists and describe a little bit of what I learned from the various e-mails I received from around the world in response to this. And I’ll keep it a little bit higher level, away from specific identities. Some of them are pretty intense. And I think that the intensity actually points to something important.

I received a couple hundred e-mails from people from around the world. The e-mails that I want to discuss with you are ones from the people using Facebook. This was their role as a stakeholder. These e-mails are distinct from those that I received from other academics with questions about ethical issues, around informed consent, around how IRB dealt with this, et cetera.

Facebook users’ e-mails tended to fall into three main categories. The first one was: How dare you manipulate my news feed! And this was a really fervent response, and very common. I think it points to something that Christian Sandvig and other scholars thinking about algorithms and the social world have been taking up in their work. As Tarleton Gillespie puts it, we don’t have metaphors in place for what the news feed is. We have a metaphor for the postal service: messages are delivered without tampering from one person to the next. We have a metaphor from the newsroom: editors choose things that we think will be of interest. But there’s no stable metaphor that people hold for what the news feed is. I think this is a really important thing. I’m not sure whether this means we need to bring in an education component to help people understand that their news feeds are altered all the time by Facebook, but the huge number of e-mails about people’s frustration that researchers would change the news feed indicates that there’s just no sense that the news feed was anything other than an objective window into their social world.

The second category of e-mail that I received signals that the news feed is really important to people. I got a number of e-mails saying things like: “You know my good friend’s father just died. And if I didn’t have the news feed I may not have known about it.” This surfaced a theme that the news feed isn’t just about what people are having for breakfast or all the typical mass media put-downs of Twitter and Facebook. Rather, this thing that emerged about seven years ago [Facebook] is now really important to people’s lives. It’s central and integrated in their lives. And that was really important for me to understand. That was one of the things that caught me off guard, even though maybe in hindsight it shouldn’t have.

The last category of e-mail that I received: A lot of people asked me why I thought this study attracted this kind of attention and controversy, whereas other similar studies did not. I thought a lot about that. One of the things that came out of the e-mails is that, as Christian Sandvig argued earlier, we were looking at the wrong place for what would register as “harm.” People have a very strong sense of autonomy. We know that quite well from social psychology and from sociology. I think our study violated people’s sense of autonomy and the fact that they do not want their emotions manipulated or mood controlled. And I think it’s a separate issue whether we think emotions are being manipulated all the time, through advertising, et cetera. What became very clear in the e-mail was that emotions are special. And I think it’s one example of a class of things that will fall into some of the spaces that Christian Sandvig talked about. If we work on one of these special classes or categories of human experience, like emotion, without informed consent, without debriefing, we could do larger harm than just harm to participants.

I can now have some sense of humor around some of the hate mail. And it’s been an amazing learning experience for me. I hope that by turning it over to the floor here and having ongoing conversations, we can really move things forward. My burning question would be: I think that this is a huge turning point or advance for social sciences potentially in the same way that, say, evolutionary theory was important for biology or the microscope was for chemistry. And I would want us to think about how we would continue doing the research on social media platforms ethically. So in the same way that Stanley Milgram’s study caused us to rethink what ethical research practices are, in the same way that Amy Bruckman’s calling on us to return to reflecting on how we do Internet research, now that we can do social psychology essentially at scale, how do we bring ethics along with that?

MARY L. GRAY:

I think what we can do concretely, with the time we have left — we have a little bit of time remaining. But I think the most productive thing we could do, I would argue, is get a lot of questions on the table. Because we are recording this, I can get a transcript and we can collect all the questions. And I would honestly say I don’t really listen to anybody who tells me right now they have the answer, because we’ve only been studying this thing for about ten years. This is entirely new to us. I don’t know that our objective should be answering anything today. I think we should be listening to each other, hearing our concerns and hearing some really important questions. So with that in mind, let’s hear some questions.

Questions and comments generated by the audience:

Where do you think this [conversation about what to do next] should happen? I don’t think it’s just a matter of us having a special issue of a journal and people publish their opinions, and I know that stuff like that is happening. But it feels like we have to have some real dialogue. Who are the people who you think need to be involved in these conversations and where do you think some of these conversations can happen?

I think the value of this experiment and the reaction to it is that it has raised the awareness of the algorithmic power that these organizations [social media companies] have. What is the responsibility of the Facebooks and the Googles of the world to be aware of this?

Do we all agree that corporations have a role in this conversation?

Information is being presented and it’s being manipulated [through social media interfaces] by definition. If you’re working in a mass medium with a corporation, you’re changing the presentation of information all the time. How do we draw any lines about this to distinguish what is ethical or unethical presentation of this kind of information?

How can we take this up to be a national and an international conversation? I think we need to be thinking [beyond] the campus level. The variability among IRBs is hopeless because if one campus IRB has approved something, that doesn’t mean it meets some national or international standard. How can we think about this internationally, since these are international corporations and international data we’re talking about? These aren’t just Cornell, Berkeley or UCLA data.

For the most part, Facebook is occupied all the time by highly vulnerable populations. Even if there was an open consent process there, how do you know the populations there really would have been in a position to fully give informed consent?

Could there be something that companies with social media sites actually do to let end users know this is happening, or to specify how they want their information to be reused, something like the organic food sticker on foods? Could we create some way to very simply allow people to say to us, “sure, go ahead, modify my stuff,” or “don’t touch my stuff,” or something like that? Maybe there’s some trigger, especially for anything that’s private.

How, as industrial researchers, do we maintain ethical obligations to our subjects similar to those of academic researchers?

As a community, how do we agree, when we acknowledge there are going to be many, many different partners, some industry, some academia, doing lots of kinds of research, who’s responsible for the ethical treatment of human subjects and their data?

I think if you have a Ph.D., perhaps part of that professional training should mean that we can assume that you can behave ethically until it’s proven otherwise.

What is the argument towards industry [for tighter ethical regulation] that’s going to make sense? And number one is losing your customer base. I’m sure Facebook has taken a hit and every single advertiser has taken a hit because you’re going to think twice about clicking on the button. How do we speak to corporate organizations and convince them that they should change their actions?

So I’m somewhat still puzzled by what you [Jeff Hancock] think about your findings. Do you really feel like you imposed some sort of negative valence on people that hurt them, or is there a lot of uncertainty here? And how is this different than the day-to-day interactions we have? Why is this special?