The Pluses and Pitfalls of Online Research
Setting realistic expectations • Best practices • Lax scrutiny
APS Fellow Isabel Gauthier has been conducting research with online participants for 15 years, and for much of that time found the quality of data to be as good as what was collected in lab settings.
But when the Vanderbilt University cognitive neuroscientist and her colleagues recently tried to recruit and test twins online for a study, their screening uncovered that most of the pairs were not actually twins.
“Either they lied when they joined the platform, or they lied to us,” said Gauthier, who attributes the problems to seeking self-reports from a special population. Based on that experience, Gauthier moved to a different platform and tightened quality control on her data collection. But she is among the many psychological scientists who have discovered that online research, embraced for its diverse participant pools and ease of use, requires intense diligence.
Over the last 2 decades, online platforms have become a widely used tool for psychological scientists and other researchers. Services like Amazon Mechanical Turk (MTurk), Prolific, CloudResearch, and Qualtrics allow researchers access to a more diverse pool of participants than the convenience sample of undergraduate students typically available on college campuses.
But as the use of these platforms has climbed, so have the caveats. Many scientists complain of bots (automated software) infiltrating the responses to their surveys and questionnaires. Others find that participants aren’t always as diverse or representative as expected. And the study participants can be inattentive and, as Gauthier discovered, dishonest.
But some experts in online research say many scientists expect too much of the platforms and don’t take adequate steps to ensure the integrity of the data they collect. They say scientists should understand when—and when not—to answer a research question by sampling online, and they must design tasks and questions rigorously to bolster their chances of collecting usable data.
“High-quality data is contingent on the decisions we make as researchers,” said Melissa Keith, a Bowling Green State University psychologist who studies online research samples. “Online data collection requires caution. At the same time, low-quality data is not inevitable if researchers approach data collection thoughtfully and follow best practices.”
Setting realistic expectations
Online studies often incorporate self-report scales and questionnaires, and hinge on participants providing honest, deliberative answers. But several scientists have published articles designed to illuminate the limitations of online research. Below are some examples.
- Joel Nadler (Southern Illinois University, Edwardsville) and colleagues found that 37% of the data collected during 46 online studies was unusable because participants failed attention checks (Nadler et al., 2021).
- Yaakov Ophir (Ariel University and University of Cambridge) and colleagues found that MTurk workers reported experiencing major depression at two to three times the rate of participants in face-to-face interviews, raising questions about the generalizability of data from online samples (Ophir et al., 2020).
- A team of researchers led by Rutgers University behavioral scientist Marybec Griffin reported that data they collected online required extensive cleaning due to bots, which infiltrated the data despite the platform’s safeguards (Griffin et al., 2022).
Data quality can vary across the platforms, Gauthier noted. (For a comparison of price points, functionalities, and data quality across platforms, see Douglas et al., 2023.)
“It can also change rapidly on the same platform over time, and only some of the time do we know what drives those changes, such as a policy change,” she said.
In a recent article for Perspectives on Psychological Science, Christine Cuskley, co-director of the Centre for Behaviour and Evolution at Newcastle University, and cognitive scientist Justin Sulik of Ludwig-Maximilians-Universität München warned that many researchers have inflated expectations about the time, work, and cost savings they can achieve by doing research with online samples.
“Effective use requires specific expertise and design considerations,” Cuskley and Sulik wrote. “Like all tools used in research—from advanced hardware to specialist software—the tool itself places constraints on what one should use it for. Ultimately, high-quality data is the responsibility of the researcher, not the crowdsourcing platform.”
Jennifer Rodd, a University College London cognitive psychologist who has written extensively about using remote samples, agreed that online research should never be viewed as quick and easy.
“You need to sit down and figure out ahead of time, ‘How could this go wrong? How could I end up with bad or misleading data? What do I need to do to make sure that doesn’t happen?’” Rodd said in an interview.
Best practices
In their Perspectives article, Cuskley and Sulik outlined a “mental model” for how researchers can shoulder the burden of producing high-quality data.
Importantly, you should first consider whether an online study is the best way to answer a particular research question—especially if a study requires high concentration from participants, Cuskley said.
When an online study is a good fit, pilot testing and perspective taking can help researchers create studies that produce reliable results. Collecting qualitative feedback from participants during pilot testing can help researchers design studies that are engaging enough to compete with everything else going on around participants, Cuskley explained.
Cuskley also cautioned researchers not to automatically blame bots for data that doesn’t meet expectations. The disappointing responses may stem from poor study design.
“Genuinely imagine yourself as the participant,” she said. “How would you fare in a repetitive 40-minute task in 12-point font, from your own living room? For most people, if they’re being honest, not very well. The kinds of online tasks that get researchers high-quality data are brief, accessible, and engaging.
“People interacting with our tasks don’t necessarily ‘owe’ us high-quality data,” she added, “so the burden is on us as researchers to think very intentionally about our online experimental design.”
Researchers also need to consider how they go about enlisting participants, Cuskley and Sulik said in the paper. Recruiting participants through direct appeals is like posting an ad in the classifieds section of a newspaper and hiring whoever happens to respond to the ad first, they wrote. Paying extra for a platform that filters participants for you, on the other hand, is like working with a recruiting agency, and can help connect researchers with the optimal participants.
Researchers should also offer participants appropriate compensation, not only because it may incentivize better performance, but because they have an ethical responsibility not to exploit workers, Cuskley and Sulik wrote. Although $7.25 per hour is the federal minimum wage in the United States, TurkerView guidelines suggest that many MTurk workers consider that low pay, with $10 to $12 per hour regarded as a good rate.
Gauthier advised researchers to take particular precautions when doing online research with distinctive groups, such as the twins her team has been recruiting.
“For more specialized populations, it may be necessary to find our own participants, even if you choose to bring them into a platform for all the convenience that this can afford,” she said.
Researchers recommend many other approaches to ensure data integrity, including:
- Investing time in learning the technology involved in online studies, or paying expert consultants to guide you
- Using qualitative questions to spot illogical responses
- Asking duplicative demographic questions to check for conflicting data (a minimal screening sketch follows this list)
- Using security measures such as Google’s reCAPTCHA v3 to identify possible bots
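For the duplicate-question and bot-detection checks above, a minimal screening sketch in Python with pandas might look like the following. The file name and column names (worker_id, age_screener, age_survey, open_response) are hypothetical placeholders for whatever a particular platform actually exports, and the checks are examples rather than a complete quality-control pipeline.

```python
import pandas as pd

# Hypothetical survey export; file and column names are placeholders.
df = pd.read_csv("survey_responses.csv")

# Conflicting answers to a demographic question asked twice
# (once in the screener, once later in the survey).
df["age_conflict"] = (df["age_screener"] - df["age_survey"]).abs() > 1

# Open-ended answers duplicated verbatim across respondents,
# a common signature of bots or copy-paste responding.
df["duplicate_text"] = (
    df["open_response"].str.strip().str.lower().duplicated(keep=False)
)

# Multiple submissions tied to the same worker ID.
df["repeat_worker"] = df["worker_id"].duplicated(keep=False)

flags = ["age_conflict", "duplicate_text", "repeat_worker"]
flagged = df[df[flags].any(axis=1)]
print(f"{len(flagged)} of {len(df)} responses flagged for manual review")
```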
Bad data are going to get through no matter how well-prepared you may be, Rodd cautioned. Preregistering the exact conditions under which data will be excluded from a study can help maintain the credibility of your findings, she said.
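As an illustration of how such preregistered rules can be applied mechanically rather than ad hoc, here is a short sketch under assumed criteria; the cutoffs and column names (completion_seconds, attention_checks_failed) are invented for the example, and a real study would take its exclusion rules verbatim from its preregistration.

```python
import pandas as pd

# Apply exclusion rules exactly as preregistered. The thresholds and
# column names below are illustrative, not recommended cutoffs.
df = pd.read_csv("survey_responses.csv")

median_rt = df["completion_seconds"].median()
too_fast = df["completion_seconds"] < median_rt / 3   # implausibly quick completions
failed_checks = df["attention_checks_failed"] > 1     # more than one missed attention check

excluded = too_fast | failed_checks
analysis_df = df[~excluded].copy()

print(f"Excluded {int(excluded.sum())} of {len(df)} responses per preregistered criteria")
```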
Lax scrutiny
Alas, many researchers do not report doing anything to evaluate the quality of data generated by online surveys, according to a recent report in Advances in Methods and Practices in Psychological Science.
Jaroslav Gottfried, a postdoctoral researcher at University College Dublin, analyzed 3,298 articles published in 200 psychology journals during 2022 to see what methods researchers used to evaluate the quality of their online data. He looked for common safeguards such as attention checks, rates of missing answers, response times, outlier responses, and multiple submissions by the same participant, and found them used less often than one might expect: 24% of studies reported using one method, just 20% reported using two or more, and 55% did not evaluate their data at all.
The most common evaluation approaches were the use of bogus items, instructional manipulation checks, and other methods designed to check participants’ attention. Also common was excluding a respondent’s data entirely when even a single survey item went unanswered, an approach that not only introduces bias into the results but also discards potentially valid data, Gottfried said. Other data quality measures were used rarely, including analysis of response times and steps to identify multiple submissions from a single respondent.
Gottfried offered recommendations to promote quality control in online research, including having journals grant badges to studies that assess their data integrity and promoting tutorials on methods for ensuring validity (Gottfried, 2024).
Rodd cautioned against viewing online studies as presenting a whole new set of problems for researchers. It’s easy to fool ourselves into believing that lab testing offers a perfectly sanitized, rigorous environment for research, but many of the problems cropping up on online platforms are also present, if less extreme, in the lab, Rodd said.
“The issues are really similar and the solutions are really similar,” she explained. “It’s just that some problems are bigger or more evident in online testing.”
Rodd advises researchers to remember all that online testing has to offer to psychological science.
“This is an incredible resource,” she said. “You have the ability to collect data from people on the other side of the world that you’ve never met. It’s just astonishing and potentially revolutionary in terms of the types of questions that we get to be answering.”
Buhrmester, M. D., Talaifar, S., & Gosling, S. D. (2018). An evaluation of Amazon’s Mechanical Turk, its rapid rise, and its effective use. Perspectives on Psychological Science, 13(2), 149-154. https://doi.org/10.1177/1745691617706516
Cuskley, C., & Sulik, J. (2024). The Burden for High-Quality Online Data Collection Lies With Researchers, Not Recruitment Platforms. Perspectives on Psychological Science, 0(0). https://doi.org/10.1177/17456916241242734
Douglas, B. D., Ewell, P. J., & Brauer, M. (2023). Data quality in online human-subjects research: Comparisons between MTurk, Prolific, CloudResearch, Qualtrics, and SONA. PLoS ONE, 18(3), e0279720. https://doi.org/10.1371/journal.pone.0279720
Gottfried, J. (2024). Practices in data-quality evaluation: A large-scale review of online survey studies published in 2022. Advances in Methods and Practices in Psychological Science, 7(2). https://doi.org/10.1177/25152459241236414
Griffin, M., Martino, R., LoSchiavo, C., Comer-Carruthers, C., Krause, K. D., Stults, C. B., & Halkitis, P. N. (2022). Ensuring survey research data integrity in the era of internet bots. Quality & Quantity, 56, 2841–2852. https://doi.org/10.1007/s11135-021-01252-1
Keith, M. G., & McKay, A. S. (2024). Too anecdotal to be true? Mechanical turk is not all bots and bad data: Response to Webb and Tangney (2022). Perspectives on Psychological Science. https://doi.org/10.1177/17456916241234328
Nadler, J., Baumgartner, S., & Washington, M. (2021). MTurk for working samples: Evaluation of data quality 2014 – 2020. North American Journal of Psychology, 23(4), 741-751.
Ophir, Y., Sisso, I., Asterhan, C. S. C., Tikochinski, R., & Reichart, R. (2020). The turker blues: Hidden factors behind increased depression rates among Amazon’s Mechanical Turkers. Clinical Psychological Science, 8(1), 65-83. https://doi.org/10.1177/2167702619865973
Rodd, J. (2024). Moving experimental psychology online: How to obtain high quality data when we can’t see our participants. Journal of Memory and Language, 134. https://doi.org/10.1016/j.jml.2023.104472