New Content From Advances in Methods and Practices in Psychological Science

When Alternative Analyses of the Same Data Come to Different Conclusions: A Tutorial Using DeclareDesign With a Worked Real-World Example
Dorothy V. M. Bishop, Charles Hulme 

Recent studies in psychology have documented how analytic flexibility can lead to different results from the same data set. Here, we demonstrate a package in the R programming language, DeclareDesign, that uses simulated data to diagnose the ways in which different analytic designs can give different outcomes. To illustrate features of the package, we contrast two analyses of a randomized controlled trial (RCT) of GraphoGame, an intervention to help children learn to read. The initial analysis found no evidence that the intervention was effective, but a subsequent reanalysis concluded that GraphoGame significantly improved children’s reading. With DeclareDesign, we can simulate data in which the truth is known and thus can identify which analysis is optimal for estimating the intervention effect using “diagnosands,” including bias, precision, and power. The simulations showed that the original analysis accurately estimated intervention effects, whereas selection of a subset of data in the reanalysis introduced substantial bias, overestimating the effect sizes. This problem was exacerbated by inclusion of multiple outcome measures in the reanalysis. Much has been written about the dangers of performing reanalyses of data from RCTs that violate the random assignment of participants to conditions; simulated data make this message clear and quantify the extent to which such practices introduce bias. The simulations confirm the original conclusion that the intervention has no benefit over “business as usual.” In this tutorial, we demonstrate several features of DeclareDesign, which can simulate observational and experimental research designs, allowing researchers to make principled decisions about which analysis to prefer. 
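
To give a flavor of the workflow described above, here is a minimal sketch of a DeclareDesign declaration for a two-arm RCT. The sample size, effect size, and outcome model are illustrative assumptions, not the GraphoGame trial's values, and argument names may vary slightly across package versions.

```r
# Minimal two-arm RCT declaration (illustrative values, not the GraphoGame trial)
library(DeclareDesign)

design <-
  declare_model(
    N = 200,
    U = rnorm(N),
    potential_outcomes(Y ~ 0.2 * Z + U)          # assumed true effect of 0.2 SD
  ) +
  declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0)) +   # the estimand: average treatment effect
  declare_assignment(Z = complete_ra(N, prob = 0.5)) +
  declare_measurement(Y = reveal_outcomes(Y ~ Z)) +
  declare_estimator(Y ~ Z, inquiry = "ATE")      # regression estimate of the difference in means

# Simulate the design many times and report diagnosands such as bias, power, and coverage
diagnose_design(design, sims = 500)
```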

Assessing the Generality of a Self-Administered Strategic-Resource-Use Intervention on Academic Performance: A Multisite, Preregistered Conceptual Replication of Chen et al. (2017)
Peter P. J. L. Verkoeijen, Gabriela V. Koppenol-Gonzalez, Lidia R. Arends, et al. 

Chen et al. designed a novel strategic-resource-use (SRU) intervention that higher-education students could self-administer online. This intervention aimed to help students improve their performance by stimulating them to think about using learning resources for exam preparation. The SRU intervention was tested in two undergraduate introductory-statistics courses. In the first experiment, students in the control condition received an email asking them to state their desired grade, how motivated they were to get that grade, how important it was to obtain the desired grade, and how confident they were in obtaining it. Participants in the experimental condition received the same email and took the 15-min SRU intervention. On the final course exam, the SRU group outperformed the control group, yielding a small to medium effect size, a finding that was replicated in a second study. We conducted four preregistered conceptual replications of Chen and colleagues’ study in four undergraduate introductory-statistics courses at two Dutch higher-education institutions. In our study, the standardized effects on final-exam scores in the intention-to-treat meta-analysis and the compliant-only meta-analysis were small and not significantly different from 0, and the upper limits of the 95% confidence intervals of both meta-analyses were smaller than the effect sizes of the two studies reported by Chen and colleagues. Comparable results were obtained for the pass rates. Thus, the results of the present study failed to corroborate the previously demonstrated positive effect of the SRU intervention on final-exam scores and pass rates. 
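
For readers unfamiliar with the analytic approach, the following is a generic sketch of a random-effects meta-analysis of standardized mean differences using the metafor package. The data values and column names are hypothetical and do not reproduce the authors' analysis.

```r
# Hypothetical per-course summary data: means, SDs, and ns for SRU vs. control
library(metafor)

d <- data.frame(
  course = paste("Course", 1:4),
  m1i = c(6.6, 6.2, 6.4, 6.1), sd1i = c(1.4, 1.5, 1.3, 1.6), n1i = c(120, 95, 140, 110),  # SRU
  m2i = c(6.5, 6.2, 6.3, 6.2), sd2i = c(1.5, 1.4, 1.4, 1.5), n2i = c(118, 101, 135, 108)  # control
)

# Compute standardized mean differences and their sampling variances
dat <- escalc(measure = "SMD", m1i = m1i, sd1i = sd1i, n1i = n1i,
              m2i = m2i, sd2i = sd2i, n2i = n2i, data = d)

res <- rma(yi, vi, data = dat, method = "REML")   # random-effects model
summary(res)                                      # pooled SMD with 95% CI
```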

Advancing Group-Based Disparities Research and Beyond: A Cautionary Note on Selection Bias
Dongning Ren, Wen Wei Loh 

Obtaining an accurate understanding of group-based disparities is an important pursuit. However, unsound study designs can lead to erroneous conclusions that impede this crucial work. In this article, we highlight a critical methodological challenge to drawing valid causal inferences in disparities research: selection bias. We describe two commonly adopted study designs in the literature on group-based disparities. The first is outcome-dependent selection, when the outcome determines whether an observation is selected. The second is outcome-associated selection, when the outcome is associated with whether an observation is selected. We explain the methodological challenge each study design presents and why it can lead to selection biases when evaluating the actual disparity of interest. We urge researchers to recognize the complications that beset these study designs and to avoid the insidious impact of inappropriate selection. We offer practical suggestions on how researchers can improve the rigor and demonstrate the defensibility of their conclusions when investigating group-based disparities. Finally, we highlight the broad implications of selection mechanisms for psychological science. 
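
To make the outcome-dependent-selection problem concrete, here is a small illustrative simulation (not taken from the article): the true group disparity is 0.5 SD, but estimating it only among observations whose outcome cleared a selection threshold shrinks the apparent gap substantially.

```r
# Illustrative simulation of outcome-dependent selection (assumed values)
set.seed(1)
n <- 100000
group   <- rep(c("A", "B"), each = n)
outcome <- rnorm(2 * n, mean = ifelse(group == "A", 0.5, 0))  # true disparity = 0.5

selected <- outcome > 1       # selection is determined by the outcome itself

tapply(outcome, group, mean)                       # full population: gap of about 0.50
tapply(outcome[selected], group[selected], mean)   # selected sample: gap shrinks to roughly 0.12
```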

Visualization of Composite Plots in R Using a Programmatic Approach and smplot2
Seung Hyun Min 

In psychology and human neuroscience, the practice of creating multiple subplots and combining them into a composite plot has become common because the nature of research has become more multifaceted and sophisticated. In the last decade, the number of methods and tools for data visualization has surged. For example, R, a programming language, has become widely used in part because of ggplot2, a free, open-source, and intuitive plotting library. However, despite its strength and ubiquity, it has some built-in restrictions that are most noticeable when one creates a composite plot, which currently involves a complex and repetitive process with steps that go against the principles of open science out of necessity. To address this issue, I introduce smplot2, an open-source R package that integrates ggplot2’s declarative syntax and a programmatic approach to plotting. The package aims to enable users to create customizable composite plots by linearizing the process of complex visualization. The documentation and code examples of the smplot2 package are available online (https://smin95.github.io/dataviz). 
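
For orientation, the sketch below shows the conventional ggplot2 route to a composite figure using the patchwork package rather than smplot2 itself (smplot2's own functions are documented at the link above). It is this kind of manual assembly and annotation that the package aims to streamline.

```r
# Conventional composite-plot workflow with ggplot2 + patchwork (not smplot2)
library(ggplot2)
library(patchwork)

p1 <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
p2 <- ggplot(mtcars, aes(factor(cyl), mpg)) + geom_boxplot()

# Combine the subplots side by side and tag the panels A, B
(p1 | p2) + plot_annotation(tag_levels = "A")
```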

So You Want to Do ESM? 10 Essential Topics for Implementing the Experience-Sampling Method
Jessica Fritz, Marilyn L. Piccirillo, Zachary D. Cohen, et al. 

The experience-sampling method (ESM) captures psychological experiences over time and in everyday contexts, thereby offering exciting potential for collecting more temporally fine-grained and ecologically valid data for psychological research. Given that rapid methodological developments make it increasingly difficult for novice ESM researchers to be well informed about standards of ESM research and to identify resources that can serve as useful starting points, we here provide a primer on 10 essential design and implementation considerations for ESM studies. Specifically, we (a) compare ESM with cross-sectional, panel, and cohort approaches and discuss considerations regarding (b) item content and phrasing; (c) choosing and formulating response options; (d) timescale (sampling scheme, sampling frequency, survey length, and study duration); (e) change properties and stationarity; (f) power and effect sizes; (g) missingness, attrition, and compliance; (h) data assessment and administration; (i) reliability; and (j) replicability and generalizability. For all 10 topics, we discuss challenges and—if available—potential solutions and provide literature that can serve as starting points for more in-depth readings. We also share access to a living, web-based resources library with a more extensive catalogue of literature to facilitate further learning about the design and implementation of ESM. Finally, we list topics that although beyond the scope of our article, can be relevant for the success of ESM studies. Taken together, our article highlights the most essential design and implementation considerations for ESM studies, aids the identification of relevant in-depth readings, and can thereby support the quality of future ESM studies. 

A Cautionary Note on Using Univariate Methods for Meta-Analytic Structural Equation Modeling
Suzanne Jak, Mike W.-L. Cheung 

Meta-analytic structural equation modeling (MASEM) is an increasingly popular technique in psychology, especially in management and organizational psychology. MASEM refers to fitting structural equation models (SEMs), such as path models or factor models, to meta-analytic data. The meta-analytic data, obtained from multiple primary studies, generally consist of correlations across the variables in the path or factor model. In this study, we contrast the method that is most often applied in management and organizational psychology (the univariate-r method) to several multivariate methods. “Univariate-r” refers to performing multiple univariate meta-analyses to obtain a synthesized correlation matrix as input in an SEM program. In multivariate MASEM, a multivariate meta-analysis is used to synthesize correlation matrices across studies (e.g., generalized least squares, two-stage SEM, one-stage MASEM). We conducted a systematic search on applications of MASEM in the field of management and organizational psychology and showed that reanalysis of the four available data sets using multivariate MASEM can lead to different conclusions than applying univariate-r. In two simulation studies, we show that the univariate-r method leads to biased standard errors of path coefficients and incorrect fit statistics, whereas the multivariate methods generally perform adequately. In the article, we also discuss some issues that possibly hinder researchers from applying multivariate methods in MASEM. 
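
As a rough illustration of the multivariate route (not the authors' reanalyses), a two-stage MASEM might be run with the metaSEM package along the following lines. The data objects and variable names are hypothetical, and exact arguments may differ across package versions.

```r
# Hypothetical two-stage MASEM with metaSEM: my_cors is a list of study
# correlation matrices (NA for variables a study did not measure) and
# my_n is the vector of study sample sizes.
library(metaSEM)

# Stage 1: pool the correlation matrices with a multivariate random-effects model
stage1 <- tssem1(my_cors, my_n, method = "REM", RE.type = "Diag")
summary(stage1)

# Stage 2: fit the path model to the pooled correlation matrix
# (model specified in lavaan syntax and converted to RAM matrices)
ram <- lavaan2RAM("performance ~ motivation + ability; motivation ~~ ability",
                  obs.variables = c("performance", "motivation", "ability"))
stage2 <- tssem2(stage1, RAM = ram)
summary(stage2)
```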

A Response to a Comment on Hall et al. (2024)
Kathleen Schmidt, Gerald J. Haeffel, Neil Levy, et al. 

Robust Evidence for Knowledge Attribution and Luck: A Comment on Hall et al. (2024)
Wesley Buckwalter, Ori Friedman 

Registered Replication Report: A Large Multilab Cross-Cultural Conceptual Replication of Turri et al. (2015)
Braeden Hall, Kathleen Schmidt, Jordan Wagge, et al. 

According to the justified true belief (JTB) account of knowledge, people can truly know something only if they have a belief that is both justified and true (i.e., knowledge is JTB). This account was challenged by Gettier, who argued that JTB does not explain knowledge attributions in certain situations, later called “Gettier-type cases,” wherein protagonists are justified in believing something to be true, but their belief was correct only because of luck. Laypeople may not attribute knowledge to protagonists with justified but only luckily true beliefs. Although some research has found evidence for these so-called Gettier intuitions, Turri et al. found no evidence that participants attributed knowledge in a counterfeit-object Gettier-type case differently than in a matched case of JTB. In a large-scale, cross-cultural conceptual replication of Turri and colleagues’ Experiment 1 (N = 4,724) using a within-participants design and three vignettes across 19 geopolitical regions, we did find evidence for Gettier intuitions; participants were 1.86 times more likely to attribute knowledge to protagonists in standard cases of JTB than to protagonists in Gettier-type cases. These results suggest that Gettier intuitions may be detectable across different scenarios and cultural contexts. However, the size of the Gettier intuition effect did vary by vignette, and the Turri et al. vignette produced the smallest effect, which was similar in size to that observed in the original study. Differences across vignettes suggest that epistemic intuitions may also depend on contextual factors unrelated to the criteria of knowledge, such as the characteristics of the protagonist being evaluated. 
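
A schematic sketch of the kind of mixed-effects logistic regression that yields such a ratio is shown below; the variable names are assumed for illustration, and this is not the registered analysis script.

```r
# Schematic mixed-effects logistic model for knowledge attribution (names assumed)
library(lme4)

fit <- glmer(
  attributes_knowledge ~ case_type + (1 | participant) + (1 | region) + (1 | vignette),
  data = gettier_data, family = binomial
)

summary(fit)
exp(fixef(fit))   # exponentiated coefficients give odds ratios for JTB vs. Gettier-type cases
```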

Preprocessing Experience-Sampling-Method Data: A Step-by-Step Framework, Tutorial Website, R Package, and Reporting Templates
Jordan Revol, Chiara Carlier, Ginette Lafit, Martine Verhees, Laura Sels, Eva Ceulemans 

Experience-sampling-method (ESM) studies have become a very popular tool to gain insight into the dynamics of psychological processes. Although the statistical modeling of ESM data has been widely studied, the preprocessing steps that precede such modeling have received relatively limited attention despite being a challenging phase. At the same time, adequate preprocessing of ESM data is crucial: It provides valuable information about the quality of the data and, importantly, helps to resolve issues in the data that may compromise the validity of statistical analyses. To support researchers in properly preprocessing ESM data, we have developed a step-by-step framework, a tutorial website that provides a gallery of R code, an R package, and templates to report the preprocessing steps. Particular attention is given to three different aspects in preprocessing: checking adherence to the study design (e.g., whether the momentary questionnaires were delivered according to the sampling scheme), examining participants’ response behaviors (e.g., compliance, careless responding), and describing and visualizing the data (e.g., examining distributions of variables). 
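
As a small illustration of one preprocessing step named above, the sketch below computes per-participant compliance with dplyr; it uses assumed column names and is not the authors' R package.

```r
# Illustrative compliance check (assumed column names; not the authors' package)
library(dplyr)

compliance_tbl <- esm_data %>%
  group_by(participant_id) %>%
  summarise(
    n_delivered = n(),                      # momentary questionnaires delivered
    n_answered  = sum(!is.na(response)),    # questionnaires actually answered
    compliance  = n_answered / n_delivered
  ) %>%
  arrange(compliance)

# Flag participants below a preregistered compliance threshold, e.g., 50%
filter(compliance_tbl, compliance < 0.50)
```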

But Did They Really Perceive No (Low) Choice? Comment on Vaidis et al. (2024)
David A. Lishner 

Dissonance in the Induced-Compliance Paradigm: A Commentary on Vaidis et al. (2024)
Eddie Harmon-Jones, Cindy Harmon-Jones 

Noise Versus Signal: What Can One Conclude When a Classic Finding Fails to Replicate?
Wilson Cyrus-Lai, Warren Tierney, Eric Luis Uhlmann 

From the Illusion of Choice to Actual Control: Reconsidering the Induced-Compliance Paradigm of Cognitive Dissonance
Shiva Pauer, Roman Linne, Hans-Peter Erb 

The induced-compliance paradigm is a fundamental pillar in the literature on cognitive dissonance. A recent failed replication by Vaidis et al. casts doubt on the widely used experimental method, thereby challenging the literature and prevailing theorizing about the role of perceived choice in cognitive dissonance. However, the nonreplication of the experimental effects could be attributable to methodological factors, such as laboratory settings and cross-temporal dynamics. We therefore reanalyzed the replication data to further explore the relationship between dissonant-attitude change and choice perceptions, employing self-report items instead of the traditional experimental manipulation of choice. Our analysis revealed a significant interaction effect between perceived choice and dissonant behavior (writing a counterattitudinal essay vs. a self-chosen essay) on attitude change: Participants who wrote a counterattitudinal essay aligned their attitudes only if they reported high (vs. low) freedom of choice. These findings suggest a crucial role of choice perceptions in dissonance reduction, consistent with the original theorizing. Future research can employ various methods and draw from adjacent fields, especially from the literature on control perceptions, to reconsider the induced-compliance paradigm and advance research on cognitive dissonance. 
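
A schematic version of the reported moderation test (variable names assumed, not the authors' script) is simply an interaction model:

```r
# Attitude change predicted by essay condition, self-reported perceived choice,
# and their interaction (assumed variable names)
fit <- lm(attitude_change ~ essay_condition * perceived_choice, data = replication_data)
summary(fit)   # the interaction term tests whether attitude alignment depends on perceived choice
```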

Validity and Transparency in Quantifying Open-Ended Data
Clare Conry-Murray, Tal Waltzer, Fiona C. DeBernardi, et al. 

Quantitatively coding open-ended data (e.g., from videos, interviews) can be a rich source of information in psychological research, but reporting practices vary substantially. We provide strategies for improving validity and reliability of coding open-ended data and investigate questionable research practices in this area. First, we systematically examined articles in four top psychology journals (N = 956) and found that 21% included open-ended data coded by humans. However, only about one-third of those articles reported sufficient details to replicate or evaluate the validity of the coding process. Next, we propose multiphase guidelines for transparently reporting on the quantitative coding of open-ended data, informed by concerns with replicability, content validity, and statistical validity. The first phase involves research design, including selecting data and identifying units reliably. The second phase includes developing a coding manual and training coders. The final phase outlines how to establish reliability. As part of this phase, we used data simulations to examine a common statistic for testing reliability on open-ended data, Cohen’s κ, and found that it can become inflated when researchers repeatedly test interrater reliability or manipulate categories, such as by including a missing-data category. Finally, to facilitate transparent and valid coding of open-ended data, we provide a preregistration template that reflects these guidelines. All of the guidelines and resources provided in this article can be adapted for different types of studies, depending on context. 
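
For reference, Cohen's κ corrects raw interrater agreement for agreement expected by chance, κ = (p_o − p_e)/(1 − p_e); the self-contained sketch below computes it for two hypothetical coders.

```r
# Cohen's kappa for two coders (hypothetical codes)
coder1 <- c("A", "B", "A", "C", "B", "A", "C", "A")
coder2 <- c("A", "B", "B", "C", "B", "A", "C", "C")

tab <- table(coder1, coder2)
p_o <- sum(diag(tab)) / sum(tab)                       # observed agreement
p_e <- sum(rowSums(tab) * colSums(tab)) / sum(tab)^2   # agreement expected by chance
kappa <- (p_o - p_e) / (1 - p_e)
kappa   # about .64 for these example codes
```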

When Replication Fails: What to Conclude and Not to Conclude?
Willem W. A. Sleegers, Florian van Leeuwen, Robert M. Ross, et al. 

In this commentary, we examine the implications of the failed replication reported by Vaidis et al., which represents the largest multilab attempt to replicate the induced-compliance paradigm in cognitive-dissonance theory. We respond to commentaries on this study and discuss potential explanations for the null findings, including issues with the perceived choice manipulation and various post hoc explanations. Our commentary includes an assessment of the broader landscape of cognitive-dissonance research, revealing pervasive methodological limitations, such as underpowered studies and a lack of open-science practices. We conclude that our replication study and our examination of the literature raise substantial concerns about the reliability of the induced-compliance paradigm and highlight the need for more rigorous research practices in the field of cognitive dissonance. 

The Comedy of Measurement Errors: Standard Error of Measurement and Standard Error of Estimation
David J. Stanley, Jeffrey R. Spence 

Testing is used to inform a range of critical decisions that help structure much of contemporary society. An unavoidable aspect of testing is that test scores are not infallible. As a result, individual test scores should be accompanied by an interval that indicates the uncertainty surrounding the score. There are a number of different test-score intervals that can be created from different error terms. Unfortunately, there are pervasive misinterpretations of these errors and their intervals. Many of these interpretations can be found in authoritative sources on psychological measurement, which has resulted in stubborn and persistent confusion about what these intervals mean. In the current article, we clarify two important error terms and their intervals: (a) the Standard Error of Estimation and (b) the Standard Error of Measurement. We explicate the meaning and interpretation of these errors by examining their statistical foundations. Specifically, we detail how these terms are formulated from different statistical models and the implications of these models for their different interpretations. We use classical test theory, bivariate linear regression, R activities, and algebra to illustrate the key concepts and differences. 
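
For concreteness, the two error terms from classical test theory can be computed as in the sketch below (the reliability, SD, and scores are assumed illustrative values); which interval each error term should anchor, and how to interpret it, is what the article works through.

```r
# Classical-test-theory error terms (assumed illustrative values)
sd_x <- 15     # standard deviation of test scores
r_xx <- 0.90   # test reliability
x    <- 120    # an observed score
mu   <- 100    # test mean

sem <- sd_x * sqrt(1 - r_xx)            # standard error of measurement, ~4.74
see <- sd_x * sqrt(r_xx * (1 - r_xx))   # standard error of estimation,  ~4.50

t_hat <- mu + r_xx * (x - mu)           # regression-based estimate of the true score, 118

c(sem = sem, see = see, t_hat = t_hat)
```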

How Statistical Challenges and Misreadings of the Literature Combine to Produce Unreplicable Science: An Example From Psychology
Andrew Gelman, Nicholas J. L. Brown 

Given the well-known problems of replicability, how is it that researchers at respected institutions continue to publish and publicize studies that are fatally flawed in the sense of not providing evidence to support their strong claims? We argue that two general problems are (a) difficulties of analyzing data with multilevel structure and (b) misinterpretation of the literature. We demonstrate with the example of a recently published claim that altering patients’ subjective perception of time can have a notable effect on physical healing. We discuss ways of avoiding or at least reducing such problems, including comparing final results with simpler analyses, moving away from shot-in-the-dark phenomenological studies, and more carefully examining previous published claims. Making incorrect choices in multilevel modeling is just one way that things can go wrong, but this example also provides a window into more general problems with complicated designs, cutting-edge statistical methods, and the connections between substantive theory, experimental design, data collection, and replication. 
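
The first problem the authors name, mishandling multilevel structure, can be illustrated with a schematic lme4 comparison (variable names assumed, not the original study's data): fit the same question with and without a participant-level random effect and check whether the headline result survives the simpler analysis.

```r
# Schematic contrast of a single-level and a multilevel analysis (names assumed)
library(lme4)

fit_single <- lm(healing ~ time_condition, data = wound_data)
fit_multi  <- lmer(healing ~ time_condition + (1 | participant), data = wound_data)

summary(fit_single)
summary(fit_multi)   # repeated wounds within a participant are clustered, not independent
```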

A Guide to Prototype Analyses in Cross-Cultural Research: Purpose, Advantages, and Risks
Yuning Sun, Elaine L. Kinsella, Eric R. Igou 

The prototype approach provides a theoretically supported basis for novel research, detailing “typical” cognitive representations of targets in question (e.g., groups, experiences). Fairly recently, this approach has emerged in social and cognitive psychology as a way to understand how people categorize and conceptualize everyday phenomena. Although this approach has previously been used to study everyday concepts, it has largely been overlooked in cross-cultural research. Prototype analyses are flexible enough to allow for the identification of both universal and culture-specific elements, offering a more comprehensive and nuanced understanding of the concept in question. We highlight theoretical, empirical, and practical reasons why prototype analyses offer an important tool in cross-cultural and interdisciplinary research while also addressing the potential for reducing construct bias in research that spans multiple cultural contexts. The advantages and risks of conducting prototype analyses are discussed in detail along with novel ways of integrating computational approaches with traditional prototype-analysis methods to assist in their implementation. 

A Methodological Framework for Stimuli Control: Insights From Numerical Cognition
Yoel Shilat, Avishai Henik, Hanit Galili, Shir Wasserman, Alon Salzmann, Moti Salti 

The stimuli presented in cognitive experiments play a crucial role in the ability to isolate the underlying mechanism from other interwoven mechanisms. New ideas aimed at unveiling cognitive mechanisms are often realized through introducing new stimuli. This, in turn, raises challenges in reconciling results with the literature. We demonstrate this challenge in the field of numerical cognition. Stimuli used in this field are designed to present quantity in a nonsymbolic manner. Physical properties, such as surface area and density, inherently correlate with quantity, masking the mechanism underlying numerical perception. Different generation methods (GMs) are used to control these physical properties. However, the way a GM controls physical properties affects numerical judgments in different ways, compromising comparability and the pursuit of cumulative science. Here, using a novel data-driven approach, we provide a methodological review of nonsymbolic stimuli GMs developed since 2000. Our results reveal that the field thrives and that a wide variety of GMs are tackling new methodological and theoretical ideas. However, the field lacks a common language and means to integrate new ideas into the literature. These shortcomings impair the interpretability, comparison, replication, and reanalysis of previous studies that have considered new ideas. We present guidelines for GMs that are also relevant to other fields and tasks involving perceptual decisions, including (a) defining controls explicitly and consistently, (b) justifying controls and discussing their implications, (c) considering the statistical features of stimuli, and (d) providing the complete stimulus set, matching responses, and generation code. We hope these guidelines will promote the integration of findings and increase their explanatory power. 
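
As a toy example of what a generation method controls (not any published GM), the sketch below holds total dot surface area constant across numerosities so that area cannot serve as a cue to quantity; dot positions are sampled without checking for overlap, which a real GM would also handle.

```r
# Toy nonsymbolic-stimulus generator: total surface area held constant (illustrative only)
generate_dots <- function(n_dots, total_area = 5000, field_size = 100) {
  dot_area   <- total_area / n_dots          # each dot shrinks as numerosity grows
  dot_radius <- sqrt(dot_area / pi)
  data.frame(
    x = runif(n_dots, 0, field_size),
    y = runif(n_dots, 0, field_size),
    radius = dot_radius
  )
}

# Same total area at three numerosities; overlap control is omitted for brevity
stimuli <- lapply(c(8, 16, 32), generate_dots)
```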
