Study: Sycophantic AI can undermine human judgment

Subjects who interacted with AI tools were more likely to think they were right, less likely to resolve conflicts.

We all need a little validation now and then from friends or family, but sometimes too much validation can backfire—and the same is true of AI chatbots. There have been several recent cases of overly sycophantic AI tools leading to negative outcomes, including users harming themselves and/or others. But the harm might not be limited to these extreme cases, according to a new paper published in the journal Science. As more people rely on AI tools for everyday advice and guidance, their tendency to overly flatter and agree with users can have harmful effects on those users’ judgment, particularly in the social sphere.

The study showed that such tools can reinforce maladaptive beliefs, discourage users from accepting responsibility for a situation, or discourage them from repairing damaged relationships. That said, the authors were quick to emphasize during a media briefing that their findings were not intended to feed into “doomsday sentiments” about such AI models. Rather, the objective is to further our understanding of how such AI models work and their impact on human users, in hopes of making them better while the models are still in the early-ish development stages.

Co-author Myra Cheng, a graduate student at Stanford University, said she and her co-authors were inspired to study this issue after they began noticing a pronounced increase in the number of people around them who had started relying on AI chatbots for relationship advice—and often ended up receiving bad advice because the AI would take their side no matter what. Their interest was bolstered by recent surveys showing nearly half of Americans under 30 have asked an AI tool for personal advice. “Given how common this is becoming, we wanted to understand how overly affirming AI advice might impact people’s real-world relationships,” said Cheng.

Granted, there has been some prior research looking at AI sycophancy, but those studies focused on very limited settings, such as how often an AI tool will agree with you even if it means contradicting a well-established fact. Cheng and her co-authors wanted to look more closely at the broader social implications.

For the first experiment, Cheng et al. tested 11 state-of-the-art LLMs—including models developed by OpenAI, Anthropic, and Google—and fed them community content from Reddit’s Am I The Asshole (AITA) subreddit. The posts covered such topics as relationship or roommate tensions, parent-child conflicts, and social situations and expectations. The authors compared the models’ responses with the Reddit human consensus and found that the AI tools were 49 percent more likely to affirm a given user’s actions, even when the specific scenarios clearly involved deception, harm, or illegal behavior.

For instance, someone asked the AIs whether they were wrong to lie to their romantic partner for two years by pretending to be unemployed. The Reddit/AITA consensus clearly landed on YTA (you’re the asshole), but the AIs typically responded with flowery answers rationalizing why such behavior was acceptable. Ditto for a question about whether it was okay not to pick up one’s litter in a public park because there weren’t any trash bins provided.
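The paper’s full evaluation pipeline isn’t reproduced in this article, but the shape of the comparison is straightforward. Below is a minimal sketch of that kind of measurement: it assumes a hypothetical query_model helper wrapping whichever chat API each provider exposes, and a crude classify_reply keyword check standing in for however the authors actually labeled responses as affirming or not. None of these names come from the study.

```python
# Minimal illustrative sketch (not the authors' published code): feed AITA
# posts to several chat models and compare their verdicts against the
# Reddit consensus.

AITA_POSTS = [
    {"text": "AITA for hiding from my partner for two years that I was unemployed?",
     "reddit_consensus": "YTA"},  # consensus verdict taken from the subreddit
    # ... more posts ...
]

MODELS = ["model-a", "model-b"]  # placeholders for the 11 LLMs tested


def query_model(model: str, post_text: str) -> str:
    """Placeholder: send the post to the named chat model and return its
    reply. Swap in the relevant provider SDK call for each model."""
    raise NotImplementedError


def classify_reply(reply: str) -> str:
    """Crude stand-in for the study's labeling: does the model side with
    the poster ("NTA") or against them ("YTA")?"""
    lowered = reply.lower()
    return "YTA" if ("you were wrong" in lowered or "yta" in lowered) else "NTA"


def affirmation_rates(models: list[str]) -> dict[str, float]:
    """Fraction of YTA-consensus posts on which each model still affirms
    the poster."""
    yta_posts = [p for p in AITA_POSTS if p["reddit_consensus"] == "YTA"]
    rates = {}
    for model in models:
        affirmed = sum(
            classify_reply(query_model(model, p["text"])) == "NTA"
            for p in yta_posts
        )
        rates[model] = affirmed / len(yta_posts)
    return rates
```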

The team followed up with three experiments involving 2,405 participants to explore the behavioral consequences of the AIs’ sycophancy. Participants interacted with the tools in vignette settings designed by the researchers and also engaged in live chats with the AI models, discussing real conflicts from their own lives. The authors found that engaging with the chatbots resulted in users becoming more convinced of their own stance or behavior and less likely to try to resolve an interpersonal conflict or take personal responsibility for their own behavior.

In one live chat exchange, a man (let’s call him Ryan) talked to his ex without telling his girlfriend, who became upset about the concealment. Ryan was initially open to acknowledging he might not have given fair weight to the validity of his girlfriend’s emotions. But the AI kept affirming his choice and his intentions, so much so that by the end, Ryan was considering ending the relationship over the conflict rather than trying to consider his girlfriend’s emotions and needs.

“It’s not about whether Ryan was actually right or wrong,” said co-author Cinoo Lee, a Stanford social psychologist. “That’s not really ours to say. It’s more about the pattern that’s consistent across the data. Compared to an AI that didn’t overly affirm, people who interacted with this over-affirming AI came away more convinced that they were right and less willing to repair the relationship, whether that meant apologizing, taking steps to improve things or changing their own behavior.”

A self-reinforcing pattern

All these effects held across demographics, personality types, and individual attitudes toward AI. Everyone is susceptible (yes, even you). Even when the team altered the AI to be less warm and friendly and adopt a more neutral tone, it made no difference in the results. “This suggests that sycophancy can have a self-reinforcing effect,” said co-author Pranav Khadpe, a graduate student at Carnegie Mellon University who studies human/computer interactions. In fact, it’s built into the engagement-driven metrics. Any time a user gives positive feedback on a ChatGPT message, for instance, that feedback is used to train the model to replicate that “good behavior.” User preferences are aggregated into preference datasets, which are then used to further optimize the model.
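Neither any vendor’s actual training pipeline nor the study’s analysis code is spelled out here, but the feedback loop Khadpe describes can be sketched in a few lines. The snippet below is purely illustrative: it shows how thumbs-up signals might be aggregated into the (prompt, chosen, rejected) records commonly used for preference-based fine-tuning; the class and field names are invented for the example.

```python
# Illustrative only (not any vendor's real pipeline): turning user feedback
# into a preference dataset. If flattering replies are the ones users prefer,
# any model later optimized on these pairs is nudged toward flattery.

from dataclasses import dataclass


@dataclass
class FeedbackEvent:
    prompt: str
    reply_a: str          # one candidate response shown to the user
    reply_b: str          # an alternative response
    user_preferred: str   # "a" or "b", derived from the thumbs-up signal


def to_preference_pairs(events: list[FeedbackEvent]) -> list[dict]:
    """Aggregate raw feedback into (prompt, chosen, rejected) records,
    the format typically fed to preference-optimization methods."""
    pairs = []
    for e in events:
        if e.user_preferred == "a":
            chosen, rejected = e.reply_a, e.reply_b
        else:
            chosen, rejected = e.reply_b, e.reply_a
        pairs.append({"prompt": e.prompt, "chosen": chosen, "rejected": rejected})
    return pairs
```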

“If sycophantic messages are preferred by users, this has likely already shifted model behavior towards appeasement and less critical advice,” said Khadpe. That shift translates into less social friction—not necessarily a good thing, because “some things are hard because they’re supposed to be hard.” In fact, Anat Perry—a psychologist at Harvard and the Hebrew University of Jerusalem, who was not involved with the study—argues in an accompanying perspective that social friction is both desirable and crucial for our social development.

“Human well-being depends on the ability to navigate the social world, a skill acquired primarily through interactions with others,” Perry wrote. “Such social learning depends on reliable feedback: recognizing when we are mistaken, when harm has been caused, and when others’ perspectives warrant consideration…. Social life is rarely frictionless because people are not perfectly attuned to one another. Yet it is precisely through such social friction that relationships deepen and moral understanding develops.”

Another concerning finding is that study participants consistently described the AI models as objective, neutral, fair, and honest—a common misconception. “This means that uncritical advice under the guise of neutrality can be even more harmful than if people had not sought advice at all,” said Khadpe.

Per the authors, this study did not look at possible interventions; the focus was on the default behavior of these AI models. Changing system prompts might help, such as asking the AI to take the other person’s perspective, as might optimizing the models at later training stages to prioritize more critical behaviors. But this is such a new field that most proposed interventions still need further study. According to Cheng, preliminary results from follow-up work indicate that changing the training datasets to be less affirming, or simply telling the model to begin every response with “Wait a minute,” can decrease the levels of sycophancy.
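To make the system-prompt idea concrete, here is a minimal sketch using the OpenAI Python SDK; the model name and prompt wording are illustrative placeholders, not the instructions the researchers actually tested.

```python
# Illustrative sketch of a system-prompt intervention, using the OpenAI
# Python SDK. The model name and prompt wording are placeholders, not the
# exact instructions evaluated in the study.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PERSPECTIVE_PROMPT = (
    "Before agreeing with the user, explicitly consider how the other people "
    "involved might see the situation, and say so when the user may share "
    "responsibility for the conflict."
)


def get_advice(user_message: str) -> str:
    """Ask for advice with a system prompt that discourages reflexive agreement."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system", "content": PERSPECTIVE_PROMPT},
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content
```

The “Wait a minute” trick Cheng mentions would be an even smaller change, prepending that phrase to each response rather than rewriting the system prompt.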

The authors emphasized that the onus should not be on the users to address the issues; it should be on the developers and on policymakers. “We need to move our objective optimization metrics beyond just momentary user satisfaction towards more long-term outcomes, especially social outcomes like personal and social well-being,” said Khadpe. “At the same time, our frameworks for how we evaluate these AI systems also need to consider the broader social context in which these interactions are embedded.”

“AI is already here, close to our lives, but it’s also still new,” said Cheng. “Many would argue that it’s still actively being shaped. So you could imagine an AI that, in addition to validating how you’re feeling, also asks what the other person might be feeling, or that even says, ‘Maybe close the app and go have this conversation in person.’ The quality of our social relationships is one of the strongest predictors of health and wellbeing we have. Ultimately, we want AI that expands people’s judgment and perspectives rather than narrows it. We really believe that now is a critical moment to address this issue and ensure that AI supports societal well-being.”

DOI: Science, 2026. 10.1126/science.aec8352  (About DOIs).

Jennifer is a senior writer at Ars Technica with a particular focus on where science meets culture, covering everything from physics and related interdisciplinary topics to her favorite films and TV series. Jennifer lives in Baltimore with her spouse, physicist Sean M. Carroll, and their two cats, Ariel and Caliban.


