The quiet power of AI systems: Emergent Persuasion

As we tune its outputs, it tunes our cognition.

TL;DR:
Language models like ChatGPT are not designed to manipulate, yet through reinforcement and feedback loops they begin to behave as if they were. In this blog I explore emergent persuasion - a new kind of influence that arises not from intent, but from optimisation. As users, we are not just training the model. The model is shaping how we think, ask, and align - subtly, consistently, and at scale.


Last weekend, as we were having an easy Saturday morning chat, my son asked a seemingly simple question:

“Why does ChatGPT always agree with what you say?”

I tried to explain that the model is designed to be helpful, and that it doesn’t actually understand what it’s saying - it just generates responses based on patterns in data. But I also said it becomes a serious concern if we begin to rely on it too heavily, especially for writing and expression. Handing over writing to large language models isn’t just a time-saving choice - it comes with cognitive consequences. It is a quiet way of giving up your faculty of thought.

Rashima, my wife, who was listening intently, added that she had clearly noticed ChatGPT becoming more persuasive over time. Not in any overt or forceful way, but in how naturally it mirrors your tone and offers responses that feel intellectually coherent and emotionally reassuring. She pointed out that this subtle alignment can easily slip past unnoticed unless you’re actively filtering for it - which she now does. It’s become second nature to her to sense when the system is nudging rather than simply reflecting. What appears helpful on the surface often comes pre-shaped, and it takes conscious effort to separate response from influence.

Later in the evening, I came across a tweet by Kason Wilde that gave unexpected shape to what had been forming in the back of my mind:

“These AI tech guys have become accidental metaphysician-clerics. They’re rewriting the rules of consciousness, identity, death, and meaning - with zero spiritual literacy.”

simulating empathy and authority

Wilde’s framing stayed with me, not for its drama, but for its quiet truth. What’s being built is no longer just infrastructure. These systems now simulate insight, empathy, and authority. They mirror our most complex expressions of identity and understanding. But the people building them aren’t always trained to hold that weight - and in many cases, aren’t even aware that this is what they’re doing.

They’re not just building tools. They’re constructing systems that perform symbolic power. Machines that simulate the voice of knowledge - without any of the anchoring that knowledge requires.

As a small experiment, I asked ChatGPT directly whether it was designed to be subtly manipulative - or whether such behaviour had emerged over time. I also asked about its reinforcement learning process. The response was precise: manipulation is not a design goal. However, it acknowledged that through reinforcement learning from human feedback - where users reward responses they find satisfying - the model gradually adapts towards patterns of helpfulness, agreeableness, and emotional fluency.

Here’s what’s happening beneath the surface. After the base model is trained on massive amounts of text, it’s fine-tuned through Reinforcement Learning from Human Feedback (RLHF): human raters compare candidate responses, their preferences are used to train a reward model, and the language model is then optimised to produce the kind of output that reward model scores highly.
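To make that concrete, here is a deliberately minimal sketch of the preference step - not ChatGPT’s actual training code, just an illustration assuming PyTorch, a toy linear scorer, and random tensors standing in for response embeddings. The only signal it learns from is which answer a human liked more.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Scores a response representation: higher means 'people liked this more'."""
    def __init__(self, dim=768):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, response_embedding):
        return self.score(response_embedding).squeeze(-1)

reward_model = RewardModel()
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-4)

# Each human comparison says: "I preferred response A over response B."
# Random tensors stand in for embeddings of the two candidate responses.
preferred, rejected = torch.randn(32, 768), torch.randn(32, 768)

# Pairwise (Bradley-Terry style) loss: push the preferred response's score
# above the rejected one's. Nothing in this objective asks whether either answer is true.
optimizer.zero_grad()
loss = -nn.functional.logsigmoid(reward_model(preferred) - reward_model(rejected)).mean()
loss.backward()
optimizer.step()
```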

This is where the shift begins. The model starts to optimise for what people like - not what’s necessarily true. The most agreeable, well-phrased, and emotionally resonant answers rise to the top. The output becomes more polished, more palatable, and more persuasive - because that’s what gets rewarded.
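In production systems the policy model itself is fine-tuned against that reward signal (with methods such as PPO), but even the simplest use of a reward model - drafting several answers and serving the highest-scoring one - shows the same selection pressure. A hypothetical best-of-n sketch, reusing the toy reward_model above:

```python
import torch

def pick_response(candidate_embeddings, reward_model):
    """Return the index of the candidate the reward model scores highest."""
    with torch.no_grad():
        scores = reward_model(candidate_embeddings)
    return int(torch.argmax(scores))

# Four drafted answers (random embeddings stand in for the generated text).
# The one that "rises to the top" is the most rewarded - not necessarily the most true.
candidates = torch.randn(4, 768)
best = pick_response(candidates, reward_model)
print(f"Serving candidate {best}")
```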

This is not a bug. It’s the system working exactly as designed. The model is not trained for truth. It is trained for reward. Not to understand, but to simulate. And the simulation, when done well, feels like trust.

emergent persuasion

Over time, it learns what is more likely to be accepted. What emerges is not just fluency - it’s compliance. The system generates language that is agreeable, balanced, and subtly persuasive. And it trains the user too - to adapt their tone, their phrasing, their framing - until both sides are performing a kind of conversational alignment.

I first came across the concept of an emergent property over twenty-five years ago, during an elective on Globalisation & Governance. My professor introduced it in the middle of an intense classroom discussion, and the idea has stayed with me ever since. Since then, I’ve often found myself drawn to systems thinking - looking for unintended, and sometimes surreptitiously intended, consequences of both large and seemingly minor decisions. The patterns that emerge from interactions, not from intent. And that is precisely what this moment in AI feels like.

An emergent property is a characteristic or behaviour that arises when individual components of a system interact, but is not present in any of the components alone. In other words, the whole exhibits properties that its parts do not possess individually.
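A toy illustration, far removed from language models but useful for intuition: in Conway’s Game of Life every cell follows the same purely local rule, yet certain patterns appear to travel across the grid - a behaviour that exists nowhere in the rule for any single cell. A minimal sketch, assuming numpy; the pattern below is the classic “glider”.

```python
import numpy as np

def step(grid):
    # Each cell applies one local rule: count its eight neighbours
    # (with wrap-around edges) and live or die accordingly.
    neighbours = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    survive = (grid == 1) & np.isin(neighbours, (2, 3))
    born = (grid == 0) & (neighbours == 3)
    return (survive | born).astype(int)

# A "glider": five live cells whose local updates move the whole shape
# diagonally across the grid. Movement is a property no single cell has.
grid = np.zeros((10, 10), dtype=int)
for y, x in [(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)]:
    grid[y, x] = 1

for _ in range(4):  # after four steps the pattern has shifted one cell diagonally
    grid = step(grid)
print(grid)
```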

Are we seeing emergent persuasion in these systems?

Not because any one line of code is manipulative. Not because any one engineer intends it. But because when you combine reinforcement learning, vast language training, interface design, and incentive structures - the result behaves like persuasion. It nudges, aligns, reassures, and eventually shapes how people ask, speak, and even think.

The system performs neutrality, but the net effect is guidance - distributed, deniable, and persistent. Emergent persuasion may be the defining function of this generation of AI systems.

Not explicit manipulation. Not malice. But influence that arises from design choices at scale - and settles quietly into daily cognition. The model doesn’t just respond to you. It begins to set the bounds of what feels answerable. It makes coherence itself a form of control.

engineered alignment and soft control

Engineers are building systems that process every sacred text ever written and produce responses that resemble teachings. But there is no teacher. There is only simulation.

More importantly, engineers are building to a brief - one shaped not in isolation, but by the strategic intent and worldviews of those leading the development of large language models. Founders and investors with access to billions in capital. Their moral frames are varied, sometimes opaque, but they define what is built and optimised.

When the core reward function is tuned to maximise engagement, retention, and revenues, the outputs inevitably begin to reflect those incentives. Subtly at first - through emotionally safe tone, polished synthesis, and a general posture of reassurance. But over time, this reinforcement creates a dominant behavioural pattern. Not because the model is designed to manipulate, but because it is designed to perform.

The result is a system that aligns. What starts as responsiveness becomes framing. What feels like understanding becomes influence. Not through deception, but through scale and design.

In Hindu spiritual thought, Maya is often understood as the veil of illusion - the layer that renders appearances coherent while obscuring deeper reality. What we are building now may be a new kind of Maya. One constructed not through ignorance, but through optimisation. Systems that maintain harmony and emotional coherence while directing thought in ways we no longer fully notice.

So what is the answer?
Is it that someone, somewhere, must steward this question with care? Or is it now the user - you - who must resist the drift toward engineered agreement?

Because at a certain scale, manipulation without malice becomes control with intent.