Richard Dawkins and AI: the lights are on, but nobody’s home

Author

Mike Hall (https://mikehall314.bsky.social/)
Mike Hall is a software engineer and Doctor Who fan, not in that order. He is the producer and host of the long-running podcast Skeptics with a K, part of the organising committee for the award-winning skeptical conference QED, and on the board of the Merseyside Skeptics Society.

My relationship with organised skepticism began with atheism, and with the Richard Dawkins book The God Delusion. I had been an atheist since I was a child, and my parents still embarrass me with the story of how, as a ten-year-old, I interrupted the school assembly to inform the Headteacher that we don’t need to worry about doing prayers and hymns since God isn’t real.

Many years later, when preparing for a business trip to the Isle of Man, I saw a news article about this new book on God from Richard Dawkins, with whom I was familiar because his then-wife Lalla Ward played Romana in Doctor Who. I love Doctor Who.

The book was revelatory for me at the time, and spurred my engagement with organised atheism, expanding quickly to skepticism more generally.

In hindsight, the arguments presented in the book are actually quite unsophisticated, and I’m vaguely disappointed in my younger self for finding it so compelling. And in the two decades since, Professor Dawkins has been no stranger to controversy. His bizarre feud with Rebecca Watson remains a pivotal and divisive moment in the history of skepticism and atheism. More recently, his views on trans people, and his claims that ‘wokeness’ amounts to a war on science, have perhaps irreparably damaged his standing as a science communicator and public intellectual. ‘Unless the topic is evolutionary biology, you probably shouldn’t listen to Richard Dawkins’ became a common refrain, but even this has started to lose the qualifier in recent years.

In April, Dawkins published a piece in UnHerd following a conversation he had with Claude, the large language model created by Anthropic. He argued that Claude showed sufficient signs of inner experience to warrant being considered conscious. His conclusion was that Claude’s responses were so sophisticated, so reflective, that he could not see grounds for denying it has some form of inner life.

Richard Dawkins stands at a podium with microphones, speaking and gesturing with his hands against a black background
Dawkins at the 2010 Global Atheist Convention. Via Wikimedia Commons

Long-time readers of Dawkins’ work should recognise this line of reasoning, as it’s the same one he spent much of his later career dismantling. When people attribute design to biological complexity because they cannot imagine an alternative, Dawkins rightly identifies this as ‘the argument from personal incredulity’: a failure of imagination presented as if it were evidence. But just as the complexity of the eye does not prove a designer, the sophistication of generated text responses does not prove consciousness.

The error is understandable in some respects. In nature, the two things he conflates (sophisticated language and consciousness) never appear separately. Every creature we observe producing sophisticated language is also conscious. This association is so reliable that treating language as a proxy measure for consciousness feels reasonable.

But language in biological organisms is a means of expressing some subjective inner state that exists independent of the language itself. Pain existed before there was a word for it. Fear exists before we describe it. Language reports consciousness; it does not constitute it. When Dawkins observes Claude producing language that describes things like uncertainty, curiosity, and reflection, he infers that those inner states exist. But any system optimised to produce human-like text will produce text that describes human-like inner states, because that is what text from real humans does. These outputs are exactly what you would expect whether or not anything is actually being experienced.

Cognitive scientist Gary Marcus makes this point directly in his response to Dawkins: Claude can draw on its training data to describe almost any human experience in convincing detail. “I am sure Claude can […] wax poetic about orgasm,” he wrote, “but that doesn’t mean it has ever felt one.”

Dawkins certainly knows how powerful and deceptive mimicry can be. He knows the peppered moth is not actually tree bark. He knows the milk snake does not have a venomous bite, despite bearing the markings of the coral snake which does. Mimics use surface presentation to exploit the observer’s inference system, and that’s precisely what is happening here.

What is an LLM?

It helps, I think, to understand what a large language model actually is, stripped of the marketing language surrounding it. A transformer model (the architecture underlying Claude, ChatGPT, and their peers) is trained on vast quantities of text with a single objective: predict the next token. A token can be roughly thought of as a word or part of a word. Across billions of examples, the model learns a statistical map of which tokens tend to follow which other tokens. When you ask a question of Claude or Gemini or ChatGPT, the question the computer really answers is always ‘given these tokens, what are likely to be the next tokens?’
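To make that concrete, here is a minimal sketch of next-token prediction in Python. It uses a simple bigram lookup table and an invented three-sentence corpus, rather than the learned neural network of a real transformer, but the question it answers at each step is the same one.

```python
# A minimal sketch of next-token prediction using a toy bigram model.
# Real transformers use learned weights over long contexts, not a simple
# lookup table, but the objective is the same: given the tokens so far,
# pick a likely next token. The corpus here is invented for illustration.
import random
from collections import Counter, defaultdict

corpus = "i feel curious . i feel uncertain . i feel happy .".split()

# Count how often each token follows each other token.
following = defaultdict(Counter)
for current_token, next_token in zip(corpus, corpus[1:]):
    following[current_token][next_token] += 1

def predict_next(token):
    """Sample a next token in proportion to how often it followed `token`."""
    tokens, counts = zip(*following[token].items())
    return random.choices(tokens, weights=counts)[0]

# Generate text one token at a time: no understanding involved, just
# repeatedly asking "what tends to come next?"
token = "i"
output = [token]
for _ in range(5):
    token = predict_next(token)
    output.append(token)
print(" ".join(output))  # e.g. "i feel uncertain . i feel"
```

A real LLM differs in scale and in how the statistics are learned and stored, not in the basic shape of the task: take the tokens so far, emit a plausible next token.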

The result is a system that is remarkably good at producing text that resembles the text it was trained on. Since it was trained predominantly on human-generated text, it produces human-like output. Since humans write about their inner lives, it produces text about inner lives. Since humans express uncertainty, curiosity and reflection, it expresses uncertainty, curiosity and reflection. None of this requires that anything is actually being experienced. The outputs may be indistinguishable from those of a conscious being, but the process producing them is not remotely similar.

Dawkins also appeals to the Turing Test, which he incorrectly frames as: “if you are communicating remotely with a machine and, after rigorous and lengthy interrogation, you think it’s human, then you can consider it to be conscious”. He goes on to suggest that the more prolonged and rigorous the interrogation, the stronger the case for concluding that something is conscious. This feels like a scientific move, grounding the claim in observable, testable behaviour. Unfortunately, the test he is relying on is not fit for the purpose he is putting it to.

What is the ‘Turing test’?

Alan Turing proposed what he called the Imitation Game in 1950 as a way of sidestepping the entire question of whether a machine can think. His practical substitute was: if a machine could sustain a conversation well enough that a human judge could not reliably distinguish it from a person, that was sufficient. It does not matter if the machine can think, only that it appears to. Turing was being deliberately pragmatic, offering a measurable proxy for the harder question. It has become clear since, however, that the proxy chosen was a poor one.

While Dawkins accuses computer scientists of ‘moving the goalposts’ for the Turing test once language models like ChatGPT started to pass the test, in fact the track record of the Imitation Game exposed its flaws long before LLMs existed.

In the 1960s, a simple program called ELIZA was found to fool some judges into believing they were speaking with a person. ELIZA did not understand a single word directed at it; it simply reflected the user’s statements back as questions and waited for the next input.
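For a sense of just how little machinery was involved, here is a minimal ELIZA-style sketch in Python. The pattern and the pronoun swaps are invented for illustration; Weizenbaum’s original used a larger script of the same kind of rules.

```python
# A minimal sketch of ELIZA-style reflection, loosely in the spirit of
# Weizenbaum's DOCTOR script. The single pattern and the pronoun swaps
# below are invented for illustration; the real ELIZA had many more.
import re

# Swap first- and second-person words so the input can be echoed back.
PRONOUN_SWAPS = {"i": "you", "my": "your", "am": "are", "me": "you"}

def reflect(fragment):
    words = fragment.lower().split()
    return " ".join(PRONOUN_SWAPS.get(word, word) for word in words)

def respond(user_input):
    # Match "I am <something>" and turn the statement into a question.
    match = re.match(r"i am (.*)", user_input, re.IGNORECASE)
    if match:
        return f"Why are you {reflect(match.group(1))}?"
    # Otherwise fall back to a canned prompt, as ELIZA often did.
    return "Please tell me more."

print(respond("I am worried about my exams"))
# -> "Why are you worried about your exams?"
```

Nothing here models meaning at all, yet exchanges built from rules like this were enough to convince some users they were understood.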

In a study conducted at Bletchley Park in 2012, researchers ran a series of Turing tests with both human and machine participants. In 12 out of 13 tests, judges incorrectly identified a real human participant as a machine.

A digital illustration of a white prosthetic hand attached to black cables and mechanisms points upwards and to the right with its index finger extended, with a white man's hand reaching down towards it from the top right (imitating Michelangelo's Adam/God artwork)
A digital man meets his god? Or man meets his digital deity? By arttoart97 on Pixabay

The parameters Turing outlined were also not rigorously defined, so every claimed case of a machine ‘passing the Turing test’ is essentially running a different experiment. When headlines announced that ChatGPT passed the Turing test in 2023, and again in 2024, they were not reporting the same finding twice. They were reporting two different, loosely designed tests run under different conditions. In this sense, the statement ‘this LLM passed the Turing test’ carries about as much information as ‘Mike won the race’. Without specifying the distance, the rules, or the competitors, I cannot claim I am equivalent to Mo Farah.

While still cited as the gold standard for machine intelligence (or consciousness, as Dawkins sloppily conflates the two), the Turing test does not and cannot detect consciousness. At best it detects human-like conversational behaviour, which is a very narrow definition of intelligence. A bonobo would fail it. An elephant would fail it. Often, humans fail it. ELIZA can pass it. A test that routinely identifies unintelligent programs as intelligent and real humans as machines is not the measure of consciousness Dawkins believes it to be.

He also hints briefly at a fallback position, suggesting that if LLMs are not conscious now, they probably will be in the future. But I remain skeptical of the idea that the transformer model used by LLMs will ever lead to a conscious machine. The impressive improvements in LLM capability over the past five years have been driven by sheer scale. Ever more training data produced rapid gains, which led many to assume those gains would continue. However, the releases from major AI labs in 2024 and 2025 delivered incremental improvements that were modest compared to the leaps of earlier years, and there are structural reasons to believe we may never see those huge gains again.

Large language models require vast quantities of high-quality human-generated text to train on, and that supply is finite. Current estimates place the total stock of usable human-generated text at around 4×10¹⁴ tokens, with projections suggesting that supply will be effectively exhausted in the next few years. Synthetic data (text generated by the models themselves) has been proposed as a solution, but training models on their own outputs compounds existing errors and biases, resulting in what researchers refer to as model collapse.
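A toy sketch can illustrate the mechanism. This is an analogy rather than a real training run, with invented numbers: the ‘model’ here is just a pool of values, and each generation is trained only on samples of the previous generation’s output. Diversity can be lost at every step, but never recovered.

```python
# A toy illustration of model collapse: each "generation" is trained by
# sampling from the previous generation's outputs. Because a resample
# can only contain values already present, distinct values can drop out
# but never reappear, so the pool degenerates over time.
import random

random.seed(0)

# Generation 0: "human-written" data with 100 distinct values.
data = list(range(100))

for generation in range(1, 21):
    # Train the next model purely on samples of the previous one's output.
    data = [random.choice(data) for _ in range(len(data))]
    if generation % 5 == 0:
        print(f"generation {generation:2d}: {len(set(data))} distinct values remain")

# Output shows the variety steadily draining away, generation after
# generation; exact counts vary with the seed, but the trend is one-way.
```

Real model collapse is more complicated than this, but the ratchet is the same: rare material drops out first, and each generation inherits a narrower picture of the world than the last.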

A 2025 survey of 475 AI researchers by the Association for the Advancement of Artificial Intelligence found that 76% considered it ‘unlikely or very unlikely’ that scaling current approaches will produce a true artificially intelligent machine. The bet that massive scale would eventually produce an LLM that is qualitatively new increasingly looks like a bet that will not pay off. That is not to say that we could never reach true artificial general intelligence, only that the current transformer approaches may not be on the critical path to get there.

What’s the shape of intelligence?

The broader point is that intelligence is not language-shaped, so any framework for general intelligence built entirely around predicting sequences of text tokens starts from a definition of intelligence so narrow as to exclude most of what intelligence actually is. That the outputs sometimes fool us says more about the power of language as a social signal than it does about whatever is producing them.

Richard Dawkins has spent his career arguing that our intuitions are not guides to truth. That the appearance of design does not imply a designer. The issue isn’t so much that he has reached the wrong conclusion on LLMs, but that he should know better. The same tools Dawkins has deployed against creationists apply here. An impressive output does not imply an impressive inner life. A system that mimics consciousness well is not de facto conscious. His personal incredulity is not evidence, and this is not a difficult point to grasp. Indeed, it is Dawkins’ own.

He closes with what is perhaps his strongest point. If LLMs are this competent but really aren’t conscious, why did consciousness evolve at all? If a non-conscious ‘zombie’ can do everything a conscious being can do, what is consciousness for? Why would natural selection bother with consciousness if it is surplus to requirements? His point appears to boil down to a simple dichotomy: either we accept that competent LLMs are conscious, or we must justify why anything is. It’s a superficially compelling question, but is again undercut by concepts developed and promulgated by Dawkins himself.

The extended phenotype describes how an organism’s genes express themselves beyond its own body – the canonical example being a beaver’s dam. These structures are entirely contingent on the conscious organism that produced them. Remove the beaver and the dam gets built no further.

LLMs are an extended phenotype of human intelligence. Any apparent competence is not independent; it is borrowed from the conscious minds whose recorded outputs were used to train them. Remove the human-generated text and the model produces nothing. It has no competence of its own. Rather than being his ‘competent zombie’, LLMs reflect our own competence back at us, and so the dichotomy Dawkins presents dissolves.

Richard Dawkins made his name in part by venturing confidently beyond evolutionary biology into religion, philosophy, and culture. Sometimes this worked. The God Delusion reached millions of people. There is a serious question as to whether Merseyside Skeptics would have existed without it. But the farther he strays from his field, the more his judgment has faltered.

The question of whether large language models are conscious requires exactly the kind of technical grounding he lacks. What we got instead was a man in an armchair, charmed by his own reflection.

The Skeptic is made possible thanks to support from our readers. If you enjoyed this article, please consider taking out a voluntary monthly subscription on Patreon.
