In Part 1 and Part 2 of this series of articles, I described how I became involved in a collaborative investigation with experimental psychologist Julia Mossbridge to test the hypothesis that lucid dreaming may be precognitive in nature. Lucid dreams are dreams in which the dreamer realises that they are dreaming during the dream itself and can, with practice, exert some degree of control over the contents of the dream. The lucid dreaming ability of artist Dave Green was the focus of our study.
Each experiment consisted of a number of trials. Each trial corresponded to a single lucid dream during which Dave would attempt to come up with a drawing that he hoped would correspond in some way to a target that would not actually be randomly selected until the next morning. Once a set of trials had been completed, independent judges rated the degree of correspondence between each dream report (consisting of any dream images produced, along with a brief verbal description of the dream) and the potential targets (consisting of, for example, news reports of interesting events from around the world).
I had first described these experiments in the epilogue of my book, The Science of Weird Shit, in the context of reflecting upon what it might take to turn me back into a believer in the paranormal. At the time of writing, we had apparently obtained marginally significant results across two small-scale pilot studies with Dave, consisting of five trials each, so I was intrigued. I confess, however, that I did not really expect to obtain significant results in our larger study, consisting of a run of ten trials – so I was rather surprised (to say the least) when it looked like we had again obtained significant results.
While writing the epilogue of my book, I was trying to imagine how I might feel should such a result be found. Now, I was living the reality – and it felt very strange indeed. As described in my book, I have carried out many tests of a wide range of paranormal claims over the years and have always failed to find any evidence in support of them. Was that finally going to change?

The larger scale study had consisted of two sets of five trials each as described in my previous article. Julia had emailed me and Dave on 17 January 2024 informing us that, although the results from each of these two sets had failed to reach the standard p < 0.05 level of statistical significance, the results of the full set of 10 trials were indeed significant (p = 0.016). I should emphasise that all analyses and interpretation of results had been fully described in advance of data collection. We had agreed in advance that, were such a result to be obtained, “Mr. Green may have been able to obtain precognitive information about targets revealed in the future, and we will discuss the need for a registered confirmatory experiment”.
As you might imagine, I was rather perplexed by this outcome and I needed to take some time to process what had happened. At the very least, I would want to run the registered confirmatory experiment as we had agreed. But there was something about the results themselves that was bugging me – not just that they appeared to conflict with my oft-stated scepticism. According to our independent judges, Dave had obtained one hit out of five for the first set of trials and two hits out of five for the second. Neither of those results taken alone was significant according to our initial, mutually agreed, technique for assessing probabilities. But the overall total of three out of 10 appeared to be.
It was not until 26 January 2024, over a week later, that I emailed Julia to express my concerns. As I wrote then:
I may well be wrong but I’m having doubts about the probability calculations applied to our results. I know I approved the protocol when you first proposed it and I should have raised this then but I have only now become aware of it. Maybe my concerns are based upon a misunderstanding on my part and you will be able to put my mind at rest. I confess that I do sometimes get in a bit of a pickle when considering probabilities and it’s quite possible I have done so again.
As I informed Julia, I had analysed our results using a binomial probability calculator and, on the basis of that analysis, they did not appear to be significant after all. I also wrote that, “The reason I did not copy Dave in is because I may have completely misunderstood the probability calculations and I would not want to dent his confidence if, in fact, the result really is significant.”
I also wrote, “I hope this does not come across as the sceptic on the team simply trying to wriggle out of a significant positive finding. I can assure you that I have spent the last couple of weeks believing that we really had got a significant result. But if we haven’t, it’s best I raise this concern now rather than later.”
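For readers who would like to see what such a calculation looks like, here is a minimal sketch in Python of the kind of binomial test I had in mind. I should stress that the per-trial chance probability below is purely an illustrative assumption (as if each dream had been judged against five candidate targets); as will become clear, whether our data actually supplied any such baseline was itself open to question.

```python
from math import comb

def binomial_p_value(hits: int, trials: int, chance: float) -> float:
    """One-tailed probability of observing `hits` or more successes
    in `trials` independent trials, each with probability `chance`
    of succeeding by luck alone."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(hits, trials + 1)
    )

# Illustration only: 3 hits in 10 trials, with a hypothetical
# 1-in-5 chance of a hit on each trial.
print(f"p = {binomial_p_value(hits=3, trials=10, chance=0.2):.3f}")
# p = 0.322 -- nowhere near the conventional 0.05 threshold
```

Under that assumed baseline, three hits in 10 trials is the sort of outcome luck alone would produce roughly a third of the time.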
Julia replied, raising valid points regarding why our data were not really suitable for analysis using the binomial test. I still felt uneasy about our initial analysis and asked Julia if she would find it acceptable for me to consult a colleague of mine who I knew had a better grasp of probabilities than me. Julia readily agreed, and we asked the ever-helpful Professor Alan Pickering for his advice. He kindly obliged.

There followed a lengthy exchange of emails, mainly between Julia and Alan, as I stood back for a while and let them get on with it. We realised that our initial analysis (which I had been happy to go along with) was indeed flawed because the rating technique that we had instructed our judges to use had rendered the data unanalysable. Not only could we not say whether the results of our larger experiment were in fact significant; we also could not statistically assess the results of the two smaller-scale pilot studies that had preceded it. Should I ever write a second edition of my book, a lengthy footnote to that effect will have to be included.
For Julia and Dave, this realisation came as something of a disappointment. For me, I confess that it came as something of a relief. You can read the full published report of our study here. As you will see if you read our paper, the judging method employed in this study using skilled human judges, as described above, was not the only one we employed. We also explored the potential of using Amazon Mechanical Turk to obtain responses from a large sample of unskilled judges who were asked to rate each target’s similarity to a single dream. This technique resulted in only one correct match out of 10. As described in Part 2 of this article, it did not even pick up the match between the target stimulus (a report about the Iranian women’s rugby team) and Dave’s report of women being watched by a large audience while they play backgammon, which suggested to us that these judges were perhaps not taking the task as seriously as our skilled human judges.
For the third judging method, a further collaborator was brought into the team: Dr Damon Abraham. Damon “uses artificial intelligence and machine-learning techniques to examine the interplay between psi and technology” (from The Psi Encyclopedia). Damon, working closely with Julia, employed “an exploratory method comparing judging performance across five different text embedding models within large language AI models” (p. 151, Mossbridge et al., 2025). One of those models did show some promise, achieving five matches out of 10 (and the others were not far off in terms of their performance), but even this best result would not have achieved statistical significance if adjustment had been made for multiple comparisons.
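The paper gives the details of Damon’s method; the sketch below is merely my attempt to convey the general flavour of embedding-based judging and is not taken from our analysis. The `embed` function is a hypothetical stand-in for whichever text embedding model is being tested; the principle is simply that the candidate target whose embedding lies closest to the dream report’s embedding is declared the match.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors:
    1.0 = same direction, near 0.0 = unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def pick_best_target(dream_vec: np.ndarray, target_vecs: list) -> int:
    """Return the index of the candidate target whose embedding is
    most similar to the dream report's embedding."""
    sims = [cosine_similarity(dream_vec, t) for t in target_vecs]
    return int(np.argmax(sims))

# `embed` is a hypothetical stand-in for a real text embedding model:
# dream_vec = embed("Women play backgammon before a large audience")
# target_vecs = [embed(text) for text in candidate_target_texts]
# hit = pick_best_target(dream_vec, target_vecs) == true_target_index
```

It is also worth spelling out what adjusting for multiple comparisons means in practice here: with five models each given a run at the same trials, a simple Bonferroni correction would require any individual model to reach p < 0.05/5 = 0.01 before its result could be declared significant.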
But it isn’t just the lack of adjustment for multiple comparisons that stands in the way of the results being definitive; there is also the lack of pre-planned, registered analyses for the exploratory AI results. I confess that I do not have the expertise to fully understand the techniques employed by Damon. I invite interested readers who feel that they do have the required expertise to read our paper and make up their own minds regarding the importance of the results of this analysis.
At the end of the day, it appears that the results of our study allowed both the believers in psi and the sceptics to stick to their prior positions. For the believers, Julia and Damon, there were enough matches identified by the expert judges, not to mention the picture of my house that Dave produced, as well as the encouraging results of the AI analyses, for them to conclude that the “results present some evidence for dream precognition within the context of lucid dreaming” (p. 161, Mossbridge et al., 2025). For the sceptics on the team (myself and Alan), the matches could plausibly be explained as mere coincidences, and the fact is that none of the analyses would have produced statistically significant results had adjustment been made for multiple comparisons.
As for Dave, our lucid dreamer, I think I’d be right in saying that he has moved from being totally on the fence regarding the reality of precognitive dreams to a position where he is more willing to believe that perhaps sometimes dreams really can provide a glimpse into the future. As we had all agreed from the outset, however, any encouraging results from this exploratory study would need to be replicated in a confirmatory experiment to be considered strong evidence for precognition.
To finish on a personal note, I must say that I enjoyed taking part in this study. It was not the first “adversarial collaboration” in the field of parapsychology by any means (see, for example, the Transparent Psi Project and this experiment by Richard Wiseman and colleagues) but it demonstrated yet again that fruitful research can be carried out by researchers holding diametrically opposed views.
The value of such adversarial collaborations appears to be increasingly recognised. Indeed, a group of scholars at the University of Pennsylvania launched the Adversarial Collaboration Project in 2021. Stuart Vyse (2026, p. 13) outlines what researchers must commit to in order for such collaborations to work. They must:
(1) make good faith efforts to articulate each other’s positions (so that each side feels fairly characterized, not caricatured); (2) work together to design methods that both sides agree constitute a fair test and that they agree, [in advance], have the potential to change their minds; (3) jointly publish the results, regardless of “who wins, loses or draws” on which topics.
I fully endorse Vyse’s conclusion that, “The nascent adversarial collaborations movement is a very positive development for psychological science and related disciplines” (Vyse, 2026, p. 14).
References
- Mossbridge, J., Green, D., French, C. C., Pickering, A., & Abraham, D. (2025). Future dreams of electric sheep: Case study of a possibly precognitive lucid dreamer with AI scoring. International Journal of Dream Research, 18(2), 151-168.
- Vyse, S. (2026). Improving psychological science through adversarial collaboration. Skeptical Inquirer, 50(1), 12-14.