Machine learning without critical thinking only encourages tech pseudoscience

Author

Richard Glover, https://www.til-technology.com/
Richard Glover started university in physics and ended up with a degree in Classics and Classical Languages, only to find a career in IT Project Management and Information Security. Add years of martial arts training and a fascination with weird beliefs, and it’s no wonder he is still trying to figure out how the world works.


Computer science is science, right? After all, it’s in the name?

This is actually a matter of some debate. Is computer science a scientific discipline, an engineering discipline, or a branch of mathematics? The answer is: yes. It is all of those things, and more. Computer science is probably best described as a multi-disciplinary field, intersecting with mathematics, cognitive science, linguistics, physics, and others.

As to why it matters, your mindset and training can have a significant impact on the way you approach questions, and how you answer them. In my experience of nearly thirty years in IT, most ‘computer science’ people have an engineering mindset. Their approach is to understand and define a problem, then develop a plan and approach for solving it with available tools, focusing on the practical realities of the problem at hand. Whenever possible, the ‘good’ ones will try to make improvements to existing tools, or create new tools, but will rarely go back to first principles.

Machine Learning (ML) is a branch of Artificial Intelligence (AI), where a neural network is ‘trained’ to recognise patterns, in order to identify those patterns in future. 3Blue1Brown has an excellent overview of what neural networks are and how they work, but the basic idea is around repeatedly adjusting probability predictions according to a dataset. Approaches to ML can, roughly, be broken into ‘supervised’, ‘unsupervised’, or ‘reinforcement’ learning models.
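The core idea of 'training' can be sketched in a few lines. This is a minimal, hypothetical illustration of supervised learning, not any particular system: a single weight is repeatedly nudged against the gradient of the error on a toy dataset, which is the same basic mechanism that neural network training scales up to millions of parameters.

```python
# A minimal sketch of supervised learning on a toy dataset:
# fit a single weight w so that prediction = w * x approximates y,
# by repeatedly adjusting w against the gradient of the squared error.

def train(data, lr=0.01, epochs=100):
    """Fit y ~ w * x by gradient descent on mean squared error."""
    w = 0.0
    for _ in range(epochs):
        # Gradient of (w*x - y)^2 with respect to w is 2*x*(w*x - y)
        grad = sum(2 * x * (w * x - y) for x, y in data) / len(data)
        w -= lr * grad
    return w

# Toy dataset generated by y = 3x: the 'pattern' the model should learn
data = [(x, 3 * x) for x in range(1, 6)]
w = train(data)  # converges towards 3.0
```

The model 'learns' only in the narrow sense that its parameter comes to fit the training data; everything else, including whether the data mean anything, is up to the humans.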

Someone with a scientific mindset might break ML into theory and application. On the theory side, we can ask questions like: what does it mean when we say that a model learns, and how does that work? How can we evaluate the process and test what is learned? What factors affect the quality of the learning process, and how can these be investigated? And, once we understand how it works, how can we apply it effectively? For what sort of problems is it well-suited? Are there cases where it is not an effective tool?

It appears, however, that many researchers approach ML from an engineering perspective, so they ask different questions, such as: how can I use this? How can I make it better, faster and cheaper? What problems can I solve with this new tool?

This is where pseudoscience can rear its ugly head.

Several examples are described in a 2024 paper, “The reanimation of pseudoscience in machine learning and its ethical repercussions”, in which the authors describe the process by which pseudoscience and junk science are being “laundered” by ML. For example, ML has been demonstrated to be a very useful tool for facial recognition, and the accuracy of such identification has improved steadily over the years. In fact, the primary concerns about the technology are not about accuracy, but rather ethics and security.

A quick-to-unlock device is convenient, but at what price? Via Foundry Co on Pixabay

But facial recognition is a relatively ‘simple’ problem, in that the goal is to maximise the accuracy of the process. For example, if face recognition is used for access control, such as on your phone, the goal is to minimise both the false-negative rate, where the tool rejects a valid face, and the false-positive rate, where the tool accepts an invalid face. But what if you apply ML to a badly defined or invalid question?
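The two error rates above are straightforward to compute. This sketch uses entirely hypothetical evaluation counts for an access-control system, just to make the definitions concrete:

```python
# A sketch of the trade-off described above, using hypothetical counts
# from evaluating a face-recognition access-control system.

def error_rates(tp, fp, tn, fn):
    """Return (false_negative_rate, false_positive_rate).

    false-negative: a valid face is rejected   -> fn / (fn + tp)
    false-positive: an invalid face is accepted -> fp / (fp + tn)
    """
    fnr = fn / (fn + tp)
    fpr = fp / (fp + tn)
    return fnr, fpr

# Hypothetical evaluation: 1000 attempts by the owner, 1000 by strangers
fnr, fpr = error_rates(tp=990, fp=5, tn=995, fn=10)
# fnr = 0.01  (1% of valid attempts rejected)
# fpr = 0.005 (0.5% of invalid attempts accepted)
```

The point is that both quantities are well-defined because the question itself is well-defined: either the face belongs to the authorised user or it does not. That is precisely what is missing in the studies below.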

One study employing ML this way gives the game away in the abstract. After describing autism in a stigmatising way as “a neurological illness characterised by deficits in cognition, physical activities, and social skills”, the authors admit that “there is no medical test to identify ASD”, but then state:

the human face can be used as a biomarker as it is one of the potential reflections of the brain and can thus be used as a simple and handy tool for early diagnosis.

This is fraught with issues. Autism Spectrum Disorder (i.e. ASD or Autism) is a blanket term for a range of conditions, and covers a group ranging from non-verbal people requiring constant care to highly articulate and successful people who have a different way of interacting with the world and processing sensory input. To describe it as an “illness” is, at best, obsolete and inappropriate. Further, to say that there is no “medical test” to identify autism is technically correct, if you assume they refer to a blood test, genetic test, or similar, but there are well-defined diagnostic criteria.

Most pertinently, the study leaps into the long-debunked pseudoscience of physiognomy (which boils down to, ‘well, you look autistic’), and tries to find the ‘best fit’ of hyperparameters (i.e. parameters associated with the machine-learning process, rather than what is being learned) in order to maximise the ‘accuracy’ of their ML model.

Have they demonstrated a link between facial features and autism? No.

Have they proposed a mechanism by which an autistic person might present certain facial features? No.

Have they provided details and an assessment of the dataset used for training, and how decisions were made regarding which were those of autistic people? No.

Have they provided enough detail to replicate their ‘study’? No.

Other studies described in the paper claim to train ML models to use photos, voice recordings, or other biometric data to identify characteristics such as race, sexuality, mental illness, criminal propensity, and neuroticism. But, without first demonstrating a link between some biometric trait and some individual characteristic, what do you get? Nothing.

In fact, it’s worse than nothing; it is a ‘study’ which assumes the validity of the link, then searches for data points which can be claimed as evidence for future ‘studies’. There’s a term for that: junk science, and here it is essentially being used to try and establish the pseudoscience of physiognomy as valid.
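It is easy to demonstrate why a high ‘accuracy’ figure, on its own, establishes nothing. In this deliberately contrived sketch, the features are random numbers and the labels are assigned at random, so there is no link to find; yet a model that simply memorises its training data still reports perfect ‘accuracy’ on that same data:

```python
# A made-up dataset: random features, randomly assigned labels, so
# there is genuinely no relationship to learn. A model that memorises
# the training set still scores 100% 'accuracy' when evaluated on it.
import random

random.seed(42)
features = [random.random() for _ in range(100)]
labels = [random.choice([0, 1]) for _ in range(100)]  # no real link

def predict(x, train_x, train_y):
    """1-nearest-neighbour: return the label of the closest training point."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

# Evaluate on the same data used for 'training'
correct = sum(predict(x, features, labels) == y
              for x, y in zip(features, labels))
accuracy = correct / len(features)  # 1.0, despite the labels being random
```

Real studies use more sophisticated models and held-out test sets, but the underlying trap is the same: if the dataset itself encodes an invalid assumption, tuning for accuracy just measures how well the model has absorbed that assumption.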

Among many other problems, laundering pseudoscience and junk science in this way can lead to companies marketing their technology in new and worrisome ways. Rather than selling solid technology for facial recognition, one company brags about using “advanced machine learning techniques” to provide “an array of classifiers”, which “represent a certain persona, with a unique personality type, a collection of personality traits or behaviours”, including “High IQ”, “Academic Researcher”, “Professional Poker Player”, “Bingo Player”, “Brand Promoter”, “White-Collar Offender”, “Terrorist”, and “Paedophile”.

If that weren’t dark enough, consider other relevant questions: how much time do we spend on camera? Online at work? At airports? Malls? Sports events? Government offices? How about body-cameras used by law enforcement?

The ethical and privacy implications of our current level of surveillance are already of great concern to many people, but what if a person is identified as a terrorist simply because of an ML model ‘trained’ using pseudoscience? What if a person has a job offer made or withheld due to a facial scan identifying them as “High IQ” or a “White-Collar Offender”?

How much could possibly go wrong?

The Skeptic is made possible thanks to support from our readers. If you enjoyed this article, please consider taking out a voluntary monthly subscription on Patreon.
