
Bias In Artificial Intelligence (AI) – The Need For Lighting Up The ‘Black Box’

When it comes to artificial intelligence (AI), trust is a must. This goes way beyond the slogan – consumers usually don’t understand how GPS, Alexa, Siri or even Shazam work. These products usually work, and that’s what matters to consumers. When Artificial Intelligence fails, like GPS often does, people end up with their cars in the middle of a lake. When an intelligent assistant fails, people are left with racism and sexism. It’s called statistical bias and here’s everything you need to know about it, including a glimpse at the solution.

Artificial Intelligence is infamous for its ‘black box’ image. The idea is that programmers and engineers understand what goes in (instructions, rules) and what comes out (data, comparisons, reports), but can’t figure out what on Earth goes on in between. Simply put – we can’t entirely figure out how AI works.

The famous Microsoft experiment with a chatbot named ‘Tay’, which went from ‘super cool’ to ‘full-blown nazi’ in less than 24 hours back in 2016, is only the tip of the iceberg. Tay was shut down by Microsoft’s Technology and Research team, but the problems with machine learning and the words that come out of AI’s ‘mouth’ are no longer weird and funny; they are often disturbing. After Pokémon Go was released, users noted that there were far fewer PokéStops and Gyms in predominantly African-American neighbourhoods. According to researchers at the Urban Institute, 55 PokéStops appeared in majority white neighbourhoods and 19 in majority black neighbourhoods. The Belleville News-Democrat found that this situation is not unique to Los Angeles, where this simple ‘study’ had taken place. The same goes for African-American parts of Detroit, Miami and Chicago. Players and researchers seem to share the opinion that the bias came from the creators. It’s not entirely false – Niantic, the company behind this AR game, relied on a map taken from its earlier game, Ingress. It was a crowd-sourced game, where the majority of players were male, tech-savvy and (probably) white.

This is a good example of how biased data affects algorithms and can mess with even the best concepts. Anu Tewary, an independent data science advisor, explains it.

According to her:

It was a bias that came in because of the fact that people who wrote these algorithms were not a diverse group. It was biases that came in from the way the algorithms were written. The initial users of the product features were predominantly male for these high-paying jobs, and so it just ended up reinforcing some of the biases.

Let’s take a look at Google, a technological behemoth with deep pockets, vast resources and a long reach across numerous innovations. Are its projects perfect? No. Its face recognition software also had problems with racial bias. When it first launched, it tagged black faces as… gorillas. Not all of them, but still.


That's an example of what happens if you have no African American faces in your training set. If you have no African Americans working on the product. If you have no African Americans testing the product. When your technology encounters African American faces, it's not going to know how to behave.

Microsoft reacted by saying that ‘the more you chat with Tay the smarter she gets, so the experience can be more personalised for you’. That might be true, so let’s think about it for a second. Doesn’t Tay look like a child to you? Or a puppy? It learned by observing, internalizing and mimicking everything that internet users brought to the table. If a child experiences the world through a parent who is funny, intelligent and good to people, there’s a high chance of him or her being funny and loving too. If swearing or commenting in a biased way is the bread and butter of Internet trolls, then Tay – or any other similar Artificial Intelligence for that matter – will become anything but easy-going.


A funny thing happened to a piece of voice command software back in 2011. It didn’t recognize women’s voices; it had a hard time deciphering what they said. We could argue that this is nothing new under the sun and that AI’s failures simply mirror real-world experience, but that would only madden the ladies. And what about bias on a different level? Have you noticed that AI assistants that help with and execute basic tasks (Apple’s Siri, Amazon’s Alexa) default to female voices, while bots performing more advanced tasks (IBM’s Watson, Salesforce’s Einstein) have male voices? These problems originate from attributing character traits to individuals or groups based not on their real characteristics, but on personal biases that turn into a cultural nightmare. A biased person with a paper knife can be annoying, but mostly harmless, whereas the same person with a real knife can do real damage.

Speaking of language and voice recognition… Ladies are not singled out here. Did you know that in Artificial Intelligence hell there is a special place for anyone who wants their Scottish accent interpreted correctly? The video ‘Scottish Elevator’ is a comedy sketch, but it represents a real problem.

Another example. A 2015 study showed that Google’s image search underrepresented women who are CEOs. When you typed ‘CEO’ into the engine, only 11% of the results showed women, while the real figure should be around 27%, because that is the reality in the US. Only a few months later, a different study showed that Google’s online advertising system showed high-income jobs to men more often than to women.
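
The gap in that image search study can be boiled down to a single number. The sketch below is only an illustration: `representation_ratio` is a hypothetical helper, not something taken from the study, and the two shares are simply the figures quoted above.

```python
def representation_ratio(observed_share: float, reference_share: float) -> float:
    """Return the observed share divided by the reference (real-world) share.

    1.0 means a group is shown as often as it occurs in reality;
    values below 1.0 mean the group is underrepresented.
    """
    if not 0 < reference_share <= 1:
        raise ValueError("reference_share must be in (0, 1]")
    if not 0 <= observed_share <= 1:
        raise ValueError("observed_share must be in [0, 1]")
    return observed_share / reference_share


# Figures quoted in the article: 11% of image results vs. 27% of US CEOs.
ratio = representation_ratio(0.11, 0.27)
print(f"Women appear at {ratio:.0%} of their real-world rate")
```

A ratio well below 1.0, as here, is a quick signal that the results do not mirror the population they claim to describe.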

Is that all? Not even close. The next problem is neither funny nor merely offensive to minorities. It literally decides what their lives are going to look like decades ahead. Criminal sentencing AI is biased against African-Americans. This matter is widely recognized (even in pop culture, e.g. in the TV series ‘For the People’) but no one seems to have an idea what to do about it. The system is called COMPAS – or the Correctional Offender Management Profiling for Alternative Sanctions – and its purpose is to assess the risk that a defendant will commit another crime. Its risk-assessment algorithms are widely used across the US to predict ‘hot spots of violent crime’, determine the type of supervision that inmates might need and provide information that might be useful in sentencing. The problem is, it doesn’t work. One person previously convicted of driving under the influence might be considered low risk because he or she has a job, while another, convicted of intoxication, might be considered high risk because he or she is homeless or living ‘on the wrong side of the tracks’. COMPAS relies on a set of scores derived from 137 questions that can either be answered by the defendant or pulled from criminal records. Race is not one of them, yet the problem remains, since answers about things like employment or housing can correlate with it. And this is a good moment to explain where exactly Artificial Intelligence draws its data from.


Some systems are based on neural nets and they reflect the training data they are given.

There are three ways in which the software gets training data:

  1. Labelled training data. The best known, perhaps even the most popular, set is ImageNet, which is responsible for a huge part of algorithm improvements. This is basically a group of samples that have been tagged with one or more labels.
  2. Algorithmic generation. Related to approaches such as adversarial learning, it only needs a set of rules to operate by in order to generate a substantial amount of training data. This is similar to how self-driving car algorithms are often trained against driving environment simulators.
  3. Direct training or unsupervised learning. This is basically a machine learning task of inferring a function that describes the structure of ‘unlabelled’ data. Since the data is ‘unlabelled’, there is no straightforward way to confirm the accuracy of the structure created by the algorithm.
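
The three sources listed above can be told apart by the shape of the data they produce. The following is a minimal sketch with made-up toy data: the animal features, the braking rule in `simulate_braking` and the tiny `two_means` clustering are all assumptions chosen for illustration, not how ImageNet or any real driving simulator works.

```python
import random

# 1. Labelled training data: every sample carries a human-assigned tag.
labelled = [
    ({"whiskers": 1, "barks": 0}, "cat"),
    ({"whiskers": 0, "barks": 1}, "dog"),
]


# 2. Algorithmic generation: a rule-driven simulator emits as many labelled
#    samples as we ask for. The braking rule is a made-up stand-in for a
#    driving environment simulator.
def simulate_braking(rng, n, deceleration=6.0):
    samples = []
    for _ in range(n):
        speed = rng.uniform(0.0, 30.0)  # metres per second
        gap = rng.uniform(1.0, 100.0)   # metres to the obstacle
        stopping_distance = speed ** 2 / (2 * deceleration)
        samples.append(((speed, gap), stopping_distance > gap))
    return samples


generated = simulate_braking(random.Random(0), 1000)


# 3. Unlabelled data: features only; the algorithm has to find structure on
#    its own. Here a one-dimensional 2-means clustering splits the values.
def two_means(values, iterations=10):
    low, high = min(values), max(values)
    for _ in range(iterations):
        near_low = [v for v in values if abs(v - low) <= abs(v - high)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        low = sum(near_low) / len(near_low)
        if near_high:
            high = sum(near_high) / len(near_high)
    return low, high


unlabelled = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
centres = two_means(unlabelled)
print(len(labelled), len(generated), centres)
```

In all three cases the model can only reflect what it was handed: the tags someone chose, the rules someone wrote, or the samples someone collected.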


That’s where our infamous bias creeps into software product development. Everything revolves around our second important term: ‘statistical bias’. Strictly speaking, bias is not a sociological or a political term – it’s a statistical one. It measures the expected error made while estimating a parameter’s value. In other words: the difference between the estimator’s expected value and the real, actual value of the parameter. If that difference is zero, the estimator is called unbiased.
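
The statistical meaning of bias can be checked numerically. The sketch below, using only Python’s standard library, draws many small samples from a distribution whose variance is known to be 1 and compares two variance estimators against that true value. The helper `estimate_bias`, the sample size and the number of trials are arbitrary choices made for this illustration.

```python
import random
import statistics


def estimate_bias(estimator, true_value, sample_size, trials, rng):
    """Approximate E[estimator] - true_value by averaging over many samples."""
    estimates = []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        estimates.append(estimator(sample))
    return statistics.fmean(estimates) - true_value


rng = random.Random(42)  # fixed seed so the run is repeatable
TRUE_VARIANCE = 1.0      # variance of the standard normal we sample from

# pvariance divides by n (biased when applied to a sample);
# variance divides by n - 1 (unbiased).
bias_n = estimate_bias(statistics.pvariance, TRUE_VARIANCE, 5, 20_000, rng)
bias_n_minus_1 = estimate_bias(statistics.variance, TRUE_VARIANCE, 5, 20_000, rng)

print(f"divide by n:     bias = {bias_n:+.3f}")
print(f"divide by n - 1: bias = {bias_n_minus_1:+.3f}")
```

With samples of five values, dividing by n should underestimate the variance by roughly 0.2 on average, while dividing by n - 1 should land close to zero. The error is systematic, not random: collecting more samples of the same size does not make it go away.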

Translating the previous paragraph into English – when we say that something is biased, we use the term pretty loosely and do not refer to the actual state of things. For Artificial Intelligence to be really smart, we need to develop technologies that understand our intentions and context. It’s hard, and that’s why Facebook and Google build AI that builds AI.

How can all these problems in software product development be overcome? As Tomas Kliegr, assistant professor at the information and knowledge engineering department at the University of Economics, Prague, says:

Scientists could lead to better selection of training data and tuning of the models to handle imbalanced problems, where the bias is the result of one class being under or overrepresented in the pool of training data.
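
One common way of tuning a model for an imbalanced problem is to weight each class inversely to its frequency, so that the rare class counts as much as the common one during training. The sketch below is an assumption about what such tuning might look like, not the method Kliegr describes: `balanced_class_weights` is a hypothetical helper and the 90/10 split is invented. The formula itself is the widely used ‘balanced’ heuristic, n_samples / (n_classes * class_count).

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical, deliberately skewed training labels: 90 vs. 10 samples.
labels = ["group_a"] * 90 + ["group_b"] * 10
weights = balanced_class_weights(labels)

for cls, weight in weights.items():
    print(f"{cls}: weight {weight:.3f}")
```

After weighting, both classes contribute the same total weight to the training loss. It cannot invent information that the underrepresented class never supplied, so better selection of training data remains the first remedy.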

Back in May 2018 a similar statement was made by Isabel Kloumann, a data scientist who works on computational social science at Facebook. The company is developing software called Fairness Flow that is responsible for measuring, monitoring and evaluating potential biases in Facebook’s products. At the company’s developer conference Kloumann said:

We're working to scale the Fairness Flow to evaluate the personal and societal implications of every product that we build.

As Dominik Nowak, CEO of CSHARK’s partner Husarion, a robotics and AI start-up, says:

AI is not an easy thing to do. Researchers are trying to find algorithms that allow to save computing processing power. In most scenarios real time feedback from AI is needed, therefore optimal results are very hard to achieve. Thanks to simplifications in training scenarios researchers are able to present working demos much faster. Usually advanced AI algorithms like general-purpose reinforcement learning (used for example in OpenAI project) are used in almost all cases as a demo showcasing the capabilities of the AI. More training scenarios in that field would lead to worse performance. I think that worrying about AI bias is premature, because every demo should be simplified to achieve satisfactory results. Step after step, year after year AI bias will be lower and lower due to democratization of technologies based on AI.

Problems with AI software development can’t be overcome with flawed human behaviour. They need not only structural (companies, societies) but also psychological support. The idea of the black box being unremovable is not healthy. We have to stop thinking of machines as mindless creatures living on instructions and spitting out ready-to-use services.

CSHARK content writer
Jarosław Ściślak
Content Marketing Specialist at CSHARK. Responsible for writing and managing content for the blog and social media, as well as supporting business operations, HR and EB processes. Writes about video games, AI, AR/VR, business, and marketing.