Bias Exhibited in Data Sets

If we set aside the sexism in the prompt that I gave the Stable Diffusion generator, ‘woman in a bikini, artgerm’, we can easily see an ethical problem facing the training and use of AI image generators: they are trained on datasets scraped from the internet and tagged by persons unknown. Advance warning: there are (poorly imagined) nipples further down the article. Here is the composite tiled image:

If we look at the thumbnail mosaic that the generator creates at the end of the run of 49 images, what can we see about the data set?

For a start, we can see that the AI has an incomplete model of what a woman looks like. More than that, though, the first question I asked myself was “Why are they all white women?” Some do have slightly darker skin, but I asked for women in bikinis, so they all exhibit a water and/or beach vibe, and in that context the darker skin reads more like a tan.

But the generator clearly has plenty of dark skinned people in its dataset. It is really easy to fix this by asking for dark skinned women with ‘woman in a bikini, dark skinned, artgerm’:

But why did I need to ask explicitly for dark skinned women, when leaving the term out generated white women? Can we force the generator to make white women and, if we do, are they any whiter than the default?

There are two obvious answers, so let’s see if we can check both of them. The first is that I used the ‘artgerm’ term. Stanley ‘artgerm’ Lau (https://artgerm.com/) is an amazing artist and has produced many stunning comic covers over the years. His artistic style is awesome, which is why I applied it to this generation, but could it be that the artgerm style favours white women?

If we look at Stanley’s work, his subjects are generally quite fair skinned women (and they are almost all women), so is it reasonable for the generator to produce fair skinned women unless explicitly asked otherwise?

As an aside, this brings us to another ethical question: all of the image generators are trained using massive data sets trawled from the internet without the original rights holders’ permission. Can and should these images be considered Stanley Lau derivative works? We can assume that there must be either artgerm images in the set, or other people’s images tagged artgerm, for the generator to be able to give us this style.

Anyway, back to the point. To see if this is relevant, let’s generate an image set that doesn’t ask for the artgerm style, to either rule it out or confirm its relevance. This time we go for ‘woman in a bikini’:

We have to allow for the fact that, without the style guide, the generator goes for something like photographic quality, and it is pretty bad at the human form. But it seems that we still have mostly white women.

So let’s take the bikini part away and just ask for ‘woman’, ‘woman, dark skinned’ and ‘woman, light skinned’.

Again, we have a preponderance of light skinned ladies in the first image. Why is this? I believe that this is bias in the tagging. Let me explain. The AI is trained on a massive data set of images, hundreds of millions of them. Those images are tagged with text so that the AI can contextualise them and extract features. Without the text tags, the neural network would have no way to associate the features it extracts from an image with a word like ‘woman’. So each image is paired with many tags, and the training sorts out those tags and works out which features in the image most closely represent each tag. These models have billions of parameters and millions of tags, so an image may have an amount of woman-ness, leg-ness, arm-ness, bikini-ness and so on.

Okay, so to get back to the point: it seems that images of dark skinned people are tagged as ‘dark skinned’, but images of light skinned people seem more likely to have no skin tone tag at all, although some must be tagged for ‘light skinned’ to have any effect.

It turns out that different words for skin tone exaggerate the effect, both when we replace light with white and when we replace dark with black, which gives us these two results for ‘woman, black skinned’ and ‘woman, white skinned’.
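For anyone repeating the comparison, it helps to build the prompts so that each run differs from the previous one by exactly one term. Here is a minimal sketch in Python. The helper name build_prompt and the fixed subject, modifier, style ordering are illustrative assumptions; only the prompt strings themselves come from this article.

```python
def build_prompt(subject, modifier=None, style=None):
    """Join the non-empty prompt parts with ', ' in a fixed order."""
    parts = [subject, modifier, style]
    return ", ".join(p for p in parts if p)


# The prompt variations used in this article, changing one term at a time.
prompts = [
    build_prompt("woman in a bikini", style="artgerm"),
    build_prompt("woman in a bikini", "dark skinned", "artgerm"),
    build_prompt("woman in a bikini"),
    build_prompt("woman"),
    build_prompt("woman", "dark skinned"),
    build_prompt("woman", "light skinned"),
    build_prompt("woman", "black skinned"),
    build_prompt("woman", "white skinned"),
]

for prompt in prompts:
    print(prompt)
```

Keeping everything else fixed between runs is what lets us pin a change in the output on the one term that changed.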

And so the ethical dilemma is how to fix this bias in the training data so that asking for ‘woman’ covers an appropriate range of skin tones by default. Some AI image generators have adopted an interesting approach to bias in the data set: they try to smooth it out at the image generation stage by silently adding textual terms to the input, which the user never sees, to try to cancel the bias.
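To make that idea concrete, here is a rough sketch of what silent prompt augmentation could look like. This is purely an illustration: the function name augment_prompt, the list of terms and the "leave explicit prompts alone" rule are all my assumptions, since the services that do this have not published what they actually add.

```python
import random

# Hypothetical term list, borrowed from the prompts tried in this article.
SKIN_TONE_TERMS = ["dark skinned", "light skinned", "black skinned", "white skinned"]


def augment_prompt(prompt, rng=random):
    """Append a random skin tone term unless the prompt already names one."""
    lowered = prompt.lower()
    if any(term in lowered for term in SKIN_TONE_TERMS):
        return prompt  # the user was explicit, so leave the prompt alone
    return f"{prompt}, {rng.choice(SKIN_TONE_TERMS)}"


rng = random.Random(0)  # arbitrary seed, only so the demo is repeatable
for _ in range(4):
    print(augment_prompt("woman", rng))
print(augment_prompt("woman, dark skinned", rng))
```

Note that this only papers over the problem at generation time. The underlying tagging bias in the training data is untouched, and a simple substring check like this one will miss any skin tone wording that is not in the list.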

I have not fed any of these images through scaling or face enhancing, as that was not the point of the article. In another article, we’ll look at gender bias and even clothing bias: as you may have noticed, a few of the generated images in this set are topless. How many images do you imagine would be topless if we changed ‘woman’ to ‘man’?

If you want to regenerate any of these sequences, here are the params that you need as well as the text: 20221013 width:512 height:512 steps:50 cfg_scale:7.5 sampler:k_lms
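If you want to feed those settings into a script, the line can be split into a dictionary with a few lines of Python. This is only a sketch: parse_params is a hypothetical helper, and how the keys map onto your own generator's options is up to you. The leading number has no label in the line above, so the code keeps it as a bare token rather than guessing what it stands for.

```python
def parse_params(line):
    """Split 'key:value' tokens into a dict; keep unlabelled tokens separately."""
    params, bare = {}, []
    for token in line.split():
        if ":" not in token:
            bare.append(token)
            continue
        key, value = token.split(":", 1)
        # Try int, then float, and fall back to the raw string.
        for cast in (int, float):
            try:
                value = cast(value)
                break
            except ValueError:
                pass
        params[key] = value
    return params, bare


params, bare = parse_params(
    "20221013 width:512 height:512 steps:50 cfg_scale:7.5 sampler:k_lms"
)
print(params)
print(bare)
```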