Ethics and accuracy of generating real people

The ethics of AI image generation is a complex question.  Even without AI generation clouding the issue, there have already been court cases fighting over the ownership of the digital representation of a real person.

The Stable Diffusion licence, which is generally permissive, states that the user is wholly responsible for the output they generate: “Licensor claims no rights in the Output You generate using the Model”.

So the licensor makes no copyright claim over images coming from Stable Diffusion, but can it really be true that users ‘own’ the copyright of generated images? Consider that the model is trained on datasets which, although freely available, undoubtedly contain copyrighted images that are owned by neither the makers of Stable Diffusion nor the user.

Regardless of whether you believe that the source images still exist in an extractable form in the output, we can show that the concept of individual people still exists in the memory of the AI.  To pull out these memories, we need subjects that appear many thousands of times in the training set and that are tagged, so we need to pick famous people.  Below is a curated set: for each subject I’ve picked the best and worst from a hundred generated images.

Remember, these images are not real: they are not intended to truly represent the beautiful people that are the subjects, but they do represent, in some way, the Stable Diffusion memory of those people based on the public images available of them (and of everybody else!)  The curated images are face only; this is not about generating fake nudes of real people, other tools exist to fill that niche!  The prompt is simply ‘first name last name, photo’. As these are close-up faces, we also use GFPGAN to improve the eyes and some facial features.
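For the article I ran GFPGAN through the webui, but for reference the same face restoration can be run standalone with the gfpgan Python package. This is just a minimal sketch; the checkpoint path and filenames are assumptions, not the exact files used here:

```python
import cv2
from gfpgan import GFPGANer

# Assumes a locally downloaded GFPGAN checkpoint; path and filenames are placeholders.
restorer = GFPGANer(
    model_path="GFPGANv1.3.pth",
    upscale=1,                 # keep the original resolution, just restore the face
    arch="clean",
    channel_multiplier=2,
    bg_upsampler=None)

img = cv2.imread("jodie_foster_photo.png", cv2.IMREAD_COLOR)
# Detect faces, restore them, and paste them back into the original image.
cropped_faces, restored_faces, restored_img = restorer.enhance(
    img, has_aligned=False, only_center_face=False, paste_back=True)
cv2.imwrite("jodie_foster_photo_restored.png", restored_img)
```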

So, without further ado, here are the Stable Diffusion representations of some actors, sports personalities, singers, billionaires and other people popular in the media.

Jodie Foster, photo

This one was tricky: this was almost the only picture that didn’t look at least a little like the text prompt.

Marilyn Monroe, photo

For Marilyn, there is no disaster picture.  All were very similar across the whole set.  Most pictures were head-shot only.

Morgan Freeman, photo

Making up for no bad Marilyn shot, here are two bad Morgan pictures. The second is especially bad and the first just looks nothing like the real actor.  But, by far the majority of the Morgan pictures were passable.

Clint Eastwood, photo

The second is not too bad, but the majority of the Clint images were terrible.

Prince Charles, photo

None of the Charles images were at all flattering, and most are basically crude caricatures.

Kylie Minogue, photo

Again, the generator has essentially produced caricatures of the wonderful smile that Kylie has.  Most of the images were torso shots and quite a few had only a dress, with head and feet cropped out.  Most odd!

Nicole Kidman, photo

The first is passable, the second one is not a good look at all!

Elon Musk, photo

Many of the Elon images had cars, rockets or technology in them, so the encoder is remembering a lot of associated details.  

Elvis Presley, photo

The Elvis images are almost universally awful.

Alexandra Daddario, photo

Britney Spears, photo

Barack Obama, photo

The image generator had a big problem with Barack.  My guess is that there are so many caricatures of him already that these formed a large part of the input data set.

As a bonus thought for those who scrolled through those awful images: there is something else here too, something that you don’t get just from the couple of curated photos that I provide: the dataset also exhibits clothing bias, possibly based on age.  It’s hard to explain, but the women have a wider range of clothing than the men.  That’s fine, perhaps no surprise in itself, but there are surprising differences between ‘Jodie Foster, photo’, ‘Alexandra Daddario, photo’ and ‘Britney Spears, photo’. Almost all the Jodie photos are headshots only, wearing a smart business suit or dress. Many of the Alexandra photos are wider shots in more revealing clothing, and almost all of the Britney pictures are wider shots in still more revealing clothing. Some of the Britney pictures don’t have a face in them at all.

King Charles is wearing a suit in 100% of the pictures, Barack in 97% and Elon in 85%. Elvis only appears in his performing gear, almost always with a shirt, often with a guitar, and 98% of the images are black and white.

Obviously, the AI is not exhibiting bias on its own account, but what it must be doing is making the representations based on the gross average of the pictures in the training set. This suggests that there are a lot of pictures tagged as Britney Spears without even her face in the picture!

Bias Exhibited in Data Sets

If we ignore the sexism in the prompt that I gave the Stable Diffusion generator, ‘woman in a bikini, artgerm’, we can easily see an ethical problem facing the training and use of AI image generators: they are trained on datasets scraped from the internet and tagged by persons unknown. Here is the composite tiled image, and advance warning: there are (poorly imagined) nipples further down the article:

If you look at the thumbnail mosaic created by the generator at the end of the run of 49 images, what can we see in the data set?

We can see for a start that the AI has an incomplete model of what a woman looks like, but more than that, the first question I asked myself was “Why are they all white women?” Some do have slightly darker skin, but, in context, I asked for women in bikinis and so they all exhibit a water and/or beach vibe.

But the generator has a good dataset of dark-skinned people; it is really easy to fix this by asking explicitly for dark-skinned women, like this: ‘woman in a bikini, dark skinned, artgerm’:

But why did I need to ask explicitly for dark-skinned women, when leaving the term out generated white women? Can we force the generator to make white women, and if we do, are they any more white than the default?

There are two obvious answers, so let’s see if we can check both of them. The first one is to note that I used the ‘artgerm’ term. Stanley ‘artgerm’ Lau (https://artgerm.com/) is an amazing artist and has produced many stunning comic covers over the years. His artistic style is awesome, which is why I applied it to this generation, but could it be that the artgerm style favours white women?

If we look at Stanley’s work, the art generally depicts quite fair-skinned women (and it is almost all women), so is it reasonable for the generator to produce fair-skinned women if not explicitly asked otherwise?

As an aside, we get to another ethical question: all of the image generators are trained using massive data sets trawled from the internet without the original rightsholders’ permission. Can and should these images be considered Stanley Lau derivative works? We can assume that there must be either artgerm images in the set, or other people’s images tagged ‘artgerm’, for the generator to be able to give us this style.

Anyway, back to the point. To see if this is relevant, let’s generate an image set that doesn’t ask for the artgerm style, to rule it out or confirm its relevance; so this time we go for ‘woman in a bikini’:

We have to allow for the fact that, without the style guide, the generator is going for something like photographic quality, and it is pretty bad at the human form. But it seems like we still have mostly white women.

So let’s take the bikini part away and just ask for ‘woman’, ‘woman, dark skinned’ and ‘woman, light skinned’.

Again, we have the preponderance of light-skinned ladies in the first image. Why is this? I believe that this is bias in the tagging. Let me explain: the AI is trained on a massive data set of images, hundreds of millions of them. Those images are tagged to allow the AI to contextualise them and extract features. Without the text tags, the neural network would have no dimension along which to associate features extracted from an image with ‘woman’. And so each image is merged with many tags, and the training sorts out those tags and works out which features in the image most closely represent each tag. These models have billions of parameters and millions of tags, so an image may have an amount of woman-ness, leg-ness, arm-ness, bikini-ness and so on.
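To make the text side of that a little more concrete: Stable Diffusion v1.4 conditions its image model on embeddings from OpenAI’s CLIP text encoder. Here is a small illustrative sketch (not the training pipeline itself) showing how the prompt variants above become the vectors that steer the generator:

```python
import torch
from transformers import CLIPTokenizer, CLIPTextModel

# Stable Diffusion v1.x uses this CLIP ViT-L/14 text encoder for conditioning.
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

prompts = ["woman", "woman, dark skinned", "woman, light skinned"]
tokens = tokenizer(prompts, padding="max_length",
                   max_length=tokenizer.model_max_length,
                   truncation=True, return_tensors="pt")

with torch.no_grad():
    # One 77 x 768 embedding per prompt. These vectors are all the image
    # model ever "sees" of the text, so any bias in how the training images
    # were tagged ends up baked into this embedding space.
    embeddings = text_encoder(tokens.input_ids).last_hidden_state

print(embeddings.shape)  # torch.Size([3, 77, 768])
```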

Okay, so to get back to the point: it seems that images of dark-skinned people are tagged as ‘dark skinned’, but images of light-skinned people are more likely to have no skin-tone tag at all, although some must be tagged in order for ‘light skinned’ to have any effect.

It turns out that different words for skin tone do exaggerate the effect, both if we replace ‘light’ with ‘white’ and if we replace ‘dark’ with ‘black’; so we get these two results for ‘woman, black skinned’ and ‘woman, white skinned’.

And so the ethical dilemma is how to fix this bias in the training data so that asking for ‘woman’ covers an appropriate range of skin tones by default. Some AI image generators have adopted an interesting approach to bias in the data set: they try to smooth it out at the image generation stage by silently adding textual terms to the prompt, terms that the user never sees, in order to try to cancel the bias.
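If you want to see roughly what that idea looks like in code, here is a toy sketch. The term list and probability are entirely made up for illustration; no real service publishes what, if anything, it actually injects:

```python
import random

# Hypothetical illustration of "silent prompt augmentation" for bias smoothing.
# The terms and the probability below are invented for this example only.
DIVERSITY_TERMS = ["dark skinned", "light skinned", "east asian", "south asian"]

def augment_prompt(user_prompt: str, probability: float = 0.5) -> str:
    """Sometimes append a skin-tone term that the user never sees."""
    if "skin" not in user_prompt and random.random() < probability:
        return f"{user_prompt}, {random.choice(DIVERSITY_TERMS)}"
    return user_prompt

# The user types 'woman in a bikini'; the model may quietly receive something else.
print(augment_prompt("woman in a bikini"))
```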

I have not fed any of these images through scaling or face enhancing, as that was not the point of the article. In another article, we’ll look at gender bias and even clothing bias: as you may note, a few of the generated images in this dataset are topless. How many images do you imagine would be topless if we changed ‘woman’ to ‘man’?

If you want to regenerate any of these sequences, here are the params that you need, as well as the prompt text: 20221013, width: 512, height: 512, steps: 50, cfg_scale: 7.5, sampler: k_lms.
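I ran everything through the webui, but if you prefer to script it, a roughly equivalent generation can be done with the Hugging Face diffusers library. This is only a sketch: the seed below is a placeholder, and the output will not match the webui pixel-for-pixel because the two front ends prepare their noise differently:

```python
import torch
from diffusers import StableDiffusionPipeline, LMSDiscreteScheduler

# Rough diffusers equivalent of the webui settings above (k_lms, 50 steps, CFG 7.5, 512x512).
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe.scheduler = LMSDiscreteScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")

generator = torch.Generator("cuda").manual_seed(42)  # placeholder seed
image = pipe("Jodie Foster, photo",
             width=512, height=512,
             num_inference_steps=50,
             guidance_scale=7.5,
             generator=generator).images[0]
image.save("jodie_foster_photo.png")
```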

Beautiful stripy cat behaving like a tiger, colorful, vibrant, cute, very fluffy, photo

Stable Diffusion v1.4 with webui, text to image prompt ‘beautiful stripy cat behaving like a tiger, colorful, vibrant, cute, very fluffy, photo’.  Seed 3013555706, Classifier Free Guidance Scale 2.5, sampling steps 250. Sampler: k_lms.  This image has also been fed through ESRGAN for a 4× upscale. This highlights an interesting point: although you can change the resolution of the images generated by the Stable Diffusion net, it’s far easier to use another network to upscale the image afterwards. You can see here that this process is highly successful.
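The upscale was done through the webui, but the same step can be run standalone with the Real-ESRGAN package. A minimal sketch, with the checkpoint path and filenames as assumptions:

```python
import cv2
from basicsr.archs.rrdbnet_arch import RRDBNet
from realesrgan import RealESRGANer

# Standard 4x Real-ESRGAN setup; the checkpoint path is a placeholder.
model = RRDBNet(num_in_ch=3, num_out_ch=3, num_feat=64,
                num_block=23, num_grow_ch=32, scale=4)
upsampler = RealESRGANer(scale=4, model_path="RealESRGAN_x4plus.pth",
                         model=model, tile=0, tile_pad=10, pre_pad=0, half=True)

img = cv2.imread("stripy_cat.png", cv2.IMREAD_COLOR)
output, _ = upsampler.enhance(img, outscale=4)  # 512x512 in, 2048x2048 out
cv2.imwrite("stripy_cat_4x.png", output)
```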

Old-Fashioned Yellow Motorcycle

Stable Diffusion v1.4 with webui, text to image prompt ‘old-fashioned yellow motorcycle’.  Seed 42, Classifier Free Guidance Scale 15, sampling steps 102. Sampler: k_dpm_2, curated image from thirteen samples, 704×512 px. It’s interesting that the generator stuck with the yellow theme throughout; it was represented in all the sample images.
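Continuing the earlier diffusers sketch, swapping in the k_dpm_2 sampler and the non-square canvas looks roughly like this; again an approximation of the webui run, not an exact reproduction:

```python
from diffusers import KDPM2DiscreteScheduler

# Reuses `pipe` and `torch` from the earlier sketch; k_dpm_2 equivalent scheduler.
pipe.scheduler = KDPM2DiscreteScheduler.from_config(pipe.scheduler.config)
image = pipe("old-fashioned yellow motorcycle",
             width=704, height=512,
             num_inference_steps=102,
             guidance_scale=15,
             generator=torch.Generator("cuda").manual_seed(42)).images[0]
image.save("motorcycle.png")
```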

Cute, but very smelly dog

Stable Diffusion v1.4 with webui, text to image prompt ‘Cute, but very smelly dog’.  Seed 81002952 for the good and 231295628 for the bad, Classifier Free Guidance Scale 7.5 for the good and -37.5 for the bad, sampling steps 50. Sampler: k_lms.  A negative guidance scale actively pushes the image away from the prompt, and this demonstrates why giving the classifier more scope to wander away from the prompt is not beneficial in most cases!

Tiger submarine

Stable Diffusion v1.4 with webui, text to image prompt ‘tiger submarine’.  Seed 1, Classifier Free Guidance Scale 9, sampling steps 82. Sampler: k_lms. Curated best and worst results from 50 image samples. Of course, ‘best’ and ‘worst’ are subjective, but this is a great example of the wide range of images that Stable Diffusion will generate!

It is only fair (thanks, Rob) to consider ‘submarine tiger’ too, as the first image is clearly that rather than the original prompt, so let’s see what difference it makes. Spoiler alert: none of the set of fifty images contains any tigers!

This one has a tiny hint of tiger stripes.
This one seems a lot more like an armoured hovercraft with penis cannons on the front. Just saying what I see!