What's up with the Hands, tho? One of AI's latest hurdles.

Ian Noe
Nov 30, 2024

As we know, AI is everywhere. Love it or hate it, there's no putting this genie back in the bottle. Or as an old friend of mine used to say, "there's no putting the toothpaste back in the tube." I like that analogy. It really paints a picture.

Speaking of painting pictures, AI image generation tools are evolving at a rapid pace. I waited a while to delve into video, as I wondered just how much of what I was learning was going to be replaced by something easier in a couple of weeks. But now I'm in deep, so I'll be ready for the changes.

Anyway, back to still imagery. This week, we saw both Midjourney and Flux add features and mature to the point of being almost indistinguishable from photo-real imagery. But they still struggle with hands.

As illustrated by this side-by-side comparison by @cyberjungle:

https://www.youtube.com/watch?v=JECw0LTG5Uw

Why? Well, let's turn to AI, of course. From ChatGPT: AI image generators often struggle with hands due to the complexity and variability of human hands, as well as limitations in their training data and underlying models. Here's why:

Anatomical Complexity: Hands are intricate structures with many joints, varied proportions, and a wide range of possible positions. Fingers can bend, overlap, and form complex gestures, making it challenging for AI to generate anatomically accurate representations.
Data Ambiguities: The training datasets for AI models may include inconsistent or poorly labeled images of hands. Hands in these datasets might appear in various positions, partially obscured, or distorted by perspective, leading to confusing signals for the model.
Lack of Contextual Understanding: AI models like DALL·E and Stable Diffusion don’t "understand" anatomy in the way humans do. They generate images based on patterns they’ve learned from data, not an understanding of how hands are structured or how they should interact with objects.
Error Amplification in Generative Processes: When AI generates an image, small errors can quickly compound. For example, if a model starts with slightly incorrect proportions for a hand, these errors may become exaggerated as it fills in details like fingers or knuckles.
Hands Are Often Small in Training Images: In many training images, hands are not the focus and may appear small or blurred. This lack of detailed, high-quality examples makes it harder for AI to learn how to render hands accurately.
Evaluation and Feedback Challenges: Unlike faces, which have more clearly defined features and are often central in images, hands are less "standardized" in how they appear, making it harder for models to refine their outputs through training.
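
To put a rough number on the "hands are often small" point above, here is a quick back-of-the-envelope sketch in Python. The image size and hand size are made-up numbers for illustration, not measurements from any real training set, and hand_pixel_share is just a hypothetical helper name.

```python
def hand_pixel_share(image_w, image_h, hand_w, hand_h):
    """Fraction of an image's pixels covered by a hand's bounding box."""
    return (hand_w * hand_h) / (image_w * image_h)


# Assumed, illustrative numbers: a 512x512 training image
# in which one hand fills roughly a 40x40 pixel box.
share = hand_pixel_share(512, 512, 40, 40)
print(f"Hand covers {share:.2%} of the image")
```

With those assumed numbers, the hand takes up well under 1% of the picture, so all five fingers have to be learned from a tiny patch of pixels.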

Improving Hand Generation

Efforts are being made to improve how AI handles this challenge:

Enhanced Training Datasets: Curating datasets with a focus on diverse and high-quality hand images.
Anatomy-Aware Models: Incorporating knowledge of human anatomy into AI algorithms.
Fine-Tuning and Feedback Loops: Using human evaluations to identify and correct errors in generated hands.

Over time, these improvements should reduce the frequency of the notorious AI "six-fingered hands" or other oddities.

In the coming weeks, we'll keep an eye out for improvements. Things are moving so fast that it's almost like there's a machine that's learning as it goes. What do you think? Ian
