Lab notes: You Are Doing AI Images Wrong
Systems for Storytellers / 05
Introduction
To be clear, there is no wrong way to make AI art. However, more often than not, people rely on the simplest approach available and settle for whatever comes out. And that is perfectly fine—if the goal is simply to “create an image with AI.” The novelty of writing a few sentences and watching an image appear that resembles your instructions is undeniable. It feels almost magical the first time you do it, and even after hundreds of generations, there is still a certain thrill in seeing the AI interpret your words visually. But if the goal is to create a specific image—something with a particular composition, a particular character, a particular mood—then “close enough” might not be “good enough.”
The gap between a casual AI-generated image and one that looks exactly the way you imagined it is not about talent or artistic skill in the traditional sense. It is about process. The people who consistently produce stunning, precisely controlled AI images are not using some secret model that the rest of us lack access to. They are simply approaching the problem differently—with more preparation, more iteration, and more tools in their workflow.
In this article, I will outline four distinct levels of AI image creation techniques, from Beginner to Professional. That said, the Beginner approach is not inherently inferior to the Professional one; the levels differ mainly in how much preparation and control you bring to each problem. Each higher level builds on everything from the levels below it, adding just a few extra workflows. You do not need to jump straight to the Professional level to get great results, but understanding all four levels will help you recognize when a more involved approach is worth the effort.
Beginner: Text Prompts and Regeneration
The simplest way to create an AI image is to write a text prompt describing exactly what the image should contain. The more detail you include in the prompt, the more precise the result will be. Nearly all AI image generation models accept text as input, and the output will vary depending on the model, but it will usually land at least roughly close to the desired idea.
The issue, of course, is the “roughly close” part. You might get lucky and produce the exact image you envisioned on the first try, or the result might be “good enough.” But if neither is the case, the simplest fix is to just generate again—and again, and again—until either your credits or your patience runs out. Eventually, you will end up with the “best of the bunch,” and that becomes your final image.
There is nothing wrong with this approach, and a few practical habits can make it more effective. First, be specific. Instead of “a woman in a forest,” try describing the lighting, the time of day, what she is wearing, where she is looking, and what the forest looks like. Second, pay attention to how different models interpret language. Some models respond better to comma-separated tags, while others prefer natural sentences. Third, experiment with style keywords. Phrases like “cinematic lighting,” “shallow depth of field,” or “35mm film photography” can dramatically shift the mood and quality of the output.
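The “layered detail” habit above can be sketched in a few lines of code. This is purely illustrative: the field names (subject, setting, lighting, style) are my own convention for organizing a prompt, not part of any model’s API, and the comma-separated output is just one common prompt shape.

```python
# Illustrative only: a tiny helper for building layered prompts.
# The layer names (subject, setting, lighting, style) are my own
# convention, not part of any model's API.

def build_prompt(subject, setting="", lighting="", style="", extras=()):
    """Assemble a comma-separated prompt from labeled detail layers,
    skipping any layer left empty."""
    parts = [subject, setting, lighting, style, *extras]
    return ", ".join(p for p in parts if p)

prompt = build_prompt(
    subject="a woman standing in a pine forest, looking over her shoulder",
    setting="early morning, light fog between the trees",
    lighting="soft golden-hour light filtering through the canopy",
    style="35mm film photography, shallow depth of field",
)
print(prompt)
```

The point is not the code itself but the discipline it encodes: deciding up front which details matter (subject, setting, lighting, style) forces the specificity that separates “a woman in a forest” from a prompt the model can actually act on.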
Even at this level, the key insight is that prompt writing is a skill. The more you practice and study what works, the fewer regenerations you will need to get a satisfying result. Sometimes regenerating is all it takes to land on something perfect, or at least better. Do not underestimate the power of a well-written prompt paired with a little patience.
Intermediate: Reference Images
A more advanced approach is to provide the AI model with reference materials alongside your text prompt. Not all models support this, but those that do will gladly accept one or more images together with your written description. This offers a major advantage over text alone: instead of struggling to describe a specific object in words, you can simply provide a reference image, and the AI will incorporate that exact object into the generated result.
The types of references you can use vary widely. A photograph of a real person can serve as a face or character reference. A product photo can ensure the AI renders a specific item accurately. A screenshot from a film or a painting can set the mood, color palette, or lighting style. Some models even accept depth maps or edge-detection images that define the structural composition without dictating the visual content. The more relevant information you give the AI, the less it has to guess—and the less it guesses, the closer the output will be to your vision.
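To make the edge-detection idea concrete, here is a toy sketch of what such a structural reference encodes. Real workflows use proper edge detectors (Canny is the usual choice); this minimal version just thresholds brightness differences on a small grayscale grid, which is enough to show how an edge map captures composition without dictating color or texture.

```python
# Toy edge map: 1 where brightness changes sharply, 0 elsewhere.
# Real pipelines use a proper detector such as Canny; this version
# only thresholds horizontal/vertical differences for illustration.

def edge_map(gray, threshold=40):
    """Return a binary edge map (1 = edge) for a 2D grayscale image."""
    h, w = len(gray), len(gray[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h - 1):
        for x in range(w - 1):
            dx = abs(gray[y][x + 1] - gray[y][x])  # horizontal change
            dy = abs(gray[y + 1][x] - gray[y][x])  # vertical change
            if max(dx, dy) > threshold:
                edges[y][x] = 1
    return edges

# A dark square on a light background: edges appear only at the boundary,
# which is exactly the structural outline a composition reference conveys.
img = [[200] * 6 for _ in range(6)]
for y in range(2, 4):
    for x in range(2, 4):
        img[y][x] = 30
for row in edge_map(img):
    print("".join("#" if v else "." for v in row))
```

Note how the output keeps the square’s position and size but discards its brightness: that is the sense in which a structural reference defines composition without dictating visual content.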
One important consideration at this level is the quality of your reference images. Blurry, low-resolution, or poorly lit references will introduce noise into the generation process. The AI will try to replicate what it sees, so if the reference itself is flawed, those flaws tend to carry over. Clean, well-lit, high-resolution references consistently produce better results.
Having one or more carefully chosen reference images can dramatically simplify the process of creating the exact image you have in mind. For many use cases—character consistency across a series of illustrations, product mockups, or branded content—this level is where AI image generation starts to feel genuinely useful as a creative tool rather than a novelty.
Advanced: Full Composition Control
The next step up is to stop describing the composition to the AI and instead treat it as a tool for combining pre-prepared image elements. At the Intermediate level, you provide a reference or two and let the AI handle the rest of the scene. At the Advanced level, you prepare nearly every aspect of the image in advance and ask the AI to assemble it.
For example, if the image must feature a specific woman wearing a specific outfit in a specific setting and pose, you can achieve that by providing an image of the location, a character sheet of the woman, and even a rough sketch or stand-in reference for the pose. The AI will then combine all of these elements into a single, coherent image. You are no longer hoping the AI will interpret your words correctly—you are showing it exactly what you want.
One practical example: take a photo of a dollhouse with a doll positioned in the desired pose. Pair that with a character sheet showing the character from multiple angles and a style reference image that defines the visual aesthetic you are after. The AI will use the dollhouse photo for spatial composition and pose, the character sheet for the person’s appearance, and the style reference for the overall look and feel. The result is a highly controlled image that would have been nearly impossible to achieve through text prompting alone.
This level does require more preparation. You might need to create rough sketches, find or build physical mockups, or generate preliminary AI images that serve as compositional guides for the final generation. It can feel like a lot of work upfront, but the payoff is significant: instead of generating dozens of images and hoping one lands close to your vision, you often get what you need within just a few attempts. At this level, you are essentially art-directing the AI rather than just prompting it.
Professional: Post-Processing and Targeted Editing
The leap to professional-level image generation has less to do with prompting and more to do with the extra tools you bring to the table and how you use them. No matter how carefully you prepare your references and prompts, the AI will probably still get something wrong. A hand might look slightly off. The eyes might not quite match the reference. A background element might be distracting. At every level below this one, the default response to such problems is to regenerate and hope for a better roll of the dice. At the Professional level, you fix it.
With access to modern image editing software—Adobe Photoshop, GIMP, Affinity Photo, or even free browser-based editors—you can extract the problematic part of the image, process it through the AI separately, and then place the corrected piece back into the original. The workflow is straightforward: select the area that needs fixing, cut or copy it out, run it through an AI generation or inpainting pass with appropriate prompts and references, and composite the result back into the main image. Most editing software makes this kind of compositing quick and painless once you are familiar with the basics.
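The cut, regenerate, and composite loop above can be sketched as follows. Everything AI-specific is stubbed out: images are plain 2D lists rather than real bitmaps, `ai_inpaint` is a hypothetical stand-in for whatever model or inpainting endpoint you actually call, and `MODEL_OUTPUT_SIZE` is an assumed fixed output resolution. In practice you would do the cropping and pasting in an editor or with a library such as Pillow; only the shape of the workflow is the point here.

```python
# Sketch of the cut-fix-composite loop, with the AI step stubbed out.
# `ai_inpaint` and MODEL_OUTPUT_SIZE are hypothetical stand-ins, not a
# real API; images are plain 2D lists for the sake of a runnable example.

MODEL_OUTPUT_SIZE = 8  # stand-in for a model's fixed output resolution

def resize(img, new_h, new_w):
    """Nearest-neighbor resize of a 2D list image."""
    h, w = len(img), len(img[0])
    return [[img[y * h // new_h][x * w // new_w] for x in range(new_w)]
            for y in range(new_h)]

def ai_inpaint(patch):
    """Hypothetical AI pass: here it just upscales the patch to the
    model's fixed output size, standing in for a real generation call."""
    return resize(patch, MODEL_OUTPUT_SIZE, MODEL_OUTPUT_SIZE)

def fix_region(image, top, left, size):
    """Cut a square region, run it through the AI pass, scale the
    (larger) result back down, and paste it over the original region."""
    patch = [row[left:left + size] for row in image[top:top + size]]
    fixed = ai_inpaint(patch)          # comes back at fixed resolution
    fixed = resize(fixed, size, size)  # downscale to fit the hole
    for dy in range(size):
        image[top + dy][left:left + size] = fixed[dy]
    return image

canvas = [[0] * 10 for _ in range(10)]
canvas = fix_region(canvas, top=2, left=3, size=4)
```

The `ai_inpaint` stub also shows where the detail gain discussed below comes from: a small 4×4 patch goes out, a fixed-size 8×8 result comes back, and the downscale step is effectively supersampling the corrected region.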
This approach does not just help with small clean-ups. It also enhances detail in a way that is unique to AI-assisted workflows. No matter how small the area you cut out for correction, the AI will return a relatively large result—most models output at a fixed resolution regardless of how little source material you provide. When you scale that corrected piece back down to fit the original image, the details will be noticeably sharper and more refined than what was there before. This makes the technique especially valuable for faces, hands, text, fine textures, and any other area where detail matters.
Professional-level workflows often involve multiple rounds of this process. You might fix the hands first, then the background, then make a final pass to adjust lighting consistency across the composited areas. Each round brings the image closer to perfection. It takes more time, but the result is an image that looks intentional and polished rather than “obviously AI.”
A Note on Reference Materials
Regardless of which level you are working at, the type of reference material you provide can make or break your results. A simple frontal photo of a subject is usually enough to place that person in a scene. But sometimes a person looks different from behind, or their outfit has distinct details visible only from certain angles. In those cases, a reference sheet showing the front, side, and back views—along with a close-up of the face—will help the AI maintain a much more consistent look across different compositions and camera angles.
The same principle applies to objects, environments, and even artistic styles. If you want the AI to faithfully reproduce a specific car, a single photo from one angle might lead to creative guesses about what the other side looks like. Multiple angles eliminate that guesswork. If you want a consistent illustration style across a series of images, providing several examples of that style—rather than just one—gives the AI a much clearer understanding of what you are after.
Think of reference materials as a visual vocabulary. The richer and more specific your visual vocabulary, the more precisely you can communicate with the AI. A single reference image is a sentence. A full reference sheet with multiple angles, a style guide, and a compositional sketch is an entire brief.
Final Thoughts
All that said, even professionals will occasionally scrap everything and regenerate from scratch in hopes of faster, better results. The simple approach is always valid. There is a reason every level of this framework includes the option to just try again—sometimes the AI surprises you, and a fresh generation gives you something better than hours of careful editing would have.
The real takeaway is not that you need to adopt the most complex workflow possible. It is that you should be aware of your options. If a quick text prompt gives you exactly what you need, that is a win. If it does not, you now know that reference images, full composition control, and targeted post-processing are all available to you—and each one brings you meaningfully closer to the image you originally envisioned.
In the end, as with any creative project, the more effort you put into an AI-generated image, the closer the result will be to what you had in mind. A quick prompt will give you something. But with more thought about the process and the outcome, you can get much closer to exactly what you need. The tools are there. The techniques are there. The only question is how far you want to take it. Start wherever you are comfortable, experiment with the next level when you are ready, and remember that every great AI image, no matter how polished the final result, started with someone simply typing a prompt and pressing generate.