Character Iteration with AI: Bringing Precision to AI-Driven Art
This R&D project from Keywords Studios’ AI Centre of Excellence redefines AI as a controlled tool for artists.
Art Comes From Control
Consider Jackson Pollock’s famous drip paintings. To the casual observer, they might appear random – just paint splattered on canvas. A child could do that, right? But this misses the fundamental nature of art: it’s not just about the medium, but about control and intentionality. Pollock’s mastery wasn’t in the mere act of dripping paint; it was in his precise control over how, where, and when each drop fell. Through this control, he could communicate specific artistic intentions, turning what could have been mere chaos into compelling artistic statements.
This principle of control becomes particularly relevant when we consider AI-assisted art creation. Many current AI image generators operate as “black boxes” – you provide input and hope for useful output. While this can produce striking images, it’s more akin to random paint splatter than intentional art creation. Our R&D focuses on transforming AI from an unpredictable generator into a precise tool that responds to artistic intent. Just as Pollock achieved mastery through controlling his medium, artists using our workflow maintain complete control over the creative process, using AI as an instrument to bring their vision to life rather than being passengers in the generation process.
Understanding Multimodal Chain-of-Thought
At the heart of this R&D is the concept of Multimodal Chain-of-Thought (CoT) reasoning—an AI workflow that mirrors human problem-solving processes. Rather than generating images in a single step (one-shot), CoT systems break down complex tasks into logical sequences. By incorporating multiple types of input—visual, textual, and spatial—our system can understand not just what an artist wants to create, but how each element contributes to the final vision.
This multimodal approach allows the system to process three complementary types of input (sketched in code after the list):
- Visual information from 3D models and reference images
- Textual descriptions at both global and segment-specific levels
- Spatial relationships and proportions from the 3D template
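To make these three inputs concrete, here is a minimal Python sketch of how they might be represented as explicit, inspectable data before entering the reasoning chain. The class and function names are illustrative only, not part of ComfyUI or any published API:

```python
from dataclasses import dataclass, field

# Illustrative containers for the three input modalities.
@dataclass
class VisualInput:
    template_render: str                    # viewport render of the 3D model
    reference_images: list[str] = field(default_factory=list)

@dataclass
class TextInput:
    global_prompt: str                      # e.g. "rodeo cowboy"
    segment_prompts: dict[str, str] = field(default_factory=dict)

@dataclass
class SpatialInput:
    # Per-segment bounding boxes in normalized image coordinates,
    # derived from the 3D template's proportions.
    segment_boxes: dict[str, tuple[float, float, float, float]] = field(
        default_factory=dict)

def reasoning_chain(visual: VisualInput, text: TextInput,
                    spatial: SpatialInput) -> list[str]:
    """Expand one-shot generation into an ordered chain of smaller steps."""
    steps = [
        f"condition on 3D render: {visual.template_render}",
        f"apply global prompt: {text.global_prompt}",
    ]
    for name, prompt in text.segment_prompts.items():
        box = spatial.segment_boxes.get(name)
        steps.append(f"refine segment '{name}' within {box}: {prompt}")
    return steps
```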
Technical Implementation
The prototype uses ComfyUI, an open-source node-based interface for AI image generation, as its workflow prototyping platform. Technical artists familiar with node-based tools like Houdini or Unreal’s Blueprint system will find this approach intuitive. Each node in the graph represents a step in the reasoning chain, allowing artists to (see the scripted example after this list):
- Prototype and iterate on multimodal workflows visually
- Inject different types of input at any point in the process
- Create reusable workflow templates
- Maintain precise control over each step
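Because ComfyUI also exposes its graphs over a local HTTP API, a workflow template can be queued headlessly once it has been prototyped visually. The sketch below submits a deliberately minimal text-to-image graph in ComfyUI’s exported “API format” (nodes keyed by id, links as [source_node_id, output_index] pairs). The checkpoint filename is a placeholder, and this is a bare-bones graph rather than the full multimodal character workflow described here:

```python
import json
import urllib.request

# A minimal ComfyUI graph in the exported "API format": each node has a
# class_type and an inputs dict; links are [source_node_id, output_index].
workflow = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "model.safetensors"}},   # placeholder
    "2": {"class_type": "CLIPTextEncode",                   # positive prompt
          "inputs": {"text": "rodeo cowboy", "clip": ["1", 1]}},
    "3": {"class_type": "CLIPTextEncode",                   # negative prompt
          "inputs": {"text": "blurry, low quality", "clip": ["1", 1]}},
    "4": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
    "5": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0], "positive": ["2", 0],
                     "negative": ["3", 0], "latent_image": ["4", 0],
                     "seed": 42, "steps": 25, "cfg": 7.0,
                     "sampler_name": "euler", "scheduler": "normal",
                     "denoise": 1.0}},
    "6": {"class_type": "VAEDecode",
          "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
    "7": {"class_type": "SaveImage",
          "inputs": {"images": ["6", 0], "filename_prefix": "character"}},
}

# Queue the graph on a locally running ComfyUI server (default port 8188).
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": workflow}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode())           # returns a prompt_id on success
```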
The Prototype Workflow
Our multimodal pipeline combines visual, spatial, and textual inputs in a clear, iterative process (modelled in code after the list):
- Initial 3D Template: Artists begin with a basic 3D character model, providing a foundation for the design process.
- Segment Identification: They then identify key segments of the character (face, clothing, accessories, etc.), creating distinct zones for targeted refinement.
- Global Prompt: They set the overall creative direction through broad prompts (e.g., “rodeo cowboy”), establishing the character’s core identity.
- Segmented Prompts: Each identified segment can be individually refined through specific prompts (e.g., “blue jeans with rip on left knee”), allowing for precise control over every aspect of the design.
- Face Likeness Matching: For characters requiring specific facial features or resemblances, the system can incorporate reference images to guide the generation process.
- Iterative Output: The system generates images that artists can further refine, maintaining creative momentum while preserving artistic intent throughout the process.
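To show how these steps fit together, the sketch below models them in code. It is illustrative only (the names are ours, not the shipped tooling), with image generation injected as a plain callable so the control flow stays in focus:

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class Segment:
    name: str               # e.g. "jeans"
    prompt: str             # e.g. "blue jeans with rip on left knee"
    mask: str               # per-segment mask rendered from the 3D template
    locked: bool = False    # True once the artist locks in this element

@dataclass
class CharacterSpec:
    template_render: str                    # render of the base 3D model
    global_prompt: str                      # e.g. "rodeo cowboy"
    segments: list[Segment] = field(default_factory=list)
    face_reference: Optional[str] = None    # optional likeness image

def next_iteration(spec: CharacterSpec,
                   generate: Callable[..., str]) -> str:
    """One controlled pass: a global pass, then per-segment refinement."""
    image = generate(spec.template_render, spec.global_prompt,
                     reference=spec.face_reference)
    for seg in spec.segments:
        if not seg.locked:                  # locked elements stay untouched
            image = generate(image, seg.prompt, mask=seg.mask)
    return image
```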
The Dynamic Feedback Loop
One of the most powerful aspects of this workflow is its ability to capture and “lock in” successful AI-generated elements. When the AI produces compelling design elements—such as an ornate belt buckle that perfectly matches the character’s theme—artists can immediately incorporate these elements back into the 3D model. This creates a dynamic feedback loop:
- Discovery: The AI generates unexpected but appealing design elements
- Capture: Artists add these elements to the 3D model, precisely controlling their size and position
- Segmentation: The new elements become their own segments with dedicated prompts
- Refinement: Future iterations maintain these locked-in elements while continuing to explore other aspects
This iterative cycle ensures that happy accidents become intentional design choices, preserved and refined throughout the character development process.
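Continuing the illustrative CharacterSpec sketch from the previous section, the lock-in step amounts to promoting a discovered element into a frozen segment of its own (again, a sketch of the concept rather than the shipped tooling):

```python
def lock_in(spec: CharacterSpec, name: str, prompt: str, mask: str) -> None:
    """Capture a happy accident as a dedicated, frozen segment.

    In practice the artist first adds the discovered element (e.g. the
    belt buckle) to the 3D model, then renders a fresh mask for it.
    """
    spec.segments.append(Segment(name=name, prompt=prompt,
                                 mask=mask, locked=True))

# Discovery -> Capture -> Segmentation -> Refinement:
# lock_in(spec, "belt_buckle", "ornate western belt buckle", "masks/buckle.png")
# Subsequent next_iteration() passes preserve the buckle while other,
# unlocked segments continue to evolve.
```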
Key Benefits for Artists
This R&D prototype demonstrates several promising advantages for game production:
- Intuitive Control: Artists work in their natural visual language rather than wrestling with complex text prompts
- Rapid Iteration: Quick refinements and adjustments without starting from scratch
- Precise Modifications: Fine-grained control over specific character elements
- Style Consistency: Easier adherence to style guides and design parameters
- Seamless Integration: Works alongside familiar 3D tools and existing pipelines
- Design Preservation: Ability to capture and maintain successful elements through 3D model updates
Beyond Character Generation
While our R&D prototype focused on character design as an initial use case, the underlying multimodal chain-of-thought workflow has broad applications across game development. Similar principles of controlled iteration and segmented refinement could be applied to:
- Weapon Design: Iterating on models of guns, swords, or magical items while maintaining specific game-balance proportions
- Environmental Assets: Developing buildings, vegetation, or props with consistent architectural or natural styles
- Level Design: Rapidly prototyping environmental layouts while preserving gameplay spaces and sight lines
- Vehicle Design: Creating variations of vehicles while maintaining specific technical or mechanical constraints
- UI/UX Elements: Generating interface components that adhere to specific style guides and usability requirements
This versatility suggests that the workflow could be adapted to support nearly any visual aspect of game development where iterative refinement and precise control are essential.
Future R&D Directions
The Keywords AI Centre of Excellence is continuing to explore ways to make this workflow more accessible to artists. Future R&D will focus on:
- Simplifying the workflow to reduce technical complexity
- Integrating with industry-standard 3D creation tools like Maya or Blender
- Streamlining the process of segment identification and prompt management
- Creating interfaces for artists who are not familiar with node-based workflows