Character Iteration with AI: Bringing Precision to AI-Driven Art
This R&D project from Keywords Studios’ AI Centre of Excellence redefines AI as a controlled tool for artists.
Art Comes From Control
Consider Jackson Pollock’s famous drip paintings. To the casual observer, they might appear random – just paint splattered on canvas. A child could do that, right? But this misses the fundamental nature of art: it’s not just about the medium, but about control and intentionality. Pollock’s mastery wasn’t in the mere act of dripping paint; it was in his precise control over how, where, and when each drop fell. Through this control, he could communicate specific artistic intentions, turning what could have been mere chaos into compelling artistic statements.
This principle of control becomes particularly relevant when we consider AI-assisted art creation. Many current AI image generators operate as “black boxes” – you provide input and hope for useful output. While this can produce striking images, it’s more akin to random paint splatter than intentional art creation. Our R&D focuses on transforming AI from an unpredictable generator into a precise tool that responds to artistic intent. Just as Pollock achieved mastery through controlling his medium, artists using our workflow maintain complete control over the creative process, using AI as an instrument to bring their vision to life rather than being passengers in the generation process.
Understanding Multimodal Chain-of-Thought
At the heart of this R&D is the concept of Multimodal Chain-of-Thought (CoT) reasoning—an AI workflow that mirrors human problem-solving processes. Rather than generating images in a single step (one-shot), CoT systems break down complex tasks into logical sequences. By incorporating multiple types of input—visual, textual, and spatial—our system can understand not just what an artist wants to create, but how each element contributes to the final vision.
This multimodal approach allows the system to process three complementary types of input (sketched in code after the list):
- Visual information from 3D models and reference images
- Textual descriptions at both global and segment-specific levels
- Spatial relationships and proportions from the 3D template
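To make these three inputs concrete, here is a minimal Python sketch of how they might be represented as explicit, inspectable data before entering the reasoning chain. The class and function names are illustrative only, not part of ComfyUI or any published API:

```python
from dataclasses import dataclass, field

# Illustrative containers for the three input modalities.
@dataclass
class VisualInput:
    template_render: str                    # viewport render of the 3D model
    reference_images: list[str] = field(default_factory=list)

@dataclass
class TextInput:
    global_prompt: str                      # e.g. "rodeo cowboy"
    segment_prompts: dict[str, str] = field(default_factory=dict)

@dataclass
class SpatialInput:
    # Per-segment bounding boxes in normalized image coordinates,
    # derived from the 3D template's proportions.
    segment_boxes: dict[str, tuple[float, float, float, float]] = field(
        default_factory=dict)

def reasoning_chain(visual: VisualInput, text: TextInput,
                    spatial: SpatialInput) -> list[str]:
    """Expand one-shot generation into an ordered chain of smaller steps."""
    steps = [
        f"condition on 3D render: {visual.template_render}",
        f"apply global prompt: {text.global_prompt}",
    ]
    for name, prompt in text.segment_prompts.items():
        box = spatial.segment_boxes.get(name)
        steps.append(f"refine segment '{name}' within {box}: {prompt}")
    return steps
```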
Technical Implementation
The prototype uses ComfyUI, an open-source node-based interface for AI image generation, as its workflow prototyping platform. Technical artists familiar with node-based tools like Houdini or Unreal’s Blueprint system will find this approach intuitive. Each node in the graph represents a step in the reasoning chain, allowing artists to (see the scripted example after this list):
- Prototype and iterate on multimodal workflows visually
- Inject different types of input at any point in the process
- Create reusable workflow templates
- Maintain precise control over each step
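Because ComfyUI also exposes its graphs over a local HTTP API, a workflow template can be queued headlessly once it has been prototyped visually. The sketch below submits a deliberately minimal text-to-image graph in ComfyUI’s exported “API format” (nodes keyed by id, links as [source_node_id, output_index] pairs). The checkpoint filename is a placeholder, and this is a bare-bones graph rather than the full multimodal character workflow described here:

```python
import json
import urllib.request

# A minimal ComfyUI graph in the exported "API format": each node has a
# class_type and an inputs dict; links are [source_node_id, output_index].
workflow = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "model.safetensors"}},   # placeholder
    "2": {"class_type": "CLIPTextEncode",                   # positive prompt
          "inputs": {"text": "rodeo cowboy", "clip": ["1", 1]}},
    "3": {"class_type": "CLIPTextEncode",                   # negative prompt
          "inputs": {"text": "blurry, low quality", "clip": ["1", 1]}},
    "4": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
    "5": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0], "positive": ["2", 0],
                     "negative": ["3", 0], "latent_image": ["4", 0],
                     "seed": 42, "steps": 25, "cfg": 7.0,
                     "sampler_name": "euler", "scheduler": "normal",
                     "denoise": 1.0}},
    "6": {"class_type": "VAEDecode",
          "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
    "7": {"class_type": "SaveImage",
          "inputs": {"images": ["6", 0], "filename_prefix": "character"}},
}

# Queue the graph on a locally running ComfyUI server (default port 8188).
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": workflow}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode())           # returns a prompt_id on success
```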
The Prototype Workflow
Our multimodal pipeline combines visual, spatial, and textual inputs in a clear, iterative process (modelled in code after the list):
- Initial 3D Template: Artists begin with a basic 3D character model, providing a foundation for the design process.
- Segment Identification: They then identify key segments of the character (face, clothing, accessories, etc.), creating distinct zones for targeted refinement.
- Global Prompt: They set the overall creative direction through broad prompts (e.g., “rodeo cowboy”), establishing the character’s core identity.
- Segmented Prompts: Each identified segment can be individually refined through specific prompts (e.g., “blue jeans with rip on left knee”), allowing for precise control over every aspect of the design.
- Face Likeness Matching: For characters requiring specific facial features or resemblances, the system can incorporate reference images to guide the generation process.
- Iterative Output: The system generates images that artists can further refine, maintaining creative momentum while preserving artistic intent throughout the process.
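To show how these steps fit together, the sketch below models them in code. It is illustrative only (the names are ours, not the shipped tooling), with image generation injected as a plain callable so the control flow stays in focus:

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class Segment:
    name: str               # e.g. "jeans"
    prompt: str             # e.g. "blue jeans with rip on left knee"
    mask: str               # per-segment mask rendered from the 3D template
    locked: bool = False    # True once the artist locks in this element

@dataclass
class CharacterSpec:
    template_render: str                    # render of the base 3D model
    global_prompt: str                      # e.g. "rodeo cowboy"
    segments: list[Segment] = field(default_factory=list)
    face_reference: Optional[str] = None    # optional likeness image

def next_iteration(spec: CharacterSpec,
                   generate: Callable[..., str]) -> str:
    """One controlled pass: a global pass, then per-segment refinement."""
    image = generate(spec.template_render, spec.global_prompt,
                     reference=spec.face_reference)
    for seg in spec.segments:
        if not seg.locked:                  # locked elements stay untouched
            image = generate(image, seg.prompt, mask=seg.mask)
    return image
```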
The Dynamic Feedback Loop
One of the most powerful aspects of this workflow is its ability to capture and “lock in” successful AI-generated elements. When the AI produces compelling design elements—such as an ornate belt buckle that perfectly matches the character’s theme—artists can immediately incorporate these elements back into the 3D model. This creates a dynamic feedback loop:
- Discovery: The AI generates unexpected but appealing design elements
- Capture: Artists add these elements to the 3D model, precisely controlling their size and position
- Segmentation: The new elements become their own segments with dedicated prompts
- Refinement: Future iterations maintain these locked-in elements while continuing to explore other aspects
This iterative cycle ensures that happy accidents become intentional design choices, preserved and refined throughout the character development process.
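Continuing the illustrative CharacterSpec sketch from the previous section, the lock-in step amounts to promoting a discovered element into a frozen segment of its own (again, a sketch of the concept rather than the shipped tooling):

```python
def lock_in(spec: CharacterSpec, name: str, prompt: str, mask: str) -> None:
    """Capture a happy accident as a dedicated, frozen segment.

    In practice the artist first adds the discovered element (e.g. the
    belt buckle) to the 3D model, then renders a fresh mask for it.
    """
    spec.segments.append(Segment(name=name, prompt=prompt,
                                 mask=mask, locked=True))

# Discovery -> Capture -> Segmentation -> Refinement:
# lock_in(spec, "belt_buckle", "ornate western belt buckle", "masks/buckle.png")
# Subsequent next_iteration() passes preserve the buckle while other,
# unlocked segments continue to evolve.
```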
Key Benefits for Artists
This R&D prototype demonstrates several promising advantages for game production:
- Intuitive Control: Artists work in their natural visual language rather than wrestling with complex text prompts
- Rapid Iteration: Quick refinements and adjustments without starting from scratch
- Precise Modifications: Fine-grained control over specific character elements
- Style Consistency: Easier adherence to style guides and design parameters
- Seamless Integration: Works alongside familiar 3D tools and existing pipelines
- Design Preservation: Ability to capture and maintain successful elements through 3D model updates
Beyond Character Generation
While our R&D prototype focused on character design as an initial use case, the underlying multimodal chain-of-thought workflow has broad applications across game development. Similar principles of controlled iteration and segmented refinement could be applied to:
- Weapon Design: Iterating on models of guns, swords, or magical items while maintaining specific game-balance proportions
- Environmental Assets: Developing buildings, vegetation, or props with consistent architectural or natural styles
- Level Design: Rapidly prototyping environmental layouts while preserving gameplay spaces and sight lines
- Vehicle Design: Creating variations of vehicles while maintaining specific technical or mechanical constraints
- UI/UX Elements: Generating interface components that adhere to specific style guides and usability requirements
This versatility suggests that the workflow could be adapted to support nearly any visual aspect of game development where iterative refinement and precise control are essential.
Future R&D Directions
The Keywords AI Centre of Excellence is continuing to explore ways to make this workflow more accessible to artists. Future R&D will focus on:
- Simplifying the workflow to reduce technical complexity
- Integrating with industry-standard 3D creation tools like Maya or Blender
- Streamlining the process of segment identification and prompt management
- Creating interfaces for artists who are not familiar with node-based workflows