Optimize Then Generate — Plan, Evaluate, Improve Pipeline

Why "Optimize First" Matters

Most failed AI generations aren't model problems—they're prompt problems. A rough sentence like "nice product photo" leaves lighting, angle, background, and style up to chance. AI Chat turns vague intent into structured instructions that image and TTS models can execute reliably.

For overseas ecommerce and performance teams, the cost of skipping this step is concrete: credits burned on 4K re-rolls, Amazon main images that fail compliance checks, and Meta/TikTok ads that look “AI-random” instead of on-brand.

After optimization you typically get:

Clearer intent — subject, scene, channel, and goal are explicit
Style consistency — same brand look across dozens of SKUs or ad variants
Detail control — texture, lighting, and composition are named
Predictable outputs — fewer random failures and re-rolls
Credit efficiency — filter at 1K, finalize at 2K/4K only for winners

Before and After Example

Rough input (unstable):

Nice skincare bottle photo for ads

After AI Chat optimization (stable):

Single skincare serum bottle, centered hero product, pure white seamless background,
soft studio key light from upper left, subtle contact shadow beneath bottle,
glass texture with realistic reflections, label text sharp and readable,
premium DTC aesthetic, photorealistic, 4:5 aspect ratio for Instagram feed ad

The optimized version names subject placement, background, lighting direction, material, and constraints—so Nano Banana 2 and related models have less room to guess wrong.
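
One way to make "naming every element" checkable is to keep the prompt as named fields and render it at the end. The sketch below is only an illustration: `PromptSpec`, its field names, and its methods are hypothetical and not part of ForgeEcho.

```python
from dataclasses import dataclass, fields


@dataclass
class PromptSpec:
    """Hypothetical container for the elements the optimized prompt names."""
    subject: str
    background: str = ""
    lighting: str = ""
    material: str = ""
    constraints: str = ""
    style: str = ""
    aspect_ratio: str = ""

    def render(self) -> str:
        # Join non-empty fields in a fixed order so every prompt has the same shape.
        values = [getattr(self, f.name).strip() for f in fields(self)]
        return ", ".join(v for v in values if v)

    def missing(self) -> list[str]:
        # Empty fields are the decisions the image model would have to guess.
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


rough = PromptSpec(subject="Nice skincare bottle photo for ads")
optimized = PromptSpec(
    subject="Single skincare serum bottle, centered hero product",
    background="pure white seamless background",
    lighting="soft studio key light from upper left",
    material="glass texture with realistic reflections",
    constraints="label text sharp and readable",
    style="premium DTC aesthetic, photorealistic",
    aspect_ratio="4:5 aspect ratio for Instagram feed ad",
)

print(rough.missing())      # every field except the subject is left to chance
print(optimized.render())
```

An empty `missing()` list is a quick signal that the prompt is ready to send; a long one means the draft still belongs in AI Chat.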

Plan–Evaluate–Improve loop

The five-step pipeline described below compresses into three memorable actions:

Phase	In ForgeEcho	Output
Plan	AI Chat: brief + 3 structured variants	Production-ready prompt or script
Evaluate	Generate image/voice + rubric score	Know which dimension failed
Improve	Change one variable, regenerate	Repeatable change log

Teams often skip Evaluate and re-roll ten times. A short checklist (subject readable, color match, artifacts, voice pacing) separates prompt structure issues from model/resolution choices—saving credits on blind retries.

The same loop applies to conversational editing: reference upload → first pass "lighting only" → evaluate → second pass "subtle skin"—never stack every retouch term in one prompt.

The Core Pipeline (5 Steps)

1. Draft a rough prompt

Write what you want in plain language. Don't worry about structure yet.

Example: "Skincare bottle on a clean background, looks premium, for Instagram ad"

2. Refine in AI Chat

Paste the rough idea and ask for structured variants. Request 3 style directions—for example: minimal studio, lifestyle natural light, and bold campaign color.

Compare variants for:

Visual clarity (is the product the hero?)
Brand fit (colors, mood, premium vs playful)
Keyword conflicts (avoid mixing "photorealistic" and "flat illustration" in one prompt)
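
The keyword-conflict check in the list above can be automated with a small lookup. This is a minimal sketch: the conflict table is hypothetical, contains only the one pair named above, and would need to be extended by hand for a real brand.

```python
# Hypothetical conflict table: each entry is a pair of terms that should not
# appear in the same prompt. Only the pair from the checklist is listed here.
CONFLICTING_TERMS = [
    ("photorealistic", "flat illustration"),
]


def find_conflicts(prompt: str) -> list[tuple[str, str]]:
    """Return every conflicting pair whose two terms both occur in the prompt."""
    text = prompt.lower()
    return [(a, b) for a, b in CONFLICTING_TERMS if a in text and b in text]


variant = "Photorealistic serum bottle, flat illustration style, white background"
print(find_conflicts(variant))
```

Running this over all three chat variants before generating is cheap and catches the mixed-style prompts that tend to produce unstable results.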

For voice scripts, ask AI Chat to shorten sentences and add a clear hook + CTA.

Each chat reply costs 0.5 credits and saves to Prompt Library.

3. Generate image or voice

Pick one variant and send it to AI Image or paste a polished script into AI Voice.

AI Image credits by resolution: 1K = 3, 2K = 4, 4K = 8 credits per generation.

AI Voice: 1 credit per 500 characters (minimum 500).

For voice, generate a short sample first (10–20 seconds) to validate tone and pacing before the full read.
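
The prices quoted above are easy to turn into a small estimator for planning a session. The function names are hypothetical, and the rounding rule for voice is an assumption: the sketch bills whole 500-character blocks, rounded up, with at least one block per read.

```python
import math

IMAGE_CREDITS = {"1K": 3, "2K": 4, "4K": 8}  # credits per generation
CHAT_CREDITS_PER_REPLY = 0.5
VOICE_BLOCK_CHARS = 500                      # 1 credit per block


def chat_cost(replies: int) -> float:
    return replies * CHAT_CREDITS_PER_REPLY


def image_cost(resolution: str, count: int = 1) -> int:
    return IMAGE_CREDITS[resolution] * count


def voice_cost(script: str) -> int:
    # Assumption: partial blocks round up and the minimum charge is one block.
    return max(1, math.ceil(len(script) / VOICE_BLOCK_CHARS))


sample = "Tired of tangled cables? Meet the earbuds that just work."
print(chat_cost(3), image_cost("1K", 4), voice_cost(sample))
```

Under that assumption a short sample and a 500-character read cost the same, which is one more reason to validate tone on a short clip first.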

4. Compare and iterate

Score outputs on a simple rubric:

Criterion	Pass?
Subject readable at thumbnail size (image)
Colors match brand or product (image)
No unwanted artifacts or distortion (image)
Natural pacing and clear pronunciation (voice)

Adjust one variable at a time—lighting, background, voice choice, or script length—not everything at once.
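
The one-variable rule can be enforced mechanically when settings are stored as key/value pairs. The helper below is a hypothetical sketch, not a ForgeEcho feature: it refuses to log an iteration unless exactly one field differs from the previous run.

```python
def changed_fields(previous: dict, current: dict) -> list[str]:
    """Names of the settings that differ between two runs."""
    keys = sorted(set(previous) | set(current))
    return [k for k in keys if previous.get(k) != current.get(k)]


def log_iteration(log: list, previous: dict, current: dict) -> None:
    """Append one change-log entry, rejecting runs that change several things."""
    changed = changed_fields(previous, current)
    if len(changed) != 1:
        raise ValueError(f"change exactly one variable per iteration, got: {changed}")
    field = changed[0]
    log.append({"field": field, "before": previous.get(field), "after": current.get(field)})


change_log: list = []
run_1 = {"lighting": "soft studio key light", "background": "white seamless", "model": "nano-banana-2"}
run_2 = {**run_1, "lighting": "window light from left"}
log_iteration(change_log, run_1, run_2)
print(change_log)
```

The resulting log is the "repeatable change log" from the Improve phase: each entry records what was changed and what it replaced.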

5. Save as reusable template

Store the winning prompt or script with metadata:

Use case (listing, ad, social cover, voiceover)
Aspect ratio (1:1, 4:5, 9:16) for images
Model notes (nano-banana-2 for quality, nano-banana-fast for drafts)

Next time you only swap the product name or scene detail.
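
A saved template can be sketched as a prompt with placeholders plus the metadata listed above. The record layout and the `fill` helper are hypothetical; only `string.Template` from the Python standard library is real.

```python
from string import Template

# Hypothetical template record: metadata plus a prompt with swappable fields.
template = {
    "use_case": "ad",
    "aspect_ratio": "4:5",
    "model": "nano-banana-2",
    "prompt": Template(
        "Single $product, centered hero product, $scene, "
        "soft studio key light from upper left, photorealistic, "
        "$aspect_ratio aspect ratio"
    ),
}


def fill(record: dict, product: str, scene: str) -> str:
    # substitute() raises KeyError if a placeholder is left unfilled,
    # so a half-completed template never reaches the image model.
    return record["prompt"].substitute(
        product=product, scene=scene, aspect_ratio=record["aspect_ratio"]
    )


print(fill(template, "skincare serum bottle", "pure white seamless background"))
```

Keeping the aspect ratio in the metadata rather than in free text means the same record can be filtered by channel later.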

Model Selection Quick Reference

Goal	Suggested model	Notes
Fast drafts and iteration	`nano-banana-fast`	Lower cost, good for exploring
Production ecommerce / ads	`nano-banana-2`	Supports 1K/2K/4K; best balance
High-res campaign assets	`nano-banana-2-4k-cl` or `nano-banana-pro`	4K when platform requires it
Reference-based edit	Any + upload reference	JPG/PNG/WebP, max 3MB

When to Chat First vs Refine Existing Prompt

Situation	Start with
You know the goal but not the words	AI Chat → structured variants
You have a working prompt that drifted	Chat: "keep structure, fix [one issue]"
New campaign, unclear direction	Chat to explore 2–3 moods → pick one
Batch production from templates	Skip exploration; swap SKU fields only

See AI Chat Guide for optimization request templates.

Production SOP for agencies and in-house teams

Use this when more than one person touches creatives (freelancer + brand, or a small growth pod):

1. Brief in one sentence: channel (Amazon main / Meta 4:5 / TikTok 9:16), deliverable count, hard constraints (white BG, no props, preserve label).
2. Chat owner drafts 3 structured variants; mark one as primary.
3. Image owner runs 1K screen (4–6 variants) → scores with the rubric above → promotes 1–2 to 2K.
4. Voice owner (if needed) samples 10–20s before full TTS.
5. Librarian tags the winner in Prompt Library: channel + SKU family + model + ratio.

Skipping step 5, the librarian tagging the winner, is how brand look drifts by week three. The Prompt Library is the system of record, not a Slack screenshot of a Discord prompt.

Credit math: why chat-first usually wins

Assume a new Meta 4:5 ad brief with no library template yet:

Path	Rough spend	Outcome risk
Jump straight to 2K × 6	6 × 4 = 24 credits	High: structure still wrong
Chat × 3 + 1K × 4 + 2K × 2	1.5 + 12 + 8 = 21.5 credits	Lower: filter before finals

The chat-first path is rarely more expensive and usually cheaper once you count the re-rolls you didn’t need. For SKU batches with a gold template, skip exploration and only swap fields (see Ecommerce Image Optimization).
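
The comparison above can be reproduced, and re-run for other briefs, with a few lines of arithmetic. This is a sketch using the listed prices; `path_cost` and `savings` are hypothetical helper names, and the re-roll count is whatever you estimate for your own brief.

```python
IMAGE_CREDITS = {"1K": 3, "2K": 4, "4K": 8}  # credits per generation
CHAT_CREDITS_PER_REPLY = 0.5


def path_cost(chat_replies: int, images: dict) -> float:
    """Total credits for a path: chat replies plus image generations by resolution."""
    image_total = sum(IMAGE_CREDITS[res] * count for res, count in images.items())
    return chat_replies * CHAT_CREDITS_PER_REPLY + image_total


direct = path_cost(0, {"2K": 6})                # jump straight to 2K x 6
chat_first = path_cost(3, {"1K": 4, "2K": 2})   # chat x 3, 1K x 4, 2K x 2


def savings(avoided_2k_rerolls: int) -> float:
    # Each 2K re-roll the direct path needs and the chat-first path avoids adds 4 credits.
    return direct - chat_first + avoided_2k_rerolls * IMAGE_CREDITS["2K"]


print(direct, chat_first, savings(0), savings(3))
```

With no re-rolls at all the gap is 2.5 credits; every 2K re-roll the filtered path avoids widens it by 4.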

Team Workflow: Shared Prompt Library

For ecommerce, UGC ads, or social teams, one shared library beats everyone prompting from scratch:

Category	Template fields
Product visuals	SKU, angle, background, lighting, "keep label readable"
Social posts	Platform, hook mood, CTA tone, safe area for text overlay
Voice ads	Duration, hook line, benefit bullets, CTA, preferred voice

Review templates monthly. Retire prompts that consistently underperform in CTR or conversion tests.

End-to-end case: brief to ad-ready assets (90 min)

Scenario: Wireless earbuds TikTok 9:16 ad—cover still + 20-second voice read.

Timeline
09:00  AI Chat — brief + 3 visual directions + 2 hook scripts        (~15 min, 2 credits)
09:15  AI Image — fast 1K × 6 cover filter                             (~20 min, 18 credits)
09:35  Checklist score — pick "UGC handheld + window light"             (~5 min)
09:40  AI Image — nano-banana-2 2K finals × 2                          (~10 min, 8 credits)
09:50  AI Chat — 40-word script, spoken phrasing                        (~5 min, 0.5 credits)
09:55  AI Voice — 15s samples × 2 voices                               (~10 min, 2 credits)
10:05  Editor — still + VO + captions                                  (~25 min)
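
As a sanity check on the timeline above, the minutes and credits can be tallied directly. The tuples simply copy the figures from the timeline; nothing else is assumed.

```python
# (start, task, minutes, credits) copied from the timeline above
TIMELINE = [
    ("09:00", "AI Chat: brief, visual directions, hook scripts", 15, 2.0),
    ("09:15", "AI Image: 1K x 6 cover filter", 20, 18.0),
    ("09:35", "Checklist score", 5, 0.0),
    ("09:40", "AI Image: 2K finals x 2", 10, 8.0),
    ("09:50", "AI Chat: 40-word script", 5, 0.5),
    ("09:55", "AI Voice: 15s samples x 2 voices", 10, 2.0),
    ("10:05", "Editor: still + VO + captions", 25, 0.0),
]

total_minutes = sum(minutes for _, _, minutes, _ in TIMELINE)
total_credits = sum(cost for _, _, _, cost in TIMELINE)
print(total_minutes, total_credits)  # 90 30.5
```

The run fits the 90-minute budget and spends 30.5 credits, most of it on the 1K cover filter.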

Evaluate log (fill every run):

Check	Cover	Voice
Subject readable in 3s	✓ / ✗	—
Product/brand name correct	✓ / ✗	✓ / ✗
No garbled text	✓ / ✗	—
Hook in first 2s	—	✓ / ✗
Natural pacing	—	✓ / ✗

Failed rows map to one lever only—do not change prompt, model, and aspect ratio at once.
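
The evaluate log can be kept as data so failed rows are listed automatically. The structure below is a hypothetical sketch with made-up results for one run; `None` stands for the dash in the table (check not applicable).

```python
from typing import Optional

# check name -> {asset: True (pass), False (fail), None (not applicable)}
EvaluateLog = dict[str, dict[str, Optional[bool]]]

# Illustrative results for a single run, not real data.
run: EvaluateLog = {
    "Subject readable in 3s":     {"cover": True,  "voice": None},
    "Product/brand name correct": {"cover": True,  "voice": True},
    "No garbled text":            {"cover": False, "voice": None},
    "Hook in first 2s":           {"cover": None,  "voice": True},
    "Natural pacing":             {"cover": None,  "voice": True},
}


def failed_checks(log: EvaluateLog) -> dict[str, list[str]]:
    """Group the failed check names by asset, skipping passes and non-applicable cells."""
    failures: dict[str, list[str]] = {}
    for check, results in log.items():
        for asset, passed in results.items():
            if passed is False:
                failures.setdefault(asset, []).append(check)
    return failures


print(failed_checks(run))
```

A single failed row, as in this example, points at one thing to fix in the next iteration; several failed rows on the same asset are a cue to go back to AI Chat rather than re-roll.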

Common Mistakes

Skipping chat refinement on "simple" product shots—background and lighting still vary wildly
Changing too many keywords between iterations—you won't know what fixed the output
Ignoring aspect ratio until export—compose for 9:16 or 1:1 from the prompt stage
Long voice scripts on first try—validate tone on a short clip before the full read

FAQ

Does optimization work for AI Voice too?
Yes. The same structured hook + benefit + CTA pattern applies to ad reads, explainers, and social voiceovers.

How many variants should I generate?
Three optimized prompt variants plus 1–2 manual tweaks are enough for most decisions. More than five slows you down without improving results.

Can I reuse one prompt across models?
Yes. Keep the same structure and adjust resolution and model-specific quality terms as needed.