Which image models actually take 6 reference inputs? We ran the benchmark.

Modern image models can take multiple reference images plus one text prompt and compose them into a single output. One prompt, N references, one composite. Useful for visual novels, brand collateral with consistent characters, product mockups, comic panels, and any workflow that needs a scene assembled from known parts.

The catch: every model in the marketplace claims to do this. Most do not. Some cap at 1 reference, some at 4, and some accept the inputs but silently ignore everything past the first. On top of that, the same model behaves differently across resellers because of quotas, slug variants, and missing endpoint wiring. So we ran a benchmark.

The fixture set

Six fixed JPEGs covering a typical roleplay scene composition: one tavern background and five character portraits (one user-side character and four NPCs). The same six files go to every (provider, model) pair, with no per-channel rewriting. Total payload is roughly 500 KB across the six images.

The text prompt is fixed too. It names each character, references each image by index, and asks for a single composite. Verbatim:

text

Compose a single anime-style illustration combining the six reference images: place Sara, the blonde girl with the side braid (image 01), inside the tavern (image 00), interacting with four NPCs - the blonde male hero Trevor (image 02), the bearded ranger Puck (image 03), the bald knight in gold armor (image 04), and the brunette adventurer woman (image 05). Preserve each character's distinctive appearance. Single output image.

How the benchmark runs

For every (provider, model) pair in the catalog, we POST the six fixtures plus the prompt to that channel's /v1/images/edits endpoint. Pass = HTTP 200 with a non-empty image URL or base64 payload in the response. Fail = non-200, empty body, or a response whose shape does not match what we expect. No human grading. The benchmark is reproducible, runs on demand, and re-runs whenever a new image model appears upstream.

We do not score visual quality here. This run answers one question only: does the model accept six reference inputs plus a prompt and return an image, on this specific channel, right now? Quality grading is a separate pass.
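The pass/fail rule above can be sketched as a small predicate. This is a minimal illustration, not the benchmark's actual code: it assumes the response follows the OpenAI-style shape `{"data": [{"url": ...}]}` or `{"data": [{"b64_json": ...}]}`, and the function name `is_pass` is made up for this example.

```python
import json


def is_pass(status: int, body: bytes) -> bool:
    """Return True when a channel response counts as a pass."""
    if status != 200 or not body:
        return False  # non-200 or empty body
    try:
        payload = json.loads(body)
    except ValueError:
        return False  # body is not JSON: shape mismatch
    # Assumption: OpenAI-style shape with a non-empty "data" list.
    data = payload.get("data") if isinstance(payload, dict) else None
    if not isinstance(data, list) or not data:
        return False
    first = data[0]
    if not isinstance(first, dict):
        return False
    # Either a non-empty URL or a non-empty base64 payload is enough.
    return bool(first.get("url") or first.get("b64_json"))
```

Keeping the check this mechanical is what allows the run to be repeated without a human in the loop; it says nothing about whether the returned image actually used all six references.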

What 332 channel runs turned up

Across 8 upstream resellers, we tested 136 unique image models over 332 channel runs. 54 models have at least one verified passing provider. Run captured 2026-05-09.

The table below groups results by family. Verified models = distinct SKUs in that family with at least one passing channel. Passing-provider sum = total count of (model, provider) pairs that passed across the family.

Family	Verified models	Passing-provider sum
gpt-image-*	6	26
gemini-*-image	3	22
doubao-seedream-*	3	6
flux-*	7	8
qwen-image-edit-*	2	5
wan2.5-i2i	1	2
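For readers who want to reproduce the two columns from raw run records, here is a rough sketch of the aggregation. The record layout `(provider, model, passed)`, the `family_of` callback, and the sample rows are all hypothetical; the sample data is invented and does not come from the benchmark.

```python
from collections import defaultdict


def summarize(runs, family_of):
    """Aggregate run records into per-family counts.

    runs: iterable of (provider, model, passed) tuples.
    family_of: function mapping a model name to its family label.
    """
    passing = defaultdict(set)  # model -> set of providers that passed
    for provider, model, passed in runs:
        if passed:
            passing[model].add(provider)

    table = {}
    for model, providers in passing.items():
        row = table.setdefault(
            family_of(model),
            {"verified_models": 0, "passing_provider_sum": 0},
        )
        row["verified_models"] += 1                     # model has >= 1 passing channel
        row["passing_provider_sum"] += len(providers)   # distinct (model, provider) pairs
    return table


# Made-up sample records, for illustration only.
sample_runs = [
    ("provider-1", "fam-a", True),
    ("provider-2", "fam-a", True),
    ("provider-1", "fam-b", False),
]
print(summarize(sample_runs, family_of=lambda m: m.split("-")[0]))
```

Using a set per model means a channel that was run twice is only counted once toward the passing-provider sum.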

Top single-model winners by passing-provider count. More passing providers means better routing headroom: when one upstream rate-limits or goes down, the router has another path to the same model.

Model	Passing providers
gemini-3.1-flash-image-preview	8
gpt-image-1	7
gemini-3-pro-image-preview	7
gemini-2.5-flash-image	7
gpt-image-2	6
gpt-image-1-mini	4
gpt-image-1.5	4
flux-schnell	3
qwen-image-edit-plus	3

Why a model passes on one reseller and fails on another

Three common reasons. Quota exhaustion: the reseller's upstream key burned its image quota for the day and now answers 429. Slug variants: the same underlying model is exposed as gpt-image-2, gpt-image-2-all, gpt-image-2-c, and gpt-image-2-vip across different resellers, and only some of those slugs are actually wired to a working backend. Endpoint mismatch: a few resellers expose the slug but never plumbed /v1/images/edits, so the request 404s.
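A first-pass triage of failed runs by status code might look like the sketch below. The labels and the function name are hypothetical, and the mapping is deliberately coarse: a status code alone cannot tell a dead slug variant apart from other failures, so anything that is not a 429 or 404 lands in a catch-all bucket.

```python
from collections import Counter


def tally_failures(statuses):
    """Count failed runs by likely cause, given their HTTP status codes."""
    counts = Counter()
    for status in statuses:
        if status == 429:
            counts["quota_exhausted"] += 1    # reseller's upstream quota is spent
        elif status == 404:
            counts["endpoint_missing"] += 1   # /v1/images/edits not plumbed
        else:
            counts["other"] += 1              # includes slugs with no working backend
    return counts
```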

This is why we test continuously and route around failing channels at request time. A model that passes today can start 429-ing tomorrow if its reseller's upstream rotates. Static availability lists go stale fast in this corner of the market.
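Routing around a failing channel at request time can be pictured as a simple failover loop. This is a minimal sketch under assumptions, not the router's real logic: `send` is a hypothetical callable that performs the request against one provider and returns `(status, body)`, and the provider ordering is left to the caller.

```python
from typing import Callable, Iterable, Optional, Tuple


def route(
    providers: Iterable[str],
    send: Callable[[str], Tuple[int, bytes]],
) -> Optional[Tuple[str, bytes]]:
    """Try each provider in order; return the first (provider, body) that succeeds."""
    for provider in providers:
        try:
            status, body = send(provider)
        except OSError:
            continue  # network-level failure: move on to the next channel
        if status == 200 and body:
            return provider, body
        # Any other status (429, 404, ...) falls through to the next provider.
    return None  # every channel failed
```

The more passing providers a model has, the longer this list can be before `route` gives up.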

maxImageInputs is now in every model's metadata

Every image model in the catalog now carries a maxImageInputs field on its metadata block. Models that passed the 6-reference benchmark are tagged maxImageInputs: 6. The same shape is used by the catalog UI, the API, and the routing layer.

json

{
  "model": "gemini-3.1-flash-image-preview",
  "metadata": {
    "maxImageInputs": 6
  }
}

If you only care about which models will accept your 6-image payload, filter on this field. New models added later get the same tag once they pass the same benchmark.
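Filtering on the field could look like the sketch below. It assumes the catalog is available as a list of objects shaped like the JSON above; how you fetch that list is not covered here, and the second entry in the sample is a made-up placeholder.

```python
def models_accepting(catalog, n):
    """Return model names whose metadata.maxImageInputs is at least n."""
    result = []
    for entry in catalog:
        metadata = entry.get("metadata") or {}
        # Treat a missing or null field as 0 so such models are never selected.
        if (metadata.get("maxImageInputs") or 0) >= n:
            result.append(entry["model"])
    return result


catalog = [
    {"model": "gemini-3.1-flash-image-preview", "metadata": {"maxImageInputs": 6}},
    {"model": "example-single-reference-model", "metadata": {"maxImageInputs": 1}},  # hypothetical
]
print(models_accepting(catalog, 6))
```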

Try it

Every model above is available through one OpenAI-compatible endpoint. Multi-reference image edit is exposed exactly as the upstream defines it, no extra wrapping. Bring six images and a prompt; the router picks a working provider.
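As a rough idea of what the six-image request looks like on the wire, the sketch below builds (but does not send) a multipart POST using only the standard library. Several things here are assumptions: the base URL is a placeholder, the `image[]` field name follows OpenAI's published multi-image edit example and should be confirmed against the endpoint you call, and the function name is made up.

```python
import urllib.request
import uuid


def build_edit_request(base_url, api_key, model, prompt, images):
    """Build an unsent multipart/form-data request for /v1/images/edits.

    images: list of (filename, jpeg_bytes) tuples.
    """
    boundary = uuid.uuid4().hex
    parts = []
    # Plain text fields first.
    for name, value in (("model", model), ("prompt", prompt)):
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    # One file part per reference image (field name is an assumption).
    for filename, data in images:
        header = (
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="image[]"; filename="{filename}"\r\n'
            'Content-Type: image/jpeg\r\n\r\n'
        ).encode("utf-8")
        parts.append(header + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))

    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/images/edits",
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; in practice most people will reach for an OpenAI-compatible SDK instead, and this version only exists to make the payload structure visible.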

Grab an API key or browse the image catalog to see the full verified list.