Model providers fail. They rate-limit, they return 5xx, and they version models on their schedule. A serious deployment cannot treat one vendor as a single point of failure for every subtask.

We run Anthropic and OpenAI in the same system. A coordinator can stay on a stable model for structured output while sub-agents switch providers when a call fails. The behavior is documented for OpenAI and for Anthropic the same way any integration team expects: credentials, base URLs, error classes.

When the discussion turns to why to combine retrieval with generation at all, the classic reference is Lewis et al. (2020): retrieval-augmented generation. It is not the same as failover, but it is the same class of "do not bet the business on a single static context window" thinking.

So what

Ask vendors how failure scopes. Global failover hides which customers paid for which capacity. Per-agent retry and alternate routes keep one bad shard from blanking a whole report.

Multi-provider routing and per-agent failover