If an agent system cannot show what it was told and what it returned, you cannot review it. That is not a philosophical stance. It is a control question.
Building Effective AI Agents lists transparency in planning as a first-class principle. We extend that to execution: the plan, the per-agent system prompts, the objectives, and the raw model outputs are all inspectable, and the synthesis is presented alongside that source material, not in place of it.
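As a concrete illustration, here is one way such an execution record could be structured so that everything an agent was told and everything it returned stays reviewable. This is a minimal sketch, not our implementation or any library's API; the names (ExecutionRecord, AgentTrace, and the fields) are hypothetical.

```python
# A minimal sketch of an inspectable execution record. All names here are
# hypothetical; substitute whatever your own stack uses.
from dataclasses import dataclass, field
from typing import List
import json


@dataclass
class AgentTrace:
    """What one sub-agent was told and what it returned, kept verbatim."""
    agent_name: str
    system_prompt: str   # the exact per-agent system prompt
    objective: str       # the objective handed to this agent
    raw_output: str      # the unedited model output


@dataclass
class ExecutionRecord:
    """One run: the plan, every agent trace, and the synthesis, side by side."""
    plan: str
    traces: List[AgentTrace] = field(default_factory=list)
    synthesis: str = ""

    def to_json(self) -> str:
        """Serialize the full record for review; nothing is summarized away."""
        return json.dumps(
            {
                "plan": self.plan,
                "agents": [vars(t) for t in self.traces],
                "synthesis": self.synthesis,
            },
            indent=2,
        )
```

The point of the structure is that the synthesis is one field among several, so a reviewer can always compare it against the prompts, objectives, and raw outputs it was built from.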
For teams that run continuous evaluation, OpenAI's safety best-practices guide is a practical companion; it is not a replacement for your own test sets.
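To make the "your own test sets" point concrete, a continuous-eval gate can be as small as the sketch below. The run_agent callable, the test-case format, and the substring check standing in for a real grader are all assumptions; swap in your own harness and grading logic.

```python
# A minimal sketch of a continuous-eval gate over your own test set.
# run_agent, the case format, and the grading rule are placeholders.
from typing import Callable, Dict, List


def evaluate(run_agent: Callable[[str], str],
             test_set: List[Dict[str, str]],
             threshold: float = 0.9) -> bool:
    """Run every case in the test set and fail the gate below the threshold."""
    passed = 0
    for case in test_set:
        output = run_agent(case["input"])
        # Substring match is a stand-in for whatever grader you actually use.
        if case["expected"] in output:
            passed += 1
    score = passed / len(test_set) if test_set else 0.0
    print(f"eval score: {score:.2%} ({passed}/{len(test_set)})")
    return score >= threshold
```

Wiring a gate like this into CI is what makes the evaluation continuous rather than a one-off check before launch.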
So what
Buyers should reject "our model reasons internally" as an answer. Internal reasoning is either surfaced or it is a gap in your audit story.