Automating Production Releases with AI Agents

I just completed a milestone I'm pretty happy with, so I thought I'd write a post describing it.

As part of my ongoing experiments with AI agents, I created two specialized agents that work together to automate an entire release process---from test verification through production deployment. The "QA Manager" agent ensures code quality, while the "Release Manager" agent orchestrates version control and deployment. (I generally find that it's best not to create AI agents as anthropomorphized replacements for roles traditionally held by humans. In this case, each agent performs a small set of focused tasks, and the names are meant to convey only that narrow responsibility, not that the agents do everything real people in those roles would do. Perhaps "deployment orchestrator" and "quality gate" would have been more appropriate.)

How They Work Together

The agents communicate asynchronously through context files---Markdown-formatted documents in a directory that serves as a shared workspace:

  • test-pass-{timestamp}.md - QA approval signal with test results
  • test-failures-{timestamp}.md - Test failure reports requiring attention
  • test-gaps-{timestamp}.md - Missing test coverage that needs addressing

These timestamped files allow agents to verify recency: the QA approval must be newer than all code changes before deployment proceeds.
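The recency check described above could be sketched as follows. The helper names and the use of file modification times are my assumptions; the post doesn't specify how the agents implement the comparison:

```python
import glob
import os

def latest_qa_approval(context_dir):
    """Return the path of the newest test-pass file, or None if absent."""
    # Timestamps in the file names sort lexicographically, so a plain
    # sort puts the most recent approval last.
    passes = sorted(glob.glob(os.path.join(context_dir, "test-pass-*.md")))
    return passes[-1] if passes else None

def qa_approval_is_fresh(context_dir, last_change_time):
    """True only if a QA approval exists and postdates the last code change."""
    approval = latest_qa_approval(context_dir)
    return approval is not None and os.path.getmtime(approval) > last_change_time
```

Because the timestamp is embedded in the file name, no extra bookkeeping is needed: sorting the directory listing is enough to find the current approval.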

The Automated Workflow

QA Manager: Quality Gate

Triggers: Code changes, bug fixes, or pre-release verification

Process:

  1. Checks for existing test reports and compares timestamps with code changes
  2. Analyzes recent commits to understand what behavior changed
  3. Evaluates test coverage for new/modified code
  4. Runs the full test suite
  5. Generates one of three outcomes:
  • test-pass file if all tests pass with adequate coverage
  • test-failures file with detailed failure analysis
  • test-gaps file identifying missing test coverage

Output: A timestamped report that serves as the quality gate for releases
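The three-way outcome could be emitted with a small helper like this. The file-name pattern follows the conventions above; everything else is an illustrative assumption, not the agent's actual implementation:

```python
import os
import time

def write_qa_report(context_dir, outcome, body):
    """Write one of the three timestamped outcome files and return its path.

    outcome must be 'pass', 'failures', or 'gaps'.
    """
    if outcome not in ("pass", "failures", "gaps"):
        raise ValueError(f"unknown outcome: {outcome}")
    # Timestamp format matches the test-pass-{timestamp}.md convention.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    path = os.path.join(context_dir, f"test-{outcome}-{stamp}.md")
    with open(path, "w") as fh:
        fh.write(body)
    return path
```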

Release Manager: Deployment Orchestrator

Triggers: Explicit deployment request or after feature completion

Process:

Pre-flight

  1. Verifies a clean git state and the correct branch
  2. Checks QA approval by reading the most recent test-pass-{timestamp}.md
  3. Validates the timestamp: the QA approval must be newer than all pending changes
  4. If the QA approval is stale, invokes the QA Manager and waits for fresh approval
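The first pre-flight check amounts to a couple of git queries. This is a minimal sketch of how they might look; the actual agent presumably runs git itself, but these exact commands are my choice:

```python
import subprocess

def git_state_is_clean():
    """True when git reports no staged, unstaged, or untracked files."""
    out = subprocess.run(
        ["git", "status", "--porcelain"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip() == ""

def current_branch():
    """Name of the branch HEAD currently points at."""
    out = subprocess.run(
        ["git", "rev-parse", "--abbrev-ref", "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()
```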

Release Creation

  1. Analyzes commits to determine semantic version bump (MAJOR/MINOR/PATCH)
  2. Updates context/release-number with new version
  3. Generates config/version.yml with version, timestamp, and git SHA
  4. Creates comprehensive CHANGELOG.md entry
  5. Commits version changes
  6. Creates annotated git tag
  7. Pushes commits and tags to GitHub
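The version-bump decision in step 1 could be sketched like this. The keyword heuristics are assumptions for illustration; the post doesn't say what rules the agent applies to commit messages:

```python
def bump_kind(commit_subjects):
    """Classify the pending release as MAJOR, MINOR, or PATCH."""
    subjects = [s.lower() for s in commit_subjects]
    if any("breaking" in s for s in subjects):
        return "MAJOR"
    if any(s.startswith(("feat", "add")) for s in subjects):
        return "MINOR"
    return "PATCH"

def next_version(current, kind):
    """Apply a MAJOR/MINOR/PATCH bump to an 'X.Y.Z' version string."""
    major, minor, patch = (int(part) for part in current.split("."))
    if kind == "MAJOR":
        return f"{major + 1}.0.0"
    if kind == "MINOR":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"
```

So a feature commit on top of 4.1.3 would produce 4.2.0, while a bug fix would produce 4.1.4.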

Deployment

  1. Executes deployment to production
  2. Monitors deployment progress
  3. Verifies application health in production
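The health verification in step 3 could be as simple as polling an HTTP endpoint. The URL, retry counts, and the existence of a health endpoint are all assumptions here:

```python
import time
import urllib.request

def wait_until_healthy(url, attempts=10, delay=5):
    """Poll a health endpoint until it returns HTTP 200, or give up."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Covers connection errors, timeouts, and URLError.
            pass
        time.sleep(delay)
    return False
```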

Key Benefits

Rigorous Quality Control—No code reaches production without passing tests and QA verification

Zero Human Error—Automated version bumping, tagging, and changelog generation

Full Traceability—Every release has a git tag, changelog entry, and version metadata

Agent Coordination—QA Manager acts as a hard gate; Release Manager cannot bypass it

Audit Trail—Context files document every decision and verification step

Example Flow

Developer finishes feature: git commit -m "Added version number to system info pane"

User requests deployment: > "Deploy this to production"

Release Manager checks QA approval: No recent test-pass file found

Release Manager invokes QA Manager

  • QA Manager runs test suite
  • All tests pass
  • Creates context/test-pass-20251013-234559.md

Release Manager proceeds

  • Determines the next version (here 4.2.0, a MINOR bump)
  • Creates config/version.yml
  • Updates CHANGELOG.md
  • Commits and tags release
  • Deploys to production

The entire process---from code commit to production deployment---is now a coordinated dance between two AI agents, each expert in its small domain, ensuring quality and consistency at every step. Moreover, the main context window I'm using doesn't get polluted with test-run and deployment noise. I've also appreciated that the reports the agents generate are genuinely useful: they document the quality and deployment state of the application over time.

Like I said at the top, I'm pretty happy with the way this is working, and I'm looking forward to iterating on this approach.