I just completed a milestone I'm pretty happy with, so I thought I'd write a post describing it.
As part of my ongoing experiments using AI agents, I created two specialized agents that work together to automate an entire release process---from test verification through production deployment. The "QA Manager" agent ensures code quality, while the "Release Manager" agent orchestrates version control and deployment. (I generally find it best not to create AI agents as anthropomorphized replacements for roles traditionally held by humans. In this case, each agent performs a small set of focused tasks, and the names describe that narrow responsibility rather than suggesting the agents do everything real people in those roles would do. Perhaps "deployment orchestrator" and "quality gate" would have been more appropriate.)
How They Work Together
The agents communicate asynchronously through context files---Markdown-formatted documents in a directory that serves as a shared workspace:
- `test-pass-{timestamp}.md` - QA approval signal with test results
- `test-failures-{timestamp}.md` - Test failure reports requiring attention
- `test-gaps-{timestamp}.md` - Missing test coverage that needs addressing
These timestamped files allow agents to verify recency: the QA approval must be newer than all code changes before deployment proceeds.
The Automated Workflow
QA Manager: Quality Gate
Triggers: Code changes, bug fixes, or pre-release verification
Process:
- Checks for existing test reports and compares timestamps with code changes
- Analyzes recent commits to understand what behavior changed
- Evaluates test coverage for new/modified code
- Runs the full test suite
- Generates one of three outcomes:
  - `test-pass` file if all tests pass with adequate coverage
  - `test-failures` file with detailed failure analysis
  - `test-gaps` file identifying missing test coverage
Output: A timestamped report that serves as the quality gate for releases
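The three-way outcome above boils down to a simple decision. Here's a hedged sketch of how the report writer might look; the function name and report body format are assumptions, only the file-naming convention comes from the setup described above:

```python
from datetime import datetime
from pathlib import Path

def write_qa_report(context_dir: Path, passed: bool,
                    gaps: list[str], details: str) -> Path:
    """Write one of the three timestamped QA outcome files.

    Priority: failures trump gaps, gaps trump a clean pass.
    (Report body format is a guess; naming follows the convention above.)
    """
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    if not passed:
        name = f"test-failures-{stamp}.md"
    elif gaps:
        name = f"test-gaps-{stamp}.md"
    else:
        name = f"test-pass-{stamp}.md"
    report = context_dir / name
    report.write_text(f"# QA Report {stamp}\n\n{details}\n")
    return report
```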
Release Manager: Deployment Orchestrator
Triggers: Explicit deployment request or after feature completion
Process:
Pre-flight
- Verifies clean git state and correct branch
- Checks QA approval: reads the most recent `test-pass-{timestamp}.md`
- Validates timestamp: the QA approval must be newer than all pending changes
- If the QA approval is stale, invokes the QA Manager and waits for fresh approval
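The pre-flight gate can be sketched as two small checks, assuming a release branch of `main` (my actual branch name may differ) and with `run_qa` standing in for whatever mechanism re-invokes the QA Manager:

```python
from datetime import datetime
from typing import Callable, Optional

def preflight_ok(status_output: str, current_branch: str,
                 release_branch: str = "main") -> bool:
    """Clean tree (empty `git status --porcelain` output) and the
    expected branch. The branch name is an assumption."""
    return status_output.strip() == "" and current_branch == release_branch

def ensure_fresh_approval(approval: Optional[datetime],
                          last_change: datetime,
                          run_qa: Callable[[], datetime]) -> datetime:
    """If QA approval is missing or stale, invoke the QA Manager
    (run_qa is a stand-in callback) and wait for a fresh report."""
    if approval is None or approval <= last_change:
        approval = run_qa()  # blocks until a new test-pass file appears
    if approval <= last_change:
        raise RuntimeError("QA approval still stale after re-running QA")
    return approval
```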
Release Creation
- Analyzes commits to determine semantic version bump (MAJOR/MINOR/PATCH)
- Updates `context/release-number` with the new version
- Generates `config/version.yml` with version, timestamp, and git SHA
- Creates a comprehensive `CHANGELOG.md` entry
- Commits version changes
- Creates annotated git tag
- Pushes commits and tags to GitHub
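The version-bump decision can be sketched as follows. The keyword rules here are Conventional-Commits-style and are my assumption for illustration; the post's agent infers the bump from commit analysis, but I haven't specified the exact classification rules it uses:

```python
def bump_kind(commit_messages: list[str]) -> str:
    """Pick MAJOR/MINOR/PATCH from commit messages.
    Keyword rules are an illustrative assumption."""
    msgs = [m.lower() for m in commit_messages]
    if any("breaking change" in m or m.startswith("feat!") for m in msgs):
        return "MAJOR"
    if any(m.startswith("feat") or "feature" in m for m in msgs):
        return "MINOR"
    return "PATCH"

def next_version(current: str, kind: str) -> str:
    """Apply a semantic-version bump to a MAJOR.MINOR.PATCH string."""
    major, minor, patch = map(int, current.split("."))
    if kind == "MAJOR":
        return f"{major + 1}.0.0"
    if kind == "MINOR":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"
```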
Deployment
- Executes deployment to production
- Monitors deployment progress
- Verifies application health in production
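Health verification usually amounts to polling with a retry budget. A minimal sketch, with `probe` standing in for a real HTTP check against the production URL (which I'm deliberately not specifying here):

```python
import time

def wait_until_healthy(probe, attempts: int = 5, delay: float = 2.0) -> bool:
    """Poll a health probe after deploying.

    `probe` is any callable returning True once the application
    responds; a real check would hit a health endpoint over HTTP.
    """
    for i in range(attempts):
        if probe():
            return True
        if i < attempts - 1:
            time.sleep(delay)  # back off before retrying
    return False
```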
Key Benefits
Rigorous Quality Control—No code reaches production without passing tests and QA verification
Zero Human Error—Automated version bumping, tagging, and changelog generation
Full Traceability—Every release has a git tag, changelog entry, and version metadata
Agent Coordination—QA Manager acts as a hard gate; Release Manager cannot bypass it
Audit Trail—Context files document every decision and verification step
Example Flow
Developer finishes feature: `git commit -m "Added version number to system info pane"`
User requests deployment: "Deploy this to production"
Release Manager checks QA approval: no recent test-pass file found
Release Manager invokes QA Manager
- QA Manager runs test suite
- All tests pass
- Creates `context/test-pass-20251013-234559.md`
Release Manager proceeds
- Determines the appropriate semantic version, e.g. 4.2.0 (a MINOR bump)
- Creates `config/version.yml`
- Updates `CHANGELOG.md`
- Commits and tags the release
- Deploys to production
The entire process---from code commit to production deployment---is now a coordinated dance between two AI agents, each expert in its small domain, ensuring quality and consistency at every step. Moreover, the main context window I'm using doesn't get polluted with test-running and deployment noise. I've also appreciated that the reports the agents generate are genuinely useful: they document the quality and deployment state of the application over time.
Like I said at the top, I'm pretty happy with the way this is working, and I'm looking forward to iterating on this approach.