I just completed a milestone I'm pretty happy with, so I thought I'd write a post describing it.
As part of my ongoing experiments using AI agents, I created two specialized agents that work together to automate an entire release process---from test verification through production deployment. The "QA Manager" agent ensures code quality, while the "Release Manager" agent orchestrates version control and deployment. (I generally find it best not to create AI agents as anthropomorphized replacements for roles traditionally held by humans. In this case, each agent performs a small set of focused tasks, and the names describe that narrow responsibility rather than suggesting the agents do everything real people in those roles would do. Perhaps "deployment orchestrator" and "quality gate" would have been more appropriate.)
How They Work Together
The agents communicate asynchronously through context files---Markdown-formatted documents in a directory that serves as a shared workspace:
- `test-pass-{timestamp}.md` - QA approval signal with test results
- `test-failures-{timestamp}.md` - Test failure reports requiring attention
- `test-gaps-{timestamp}.md` - Missing test coverage that needs addressing
These timestamped files allow agents to verify recency: the QA approval must be newer than all code changes before deployment proceeds.
The Automated Workflow
QA Manager: Quality Gate
Triggers: Code changes, bug fixes, or pre-release verification
Process:
- Checks for existing test reports and compares timestamps with code changes
- Analyzes recent commits to understand what behavior changed
- Evaluates test coverage for new/modified code
- Runs the full test suite
- Generates one of three outcomes:
  - `test-pass` file if all tests pass with adequate coverage
  - `test-failures` file with detailed failure analysis
  - `test-gaps` file identifying missing test coverage
Output: A timestamped report that serves as the quality gate for releases
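The three-way outcome above boils down to a simple decision. Here's a hedged sketch of how the report writer might look; the function name and report body format are assumptions, only the file-naming convention comes from the setup described above:

```python
from datetime import datetime
from pathlib import Path

def write_qa_report(context_dir: Path, passed: bool,
                    gaps: list[str], details: str) -> Path:
    """Write one of the three timestamped QA outcome files.

    Priority: failures trump gaps, gaps trump a clean pass.
    (Report body format is a guess; naming follows the convention above.)
    """
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    if not passed:
        name = f"test-failures-{stamp}.md"
    elif gaps:
        name = f"test-gaps-{stamp}.md"
    else:
        name = f"test-pass-{stamp}.md"
    report = context_dir / name
    report.write_text(f"# QA Report {stamp}\n\n{details}\n")
    return report
```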
Release Manager: Deployment Orchestrator
Triggers: Explicit deployment request or after feature completion
Process:
Pre-flight
- Verifies clean git state and correct branch
- Checks QA approval: reads the most recent `test-pass-{timestamp}.md`
- Validates timestamp: the QA approval must be newer than all pending changes
- If the QA approval is stale, invokes the QA Manager and waits for fresh approval
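The pre-flight gate can be sketched as two small checks, assuming a release branch of `main` (my actual branch name may differ) and with `run_qa` standing in for whatever mechanism re-invokes the QA Manager:

```python
from datetime import datetime
from typing import Callable, Optional

def preflight_ok(status_output: str, current_branch: str,
                 release_branch: str = "main") -> bool:
    """Clean tree (empty `git status --porcelain` output) and the
    expected branch. The branch name is an assumption."""
    return status_output.strip() == "" and current_branch == release_branch

def ensure_fresh_approval(approval: Optional[datetime],
                          last_change: datetime,
                          run_qa: Callable[[], datetime]) -> datetime:
    """If QA approval is missing or stale, invoke the QA Manager
    (run_qa is a stand-in callback) and wait for a fresh report."""
    if approval is None or approval <= last_change:
        approval = run_qa()  # blocks until a new test-pass file appears
    if approval <= last_change:
        raise RuntimeError("QA approval still stale after re-running QA")
    return approval
```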
Release Creation
- Analyzes commits to determine semantic version bump (MAJOR/MINOR/PATCH)
- Updates `context/release-number` with the new version
- Generates `config/version.yml` with version, timestamp, and git SHA
- Creates a comprehensive `CHANGELOG.md` entry
- Commits version changes
- Creates annotated git tag
- Pushes commits and tags to GitHub
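The version-bump decision can be sketched as follows. The keyword rules here are Conventional-Commits-style and are my assumption for illustration; the post's agent infers the bump from commit analysis, but I haven't specified the exact classification rules it uses:

```python
def bump_kind(commit_messages: list[str]) -> str:
    """Pick MAJOR/MINOR/PATCH from commit messages.
    Keyword rules are an illustrative assumption."""
    msgs = [m.lower() for m in commit_messages]
    if any("breaking change" in m or m.startswith("feat!") for m in msgs):
        return "MAJOR"
    if any(m.startswith("feat") or "feature" in m for m in msgs):
        return "MINOR"
    return "PATCH"

def next_version(current: str, kind: str) -> str:
    """Apply a semantic-version bump to a MAJOR.MINOR.PATCH string."""
    major, minor, patch = map(int, current.split("."))
    if kind == "MAJOR":
        return f"{major + 1}.0.0"
    if kind == "MINOR":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"
```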
Deployment
- Executes deployment to production
- Monitors deployment progress
- Verifies application health in production
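Health verification usually amounts to polling with a retry budget. A minimal sketch, with `probe` standing in for a real HTTP check against the production URL (which I'm deliberately not specifying here):

```python
import time

def wait_until_healthy(probe, attempts: int = 5, delay: float = 2.0) -> bool:
    """Poll a health probe after deploying.

    `probe` is any callable returning True once the application
    responds; a real check would hit a health endpoint over HTTP.
    """
    for i in range(attempts):
        if probe():
            return True
        if i < attempts - 1:
            time.sleep(delay)  # back off before retrying
    return False
```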
Key Benefits
Rigorous Quality Control—No code reaches production without passing tests and QA verification
Zero Human Error—Automated version bumping, tagging, and changelog generation
Full Traceability—Every release has a git tag, changelog entry, and version metadata
Agent Coordination—QA Manager acts as a hard gate; Release Manager cannot bypass it
Audit Trail—Context files document every decision and verification step
Example Flow
Developer finishes feature: `git commit -m "Added version number to system info pane"`
User requests deployment: "Deploy this to production"
Release Manager checks QA approval: no recent test-pass file found
Release Manager invokes QA Manager
- QA Manager runs test suite
- All tests pass
- Creates `context/test-pass-20251013-234559.md`
Release Manager proceeds
- Determines the appropriate semantic version, e.g. 4.2.0 (a MINOR bump)
- Creates `config/version.yml`
- Updates `CHANGELOG.md`
- Commits and tags the release
- Deploys to production
The entire process---from code commit to production deployment---is now a coordinated dance between two AI agents, each expert in its small domain, ensuring quality and consistency at every step. Moreover, the main context window I'm using doesn't get polluted with test-running and deployment noise. I've also appreciated that the reports the agents generate are genuinely useful: they document the quality and deployment state of the application over time.
Like I said at the top, I'm pretty happy with the way this is working, and I'm looking forward to iterating on this approach.