I Built an AI That Manages Itself — So I Don’t Have To

Published May 29, 2026 · 4 min read

TL;DR: chief-of-staff is an open-source management layer for Claude Code. You submit a task, it runs autonomously, an independent reviewer checks the output, and you only get interrupted if something genuinely needs a human decision. github.com/goyaljai/chief-of-staff

Here’s the friction point I kept hitting. I’d give Claude Code a task, check my phone for ten minutes, come back — and find it stopped mid-task, waiting on something like “should I use a singleton or factory pattern here?” The whole execution paused. Waiting for me.

I was spending more time managing the AI than I would have spent just writing the code myself. That’s the admin tax. And it compounds. You end up in this weird loop where the tool that’s supposed to save you time creates a new job: AI babysitter.

What chief-of-staff actually does

The system inverts the relationship. Instead of you supervising Claude Code, there’s a management layer between you and it. You submit a task, answer 3–5 upfront clarifying questions, then walk away. The system runs Claude Code headless, has a separate reviewer examine the actual workspace artifacts, and only escalates to you if something is genuinely stuck.
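The "run, review, only escalate when stuck" flow can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's actual code: run_task, Outcome, the execute and review callables, and the max_attempts limit are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Outcome:
    status: str    # "done" or "escalated"
    attempts: int  # number of execute/review rounds used
    feedback: str  # last reviewer feedback ("" when approved)


def run_task(
    execute: Callable[[str], None],
    review: Callable[[], Tuple[bool, str]],
    brief: str,
    max_attempts: int = 3,  # hypothetical limit, not taken from the project
) -> Outcome:
    """Run execute/review rounds; hand off to a human only when retries run out."""
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        # Fold the reviewer's last objection into the next brief.
        prompt = brief if not feedback else f"{brief}\n\nReviewer feedback: {feedback}"
        execute(prompt)
        approved, feedback = review()
        if approved:
            return Outcome("done", attempt, "")
    return Outcome("escalated", max_attempts, feedback)
```

In this sketch the human is only involved on the "escalated" path, and they receive the reviewer's last objection rather than a bare failure.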

In practice: I gave it a task — “build a multi-source research aggregator with export to markdown.” It asked five questions: output format, source types, deduplication strategy, whether I wanted citations, file naming convention. I answered them. Forty minutes later, a Telegram notification: task complete, 9KB report, reviewer sign-off passed. I checked the output. It was correct.

The architecture is a LangGraph DAG. Each node is a stage — clarify, plan, execute, review. The reviewer is a separate LLM call that reads the actual workspace files, not Claude’s self-assessment. That distinction matters. Self-reported success is noise. Artifact-based review is the signal.
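To make the artifact-based review idea concrete, here is a rough sketch of how a reviewer's input could be assembled from workspace files alone. It uses only the standard library rather than LangGraph, and the function names, prompt wording, and max_chars cap are assumptions made for illustration.

```python
from pathlib import Path
from typing import Dict


def collect_artifacts(workspace: Path, max_chars: int = 2000) -> Dict[str, str]:
    """Read the files actually present in the workspace, keyed by relative path."""
    artifacts = {}
    for path in sorted(workspace.rglob("*")):
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="replace")
            # Truncate long files so the review prompt stays bounded.
            artifacts[path.relative_to(workspace).as_posix()] = text[:max_chars]
    return artifacts


def build_review_prompt(task: str, artifacts: Dict[str, str]) -> str:
    """Build the reviewer input from the task and the files only."""
    parts = [f"Task: {task}", "Judge only the files below."]
    for name, text in artifacts.items():
        parts.append(f"--- {name} ---\n{text}")
    return "\n\n".join(parts)
```

Note what is missing: neither function takes the agent's own summary as a parameter, so a self-reported "done" has no way to reach the reviewer.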

Safety isn’t just logging

Most AI tool safety layers log risky actions and then let them through anyway. This one blocks. rm -rf / gets caught before execution. Write operations are constrained to task directories. Risky patterns (curl | bash, chmod 777, sudo on commands outside the allowlist) get flagged. An allowlist of approved tools (git, npm, pytest, ./gradlew) keeps the surface manageable.
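A guard along these lines could look roughly like the sketch below. The tool names and risky patterns come from the description above; the regexes, the three-way "block" / "flag" / "allow" result, and the function names are assumptions for illustration, not the project's real rules.

```python
import re
import shlex
from pathlib import Path

ALLOWED_TOOLS = {"git", "npm", "pytest", "./gradlew"}  # tools named in the post

# Hypothetical regexes approximating the patterns the post mentions.
CATASTROPHIC = re.compile(r"\brm\s+-\w*r\w*\s+/(\s|$)")  # rm -rf /
RISKY = [
    re.compile(r"\bcurl\b[^|]*\|\s*(ba)?sh\b"),   # curl | bash
    re.compile(r"\bchmod\s+(-\w+\s+)*777\b"),     # chmod 777
]
SHELL_OPERATORS = re.compile(r"[;&|`]|\$\(")       # chaining or substitution


def check_command(command: str) -> str:
    """Return "block", "flag", or "allow" for a shell command string."""
    if CATASTROPHIC.search(command):
        return "block"
    if any(p.search(command) for p in RISKY) or SHELL_OPERATORS.search(command):
        return "flag"
    try:
        tokens = shlex.split(command)
    except ValueError:
        return "flag"  # unbalanced quotes: refuse to guess
    if tokens and tokens[0] == "sudo":
        tokens = tokens[1:]  # judge the command that sudo would run
    if not tokens:
        return "flag"
    return "allow" if tokens[0] in ALLOWED_TOOLS else "flag"


def is_write_allowed(target: str, task_dir: str) -> bool:
    """True only if target resolves to a path inside task_dir."""
    root = Path(task_dir).resolve()
    path = (root / target).resolve()
    return path == root or root in path.parents
```

Pattern matching on command strings is easy to evade, so a sketch like this is best read as a first filter in front of stronger isolation, not a sandbox in itself.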

The part I didn’t expect to matter: the learning layer

Each completed task extracts what worked and appends it to skills/global.md. The orchestrator reads this when building the next brief. After a few dozen tasks, the briefs get noticeably sharper — fewer wrong turns, better tool selection on the first attempt. It’s not magic. The orchestrator just has concrete examples of what succeeded in similar tasks.
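The append-then-read cycle is simple enough to sketch. Only the skills/global.md path comes from the description above; the bullet format, the function names, and the max_lessons cap are hypothetical choices for this example.

```python
from pathlib import Path

SKILLS_FILE = Path("skills/global.md")  # path named in the post


def record_lesson(task: str, lesson: str, skills_file: Path = SKILLS_FILE) -> None:
    """Append one lesson as a markdown bullet (entry format is an assumption)."""
    skills_file.parent.mkdir(parents=True, exist_ok=True)
    with skills_file.open("a", encoding="utf-8") as fh:
        fh.write(f"- [{task}] {lesson}\n")


def build_brief(task: str, skills_file: Path = SKILLS_FILE, max_lessons: int = 20) -> str:
    """Prefix the task with the most recent lessons, if any exist."""
    lessons = []
    if skills_file.exists():
        lines = skills_file.read_text(encoding="utf-8").splitlines()
        lessons = [line for line in lines if line.startswith("- ")]
    recent = lessons[-max_lessons:]  # keep the brief bounded as the file grows
    header = f"Task: {task}"
    if not recent:
        return header
    return header + "\n\nLessons from earlier tasks:\n" + "\n".join(recent)
```

Because the file only grows, some cap or relevance filter on what goes into the brief seems necessary once there are a few dozen tasks' worth of entries.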

Stack and setup

The stack: Claude Opus 4.7 via Databricks AI Gateway, LangGraph with a custom map-reduce dispatcher, FastAPI + vanilla JS for the web UI, PostgreSQL + pgvector for RAG, and LangSmith for tracing. It runs in Docker: fill in the .env, then run a single docker-compose up -d.

I’ve run it against 40 tasks including Android APK generation and multi-step webapp + mobile integration work. The correction loop triggers about 30% of the time. Human escalation is under 10%.

Who this is for

If you’re using Claude Code daily and finding yourself constantly redirecting it mid-task, this is worth a look. The setup cost only makes sense if you’re delegating real work regularly — but if you are, the time saved is real.

MIT licensed. Repo at github.com/goyaljai/chief-of-staff. Issues and PRs welcome.

Frequently Asked Questions

What is chief-of-staff for Claude Code?

chief-of-staff is an open-source management layer that runs Claude Code autonomously. You submit a task, it executes, an independent reviewer checks the output, and you only get interrupted when something genuinely requires a human decision.

How is chief-of-staff different from just using Claude Code directly?

Used directly, Claude Code can stop mid-task to ask a question and then wait for you. chief-of-staff handles those decisions autonomously using upfront clarification, correction loops, and artifact-based review. In the 40 tasks run so far, human escalation was under 10%.

What tech stack does chief-of-staff use?

Claude Opus 4.7 via Databricks AI Gateway, LangGraph DAG executor, FastAPI + vanilla JS frontend, PostgreSQL + pgvector for RAG, and LangSmith for tracing. Runs in Docker with a single docker-compose up -d.

Is chief-of-staff safe to run on my codebase?

Yes. It blocks catastrophic commands like rm -rf /, constrains write operations to task directories, and maintains an allowlist of approved tools. Pre-mutation snapshots enable automatic undo if something goes wrong.
