Here’s the friction point I kept hitting. I’d give Claude Code a task, check my phone for ten minutes, come back — and find it stopped mid-task, waiting on something like “should I use a singleton or factory pattern here?” The whole execution paused. Waiting for me.
I was spending more time managing the AI than I would have just writing the code myself. That’s the admin tax. And it compounds. You end up in this weird loop where the tool that’s supposed to save you time creates a new job: AI babysitter.
What chief-of-staff actually does
The system inverts the relationship. Instead of you supervising Claude Code, there’s a management layer between you and it. You submit a task, answer 3–5 upfront clarifying questions, then walk away. The system runs Claude Code headless, has a separate reviewer examine the actual workspace artifacts, and only escalates to you if something is genuinely stuck.
In practice: I gave it a task — “build a multi-source research aggregator with export to markdown.” It asked five questions: output format, source types, deduplication strategy, whether I wanted citations, file naming convention. I answered them. Forty minutes later, a Telegram notification: task complete, 9KB report, reviewer sign-off passed. I checked the output. It was correct.
The architecture is a LangGraph DAG. Each node is a stage — clarify, plan, execute, review. The reviewer is a separate LLM call that reads the actual workspace files, not Claude’s self-assessment. That distinction matters. Self-reported success is noise. Artifact-based review is the signal.
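To make the execute/review split concrete, here is a minimal sketch in plain Python. It is not the project's code and does not use LangGraph: the names `TaskState`, `review_artifacts`, and `run_task`, the three-attempt limit, and the "file exists and is non-empty" check are all assumptions chosen for illustration. A real reviewer would be an LLM call reading the files. The clarify and plan stages are left out to keep the focus on the correction loop.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable


@dataclass
class TaskState:
    task: str
    workspace: Path
    attempts: int = 0
    self_report: str = ""  # what the executor claims; never trusted by review
    problems: list[str] = field(default_factory=list)
    status: str = "pending"  # pending | done | escalated


def review_artifacts(workspace: Path, required: list[str]) -> list[str]:
    """Inspect files on disk and return a list of problems (empty means pass)."""
    problems = []
    for name in required:
        path = workspace / name
        if not path.is_file():
            problems.append(f"missing: {name}")
        elif path.stat().st_size == 0:
            problems.append(f"empty: {name}")
    return problems


def run_task(
    state: TaskState,
    execute: Callable[[TaskState], str],
    required: list[str],
    max_attempts: int = 3,  # hypothetical limit before escalating to a human
) -> TaskState:
    """Execute, review the artifacts, and retry until review passes or attempts run out."""
    while state.attempts < max_attempts:
        state.attempts += 1
        state.self_report = execute(state)
        state.problems = review_artifacts(state.workspace, required)
        if not state.problems:
            state.status = "done"
            return state
    state.status = "escalated"
    return state


def demo() -> TaskState:
    """An executor that always claims success but only delivers on its second try."""

    def flaky_executor(state: TaskState) -> str:
        report = state.workspace / "report.md"
        report.write_text("" if state.attempts == 1 else "# Research report\n")
        return "success"

    with tempfile.TemporaryDirectory() as tmp:
        state = TaskState(task="build report", workspace=Path(tmp))
        return run_task(state, flaky_executor, required=["report.md"])
```

In `demo()`, the executor reports "success" both times, but the first attempt leaves an empty file. Because the loop only looks at the workspace, the first attempt fails review and the second one passes.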
Safety isn’t just logging
Most AI tool safety layers log things and then let them through anyway. This one blocks. rm -rf / gets caught before execution. Write operations are constrained to task directories. An allowlist of approved tools (git, npm, pytest, ./gradlew) keeps the surface manageable. Risky patterns (curl | bash, chmod 777, sudo on anything outside that allowlist) get flagged.
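A guard like this can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the repo's implementation: the function names, the regular expressions, and the three verdicts ("block", "flag", "allow") are hypothetical, and a production guard would need a real shell parser rather than pattern matching.

```python
import re
import shlex
from pathlib import Path

# Tool names taken from the post; everything else here is an assumption.
ALLOWED_TOOLS = {"git", "npm", "pytest", "./gradlew"}

# Catastrophic commands are refused outright.
BLOCKED = [
    (re.compile(r"\brm\s+-\w*r\w*\s+/(\s|$)"), "recursive delete of /"),
]

# Risky patterns are flagged for a closer look.
RISKY = [
    (re.compile(r"\bcurl\b[^|]*\|\s*(ba)?sh\b"), "piping a download into a shell"),
    (re.compile(r"\bchmod\s+(-\w+\s+)*777\b"), "world-writable permissions"),
]


def check_command(command: str) -> tuple[str, str]:
    """Return (verdict, reason) where verdict is 'block', 'flag', or 'allow'."""
    for pattern, reason in BLOCKED:
        if pattern.search(command):
            return "block", reason
    for pattern, reason in RISKY:
        if pattern.search(command):
            return "flag", reason
    try:
        tokens = shlex.split(command)
    except ValueError:
        return "block", "unparseable command"
    if not tokens:
        return "block", "empty command"
    tool = tokens[0]
    if tool == "sudo":
        target = tokens[1] if len(tokens) > 1 else ""
        if target not in ALLOWED_TOOLS:
            return "flag", f"sudo on non-allowlisted command: {target!r}"
        tool = target
    if tool not in ALLOWED_TOOLS:
        return "flag", f"tool not on allowlist: {tool!r}"
    return "allow", "ok"


def write_allowed(target: Path, task_dir: Path) -> bool:
    """True only if target resolves to task_dir itself or something inside it."""
    root = task_dir.resolve()
    resolved = target.resolve()
    return resolved == root or root in resolved.parents
```

Resolving paths before comparing them matters: a target like `/tasks/42/../43/x` looks like it sits under `/tasks/42` as a string, but resolves to a different task's directory.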
The part I didn’t expect to matter: the learning layer
Each completed task extracts what worked and appends it to skills/global.md. The orchestrator reads this when building the next brief. After a few dozen tasks, the briefs get noticeably sharper — fewer wrong turns, better tool selection on the first attempt. It’s not magic. The orchestrator just has concrete examples of what succeeded in similar tasks.
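The append-then-read cycle is simple enough to sketch. The version below is an assumption-laden illustration, not the project's code: `record_lesson`, `build_brief`, the one-bullet-per-lesson format, and the "most recent N lessons" selection are all hypothetical. Selecting lessons by similarity to the new task, rather than by recency, would be a natural refinement.

```python
from pathlib import Path


def record_lesson(skills_file: Path, task: str, lesson: str) -> None:
    """Append one lesson from a completed task to the skills file."""
    skills_file.parent.mkdir(parents=True, exist_ok=True)
    with skills_file.open("a", encoding="utf-8") as fh:
        fh.write(f"- [{task}] {lesson}\n")


def build_brief(skills_file: Path, task: str, max_lessons: int = 20) -> str:
    """Build a task brief that includes the most recent recorded lessons."""
    lessons = []
    if skills_file.exists():
        text = skills_file.read_text(encoding="utf-8")
        lessons = [line.strip() for line in text.splitlines() if line.strip()]
    parts = [f"Task: {task}"]
    recent = lessons[-max_lessons:]
    if recent:
        parts.append("Lessons from earlier tasks:")
        parts.extend(recent)
    return "\n".join(parts)
```

Calling `record_lesson(Path("skills/global.md"), "aggregator", "run pytest before review")` after a task, then `build_brief(Path("skills/global.md"), "next task")` before the next one, yields a brief that carries the earlier lesson forward.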
Stack and setup
Claude Opus 4.7 via Databricks AI Gateway, LangGraph with a custom map-reduce dispatcher, FastAPI + vanilla JS for the web UI, PostgreSQL + pgvector for RAG, LangSmith for tracing. Runs in Docker — single docker-compose up -d after you fill in the .env.
I’ve run it against 40 tasks including Android APK generation and multi-step webapp + mobile integration work. The correction loop triggers about 30% of the time. Human escalation is under 10%.
Who this is for
If you’re using Claude Code daily and finding yourself constantly redirecting it mid-task, this is worth a look. The setup cost only makes sense if you’re delegating real work regularly — but if you are, the time saved is real.
MIT licensed. Repo at github.com/goyaljai/chief-of-staff. Issues and PRs welcome.
Frequently Asked Questions
What is chief-of-staff?
chief-of-staff is an open-source management layer that runs Claude Code autonomously. You submit a task, it executes, an independent reviewer checks the output, and you only get interrupted when something genuinely requires a human decision.
How is it different from using Claude Code directly?
Claude Code requires constant human supervision: it stops mid-task to ask questions. chief-of-staff handles those decisions autonomously using upfront clarification, correction loops, and artifact-based review. Human escalation drops to under 10% of tasks.
What is the tech stack?
Claude Opus 4.7 via Databricks AI Gateway, LangGraph DAG executor, FastAPI + vanilla JS frontend, PostgreSQL + pgvector for RAG, and LangSmith for tracing. Runs in Docker with a single docker-compose up -d.
Is it safe to let it run unattended?
Yes. It blocks catastrophic commands like rm -rf /, constrains write operations to task directories, and maintains an allowlist of approved tools. Pre-mutation snapshots enable automatic undo if something goes wrong.
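For readers curious what a pre-mutation snapshot with automatic undo might look like, here is one possible shape in Python. It is a sketch under assumptions, not the repo's mechanism: the `snapshot` context manager is hypothetical, and copying the whole directory is the simplest approach rather than necessarily the one the project uses.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def snapshot(task_dir: Path):
    """Copy task_dir before a risky step; restore the copy if the step raises."""
    backup_root = Path(tempfile.mkdtemp(prefix="snapshot-"))
    backup = backup_root / "copy"
    shutil.copytree(task_dir, backup)
    try:
        yield
    except Exception:
        # Undo: discard the mutated directory and put the snapshot back.
        shutil.rmtree(task_dir)
        shutil.copytree(backup, task_dir)
        raise
    finally:
        shutil.rmtree(backup_root, ignore_errors=True)
```

Wrapping a step in `with snapshot(task_dir):` means any exception inside the block rolls the directory back to its prior state before the error propagates.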
