I’ve built the thing that builds the thing
I wanted a personal agent that was actually useful day to day — not a demo chatbot, not a thin wrapper around a single model, and not an expensive always-on research toy.
The goal was more practical: an assistant that could live close to my tools, remember preferences, run scheduled workflows, inspect code, help with documents, and still stay cheap enough to treat as personal infrastructure.
That system now runs on OpenClaw, hosted on a small Azure VM, with GitHub Copilot-backed model access as the default route and a few stricter lanes for specialized work.
Codebase: github.com/martysai/openclaw-selfhost-template — the sanitized Azure + OpenClaw template extracted from this setup.
Why self-host a personal agent?
The interesting part of personal agents is not that they can answer questions. A browser tab can already do that.
The interesting part is whether they can become reliable infrastructure:
- keep enough continuity to understand ongoing projects;
- use local files and repositories without uploading everything everywhere;
- run scheduled jobs without a human sitting in the loop;
- recover from provider failures and long-running tasks;
- respect hard boundaries around external actions;
- be cheap enough to leave running.
This pushes the design away from “one model with many tools” and toward a small operating environment: runtime, scheduler, memory, secrets, source control, policies, and observability.
The architecture
At a high level, the setup looks like this:
I chose Azure mostly because I wanted a boring VM that I could understand. No complicated platform story: just Linux, systemd, SSH, a small number of services, and explicit configuration files.
The public template reflects that bias. It is not a generic “deploy agents anywhere” framework; it is specifically an Azure + OpenClaw self-hosting template.
Speech: ASR, TTS, and SvetlanaClaw
Voice is one of the places where the setup became more personal.
I use Azure Speech for both directions:
- ASR: voice notes are transcribed through Azure Speech rather than a local Whisper-style model on the VM.
- TTS: when I explicitly ask for a voice reply, OpenClaw can synthesize a short audio response through Azure Speech.
This was partly a resource decision. A small VM is not the right place to run heavy speech models all the time. Offloading speech to Azure keeps the box responsive and makes the behavior more predictable.
It was also a language decision. I am a native Russian speaker, so Russian speech support matters to me. Azure’s Russian neural TTS catalog includes a voice called ru-RU-SvetlanaNeural; that name was too perfect to ignore. The assistant’s working name became SvetlanaClaw: OpenClaw as the runtime, Svetlana as the Russian-speaking voice/personality layered on top.
The important constraint is the same as everywhere else in the system: voice is explicit, not automatic. Text remains the default. Audio is generated only when I ask for it, and long answers should become short spoken summaries rather than full monologues.
Routing beats one giant model
A useful personal agent should not treat every task as the same kind of model call.
For ordinary conversation, coding help, repository inspection, small edits, and operational work, GitHub Copilot-backed model access is a good default. It is already part of my development workflow, so using it as the main lane keeps the marginal cost predictable.
But not every task belongs there. Some workflows need different privacy boundaries, different context, or more expensive long-form synthesis. The design I prefer is explicit routing:
| Workflow | Route |
|---|---|
| Normal chat, repo work, operational tasks | Default GitHub Copilot-backed model lane |
| General fallback | A secondary model lane |
| Personal cloud documents | Read-only, domain-specific route where possible |
| Long-form research digests | Scarce specialist lane, used intentionally |
| External writes | Human approval gate |
The important part is that fallbacks are not silent. If a task crosses a boundary — for example, from local files into an external model, or from drafting into sending — the system should make that visible.
Safety rails that matter in practice
Most of the work in a personal agent is not glamorous. It is deciding what the agent must not do by default.
The rules I care about most are simple:
- Do not delete, move, or overwrite cloud-drive files without explicit approval.
- Keep cloud-drive mounts read-only where possible.
- Write generated reports and drafts to a local drafts directory first.
- Ask before sending ad-hoc emails or making calendar changes.
- Treat scheduled workflows as automatic only when their scope was explicitly configured.
- Keep secrets out of git.
- Prefer reversible cleanup over destructive cleanup.
These are not theoretical concerns. Once an agent can touch your files, calendars, messages, and repositories, the difference between “helpful” and “dangerous” is often one missing confirmation step.
GitOps for the agent itself
One pattern that worked well was making the VM pull its desired state from a repository.
The template includes a small pull-based deploy loop: the VM periodically pulls a repo and reconciles selected scripts, config patches, and systemd units. This is intentionally less magical than a full platform deployer. The goal is not to rebuild the machine every time; it is to make the important moving parts reviewable.
That gives a few benefits:
- changes are versioned;
- configuration drift is easier to spot;
- the VM can recover expected scripts after manual experiments;
- the public template can show the pattern without exposing private deployment details.
For an agent host, I prefer pull-based deployment over exposing a public inbound deployment endpoint. The VM can sit behind private networking and still keep itself updated.
Reliability lessons
The first version worked, but a few failure modes appeared quickly.
Long conversations get expensive and slow. If session history grows without bound, model calls become slow, fragile, and eventually fail in boring ways. Proactive compaction is not a nice-to-have; it is part of keeping the system responsive.
Interactive chat should not block on heavy work. A build or long research task can monopolize a chat lane if it runs synchronously. Heavy work should move into background jobs or sub-agents, with short progress updates in the main conversation.
Small VMs need hard limits. If the VM has only a few gigabytes of RAM, local builds, document extraction, or speech tooling can hurt the whole box. Systemd memory limits, swap decisions, and remote build boxes are not glamorous, but they keep the assistant reachable.
That is especially useful for software testing. Instead of running every build, typecheck, or test suite on the small Azure VM, the agent can spawn a disposable development environment through Daytona using Crabbox. The VM remains the coordinator; the dev box becomes the place where heavier work happens.
Provider failures need explicit behavior. Fallback chains should be designed rather than improvised. Some tasks can fall back to another model. Some tasks should stop rather than silently cross a privacy or quality boundary.
What I published as a template
After setting up the private deployment, I wanted to make the reusable parts public without leaking personal infrastructure. That became openclaw-selfhost-template.
The template includes:
- Azure VM Bicep examples;
- pull-based GitOps automation;
- OpenClaw config patch examples;
- systemd guardrails;
- health checks;
- read-only cloud-drive mount patterns;
- GitHub CLI setup;
- optional email/digest patterns;
- a generic workspace policy file.
It deliberately excludes real domains, chat IDs, resource names, OAuth tokens, private repo names, personal M365 routing, personal Claude workflows, backup details, and runtime databases. I created it as a fresh repository with clean history rather than making the private infrastructure repo public.
That distinction matters. Sanitizing the current tree is not enough if old commits contain operational details. A public template should start from a clean history.
Cost model
The rough cost model is straightforward:
- a small Azure VM for always-on hosting;
- an existing GitHub Copilot subscription for the default model lane;
- optional paid APIs only where they add clear value;
- specialist subscriptions or models used sparingly rather than as the default.
This is why routing matters. If every task goes to the most expensive model, a personal agent becomes an impressive toy. If most tasks go through a cheap default lane and only specific workflows use specialist routes, it becomes infrastructure.
Conclusion
The lesson from this project is that a personal agent is not mainly a model-selection problem. It is an infrastructure problem.
The model matters, but the surrounding system matters more: where it runs, what it can read, what it can write, how it schedules work, how it handles secrets, how it recovers, and when it asks for approval.
OpenClaw gave me a practical runtime for that experiment. Azure gave me a simple always-on host. GitHub Copilot-backed access made the default path affordable. The rest was engineering discipline: routing, guardrails, GitOps, and a willingness to treat the agent as software that needs operations, not as a magical chat window.