Blog post

I’ve built the thing that builds the thing

I wanted a personal agent that was actually useful day to day — not a demo chatbot, not a thin wrapper around a single model, and not an expensive always-on research toy.

The goal was more practical: an assistant that could live close to my tools, remember preferences, run scheduled workflows, inspect code, help with documents, and still stay cheap enough to treat as personal infrastructure.

That system now runs on OpenClaw, hosted on a small Azure VM, with GitHub Copilot-backed model access as the default route and a few stricter lanes for specialized work.

Codebase: github.com/martysai/openclaw-selfhost-template — the sanitized Azure + OpenClaw template extracted from this setup.

Why self-host a personal agent?

The interesting part of personal agents is not that they can answer questions. A browser tab can already do that.

The interesting part is whether they can become reliable infrastructure:

  • keep enough continuity to understand ongoing projects;
  • use local files and repositories without uploading everything everywhere;
  • run scheduled jobs without a human sitting in the loop;
  • recover from provider failures and long-running tasks;
  • respect hard boundaries around external actions;
  • be cheap enough to leave running.

This pushes the design away from “one model with many tools” and toward a small operating environment: runtime, scheduler, memory, secrets, source control, policies, and observability.

The architecture

At a high level, the setup looks like this:

Architecture diagram of an OpenClaw personal agent running on an Azure Linux VM with GitHub, model lanes, scheduled workflows, read-only context, GitOps, systemd health checks, and approval gates.
OpenClaw sits on a small Azure Linux VM. Interactive requests come from chat, while GitHub, model lanes, read-only context, GitOps, cron jobs, and systemd health checks stay behind explicit routing and approval boundaries.

I chose Azure mostly because I wanted a boring VM that I could understand. No complicated platform story: just Linux, systemd, SSH, a small number of services, and explicit configuration files.

The public template reflects that bias. It is not a generic “deploy agents anywhere” framework; it is specifically an Azure + OpenClaw self-hosting template.

Speech: ASR, TTS, and SvetlanaClaw

Voice is one of the places where the setup became more personal.

Diagram showing voice notes flowing through Azure Speech ASR into SvetlanaClaw on OpenClaw, and explicit voice replies synthesized back through Azure Speech TTS with the ru-RU-SvetlanaNeural voice.
SvetlanaClaw uses Azure Speech on both sides of the loop: ASR turns voice notes into text, and explicit TTS requests turn short responses back into audio. The Russian Svetlana voice is part of the naming story.

I use Azure Speech for both directions:

  • ASR: voice notes are transcribed through Azure Speech rather than a local Whisper-style model on the VM.
  • TTS: when I explicitly ask for a voice reply, OpenClaw can synthesize a short audio response through Azure Speech.

This was partly a resource decision. A small VM is not the right place to run heavy speech models all the time. Offloading speech to Azure keeps the box responsive and makes the behavior more predictable.

It was also a language decision. I am a native Russian speaker, so Russian speech support matters to me. Azure’s Russian neural TTS catalog includes a voice called ru-RU-SvetlanaNeural; that name was too perfect to ignore. The assistant’s working name became SvetlanaClaw: OpenClaw as the runtime, Svetlana as the Russian-speaking voice/personality layered on top.

The important constraint is the same as everywhere else in the system: voice is explicit, not automatic. Text remains the default. Audio is generated only when I ask for it, and long answers should become short spoken summaries rather than full monologues.

Routing beats one giant model

A useful personal agent should not treat every task as the same kind of model call.

For ordinary conversation, coding help, repository inspection, small edits, and operational work, GitHub Copilot-backed model access is a good default. It is already part of my development workflow, so using it as the main lane keeps the marginal cost predictable.

But not every task belongs there. Some workflows need different privacy boundaries, different context, or more expensive long-form synthesis. The design I prefer is explicit routing:

WorkflowRoute
Normal chat, repo work, operational tasksDefault GitHub Copilot-backed model lane
General fallbackA secondary model lane
Personal cloud documentsRead-only, domain-specific route where possible
Long-form research digestsScarce specialist lane, used intentionally
External writesHuman approval gate

The important part is that fallbacks are not silent. If a task crosses a boundary — for example, from local files into an external model, or from drafting into sending — the system should make that visible.

Safety rails that matter in practice

Most of the work in a personal agent is not glamorous. It is deciding what the agent must not do by default.

The rules I care about most are simple:

  • Do not delete, move, or overwrite cloud-drive files without explicit approval.
  • Keep cloud-drive mounts read-only where possible.
  • Write generated reports and drafts to a local drafts directory first.
  • Ask before sending ad-hoc emails or making calendar changes.
  • Treat scheduled workflows as automatic only when their scope was explicitly configured.
  • Keep secrets out of git.
  • Prefer reversible cleanup over destructive cleanup.

These are not theoretical concerns. Once an agent can touch your files, calendars, messages, and repositories, the difference between “helpful” and “dangerous” is often one missing confirmation step.

GitOps for the agent itself

One pattern that worked well was making the VM pull its desired state from a repository.

The template includes a small pull-based deploy loop: the VM periodically pulls a repo and reconciles selected scripts, config patches, and systemd units. This is intentionally less magical than a full platform deployer. The goal is not to rebuild the machine every time; it is to make the important moving parts reviewable.

That gives a few benefits:

  • changes are versioned;
  • configuration drift is easier to spot;
  • the VM can recover expected scripts after manual experiments;
  • the public template can show the pattern without exposing private deployment details.

For an agent host, I prefer pull-based deployment over exposing a public inbound deployment endpoint. The VM can sit behind private networking and still keep itself updated.

Reliability lessons

The first version worked, but a few failure modes appeared quickly.

Long conversations get expensive and slow. If session history grows without bound, model calls become slow, fragile, and eventually fail in boring ways. Proactive compaction is not a nice-to-have; it is part of keeping the system responsive.

Interactive chat should not block on heavy work. A build or long research task can monopolize a chat lane if it runs synchronously. Heavy work should move into background jobs or sub-agents, with short progress updates in the main conversation.

Small VMs need hard limits. If the VM has only a few gigabytes of RAM, local builds, document extraction, or speech tooling can hurt the whole box. Systemd memory limits, swap decisions, and remote build boxes are not glamorous, but they keep the assistant reachable.

Diagram showing an OpenClaw agent on an Azure VM using Crabbox to ask Daytona for an ephemeral development box, run tests and builds there, and return logs and artifacts back to the agent.
For software work, Crabbox keeps heavy verification away from the always-on agent VM. OpenClaw asks Crabbox to run the gate, Crabbox creates a Daytona-backed dev box, the tests run there, and the agent gets back logs, exit codes, and artifacts.

That is especially useful for software testing. Instead of running every build, typecheck, or test suite on the small Azure VM, the agent can spawn a disposable development environment through Daytona using Crabbox. The VM remains the coordinator; the dev box becomes the place where heavier work happens.

Provider failures need explicit behavior. Fallback chains should be designed rather than improvised. Some tasks can fall back to another model. Some tasks should stop rather than silently cross a privacy or quality boundary.

What I published as a template

After setting up the private deployment, I wanted to make the reusable parts public without leaking personal infrastructure. That became openclaw-selfhost-template.

The template includes:

  • Azure VM Bicep examples;
  • pull-based GitOps automation;
  • OpenClaw config patch examples;
  • systemd guardrails;
  • health checks;
  • read-only cloud-drive mount patterns;
  • GitHub CLI setup;
  • optional email/digest patterns;
  • a generic workspace policy file.

It deliberately excludes real domains, chat IDs, resource names, OAuth tokens, private repo names, personal M365 routing, personal Claude workflows, backup details, and runtime databases. I created it as a fresh repository with clean history rather than making the private infrastructure repo public.

That distinction matters. Sanitizing the current tree is not enough if old commits contain operational details. A public template should start from a clean history.

Cost model

The rough cost model is straightforward:

  • a small Azure VM for always-on hosting;
  • an existing GitHub Copilot subscription for the default model lane;
  • optional paid APIs only where they add clear value;
  • specialist subscriptions or models used sparingly rather than as the default.

This is why routing matters. If every task goes to the most expensive model, a personal agent becomes an impressive toy. If most tasks go through a cheap default lane and only specific workflows use specialist routes, it becomes infrastructure.

Conclusion

The lesson from this project is that a personal agent is not mainly a model-selection problem. It is an infrastructure problem.

The model matters, but the surrounding system matters more: where it runs, what it can read, what it can write, how it schedules work, how it handles secrets, how it recovers, and when it asks for approval.

OpenClaw gave me a practical runtime for that experiment. Azure gave me a simple always-on host. GitHub Copilot-backed access made the default path affordable. The rest was engineering discipline: routing, guardrails, GitOps, and a willingness to treat the agent as software that needs operations, not as a magical chat window.