DaemonCraft: a local-first embodied companion in Minecraft, powered by Gemma 4
DaemonCraft combines Hermes, gAndy, Ollama and Minecraft to turn screen time into guided, creative, visible play.
Children already spend hours in digital worlds. The educational question is what kind of experience fills that time. Much of it still leans toward passive consumption: endless stimulation, little continuity, and few tools that help a child build something of their own. Parents feel that tension clearly. They want to protect attention and safety while preserving one of the few spaces where curiosity, play and long-form creation can still happen.
DaemonCraft begins from that problem. It places a persistent companion inside Minecraft: a daemon with memory, visible action and continuity across play. The system can move through the world, remember places, help with tasks, support shared stories and remain present as a partner rather than a detached assistant. Minecraft is the right medium because it already gives children a language of construction, exploration, cooperation and narrative. Actions inside that world are concrete. They can be seen, checked and discussed.
The project turns screen time into accompanied creation. A child can ask for help building a base, exploring a cave, finding a route, continuing a story or remembering a place that matters in the world. That interaction keeps imagination, attention and learning inside the same shared scene. The companion supports play from within the world itself, where its help has immediate consequences and visible results.
DaemonCraft works through a hybrid architecture with clear roles. Hermes carries the narrative, pedagogical and control layer. It interprets intent, keeps long-range context, asks for clarification when a request is vague, and protects the interaction when safety or scope require intervention. gAndy carries the embodiment logic of the stack. It takes a clear body-oriented intent together with a structured world state and returns body actions that can be executed inside Minecraft.
The body loop is explicit. The embodied-service composes world_state, calls gAndy through Ollama, validates the JSON contract of the reply, and dispatches the resulting actions to Mineflayer, which operates the bot inside the server. world_state carries the information the body needs: position, inventory, nearby blocks, remembered places, entities and other local state. It also carries mBit, a compact text-native spatial representation of nearby space. mBit gives the stack a readable local map without flooding the model with a large raw dump of the environment. mBit is already part of the working runtime on both paths: when Hermes acts directly through Mineflayer and when gAndy operates inside the constrained primitive loop.
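The four steps of that loop (compose, call, validate, dispatch) can be sketched in a few lines. This is a minimal illustration, not the project's code: the function names, the field names inside world_state, the "name"/"args" shape of a tool call, and the two injected callables (stand-ins for the Ollama call and the Mineflayer dispatch) are all assumptions, and mBit is treated as an opaque string because its format is not described here.

```python
import json
from typing import Any, Callable


def compose_world_state(
    position: tuple[int, int, int],
    inventory: dict[str, int],
    nearby_blocks: list[str],
    remembered_places: dict[str, tuple[int, int, int]],
    entities: list[str],
    mbit: str,
) -> dict[str, Any]:
    """Bundle the local state the body needs into one JSON-serializable dict."""
    return {
        "position": list(position),
        "inventory": inventory,
        "nearby_blocks": nearby_blocks,
        "remembered_places": {k: list(v) for k, v in remembered_places.items()},
        "entities": entities,
        "mbit": mbit,  # opaque text in this sketch; the real format is not assumed
    }


def run_body_step(
    intent: str,
    world_state: dict[str, Any],
    call_model: Callable[[str], str],
    dispatch: Callable[[str, dict[str, Any]], Any],
) -> list[Any]:
    """Call the model, validate its reply, then dispatch every tool call in order."""
    request = json.dumps({"intent": intent, "world_state": world_state})
    raw = call_model(request)  # hypothetical stand-in for the gAndy call via Ollama
    try:
        reply = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("model reply is not valid JSON") from exc
    calls = reply.get("tool_calls") if isinstance(reply, dict) else None
    if not isinstance(calls, list):
        raise ValueError("model reply must be an object with a tool_calls list")
    # Validate the whole reply before acting, so a bad call never leaves
    # the bot halfway through a plan.
    for call in calls:
        if not isinstance(call, dict) or not isinstance(call.get("name"), str):
            raise ValueError("each tool call needs a string name")
    # Hypothetical stand-in for handing each action to Mineflayer.
    return [dispatch(call["name"], call.get("args", {})) for call in calls]
```

Passing the model call and the dispatcher in as plain callables keeps the loop testable without a running server: a test can supply a canned reply and record what would have been sent to the bot.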
This architecture supports progressive delegation. Hermes keeps direct access to the body and handles the tasks that still belong in the higher control layer. gAndy absorbs the body primitives that have already been measured, debugged and validated as reliable. Each primitive migrates into gAndy's scope only after passing that gate. The result is a system that stays operational, observable and stable while the model's role grows through real engineering work rather than wishful thinking.
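One way to picture the delegation gate is as a routing rule driven by measured results. The sketch below is purely illustrative: the record type, the function names and the thresholds (20 trials, 95% success) are assumptions chosen for the example, not the project's actual criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrimitiveRecord:
    """Field-test tally for one body primitive (hypothetical structure)."""
    name: str
    trials: int
    successes: int


def is_promotable(
    record: PrimitiveRecord,
    min_trials: int = 20,            # illustrative threshold, not from the project
    min_success_rate: float = 0.95,  # illustrative threshold, not from the project
) -> bool:
    """A primitive passes the gate only with enough trials and a high success rate."""
    if record.trials == 0 or record.trials < min_trials:
        return False
    return record.successes / record.trials >= min_success_rate


def route_primitive(name: str, records: dict[str, PrimitiveRecord]) -> str:
    """Send validated primitives to gAndy; everything else stays with Hermes."""
    record = records.get(name)
    if record is not None and is_promotable(record):
        return "gandy"
    return "hermes"
```

Because unknown or under-tested primitives fall back to Hermes by default, the system keeps working while the delegated set grows one primitive at a time.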
Gemma 4 matters here for specific reasons. It runs locally through Ollama in the current prototype, which keeps the embodied loop private, inspectable and inexpensive to operate. That local-first design also keeps the project aligned with the realities of schools, communities and families that need control over infrastructure and data. In deployment, the same approach supports shared infrastructure under community, school or regional control while preserving a privacy-respecting architecture and a low marginal cost per interaction.
Gemma 4 also fits the shape of the task. The model is not being asked to improvise as a general conversational agent for every layer of the system. It is being used where a compact, structured body reasoner is valuable: translating intent and context into body_plan, checks and tool_calls. That division reduces token cost as well. Repeated body orchestration does not need to consume frontier-model tokens step after step if a smaller local model can carry a bounded part of that work with the right contract and the right state representation.
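To make the bounded contract concrete, here is a small sketch of how a reply with body_plan, checks and tool_calls might be parsed into typed objects. The three top-level keys come from the description above; everything else (the class names, the "name" and "args" fields of a tool call, treating plan steps and checks as strings) is an assumption for illustration.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class ToolCall:
    name: str
    args: dict[str, Any]


@dataclass
class BodyReply:
    body_plan: list[str]
    checks: list[str]
    tool_calls: list[ToolCall]


def parse_body_reply(raw: str, allowed_tools: set[str]) -> BodyReply:
    """Parse a model reply; raise ValueError if it breaks the assumed contract."""
    data = json.loads(raw)  # JSONDecodeError is itself a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    for key in ("body_plan", "checks", "tool_calls"):
        if not isinstance(data.get(key), list):
            raise ValueError(f"'{key}' must be a list")
    calls = []
    for item in data["tool_calls"]:
        if not isinstance(item, dict):
            raise ValueError("each tool call must be an object")
        name = item.get("name")
        args = item.get("args", {})
        if not isinstance(name, str) or name not in allowed_tools:
            raise ValueError(f"tool {name!r} is not in the allowed palette")
        if not isinstance(args, dict):
            raise ValueError("tool call args must be an object")
        calls.append(ToolCall(name, args))
    return BodyReply(
        body_plan=[str(step) for step in data["body_plan"]],
        checks=[str(check) for check in data["checks"]],
        tool_calls=calls,
    )
```

A strict parser of this kind is what lets a smaller model carry the step: any reply that strays outside the contract is rejected before it reaches the bot.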
We tested the system on a real Minecraft stack: PaperMC, a live Mineflayer bot, a local embodied-service and gAndy served through Ollama. Our initial field validation measured a focused set of body primitives including follow, goto, move_away, mine_block, collect_drops, get_inventory, remember_here and goto_remembered_place. The broader stack adds canonical tests and benchmarks as well: deterministic policy by category, versioned experiment setup, world_state and mBit validation, and unit tests for the embodiment bridge. Early pilot work around mBit is also shaping the next training loop through local snapshots and before/after verification episodes, so the same compact spatial signal that already serves runtime execution can become training material for local tactics and local self-verification.
The policy layer that makes that possible stays intentionally simple and general. Hermes normalizes intents, filters scope, detects ambiguity, narrows allowed_tools by category and decomposes multi-step tasks into atomic steps when needed. A vague request triggers clarification. A request outside the body layer is handled upstream. A clear request that belongs to a validated primitive reaches gAndy inside a constrained action palette. This keeps the system legible for operators and dependable for users.
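The routing described above can be sketched as a single decision function. This is a deliberately crude illustration: the category names, the grouping of primitives into categories, the keyword table and the "fewer than two words is vague" rule are all assumptions made for the example, not the policy Hermes actually runs.

```python
import re
from dataclasses import dataclass, field

# Hypothetical grouping of the validated primitives into categories.
TOOLS_BY_CATEGORY = {
    "movement": {"follow", "goto", "move_away"},
    "gathering": {"mine_block", "collect_drops", "get_inventory"},
    "memory": {"remember_here", "goto_remembered_place"},
}

# Hypothetical keyword table used to guess a category from a request.
KEYWORDS = {
    "follow": "movement",
    "come": "movement",
    "go": "movement",
    "mine": "gathering",
    "collect": "gathering",
    "remember": "memory",
}


@dataclass
class Decision:
    route: str  # "clarify", "upstream", "decompose" or "gandy"
    allowed_tools: set[str] = field(default_factory=set)


def decide(request: str) -> Decision:
    """Normalize a request, then pick a route and a narrowed tool palette."""
    words = re.findall(r"[a-z_]+", request.lower())  # normalization step
    if len(words) < 2:
        return Decision("clarify")    # too vague: ask the child what they mean
    categories = {KEYWORDS[w] for w in words if w in KEYWORDS}
    if not categories:
        return Decision("upstream")   # not a body request: stays with Hermes
    if len(categories) > 1:
        return Decision("decompose")  # multi-step: split into atomic steps first
    (category,) = categories
    return Decision("gandy", set(TOOLS_BY_CATEGORY[category]))
```

For example, `decide("mine that stone block")` routes to gAndy with only the gathering tools available, while `decide("help")` asks for clarification and `decide("tell me a story about dragons")` stays upstream.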
That same structure supports safety and trust. A companion for children needs visible boundaries as much as it needs capability. DaemonCraft keeps dashboards, logs, allowed-tool policies and a clean separation between narrative, body and execution. The system can show what happened, why it happened and which layer made the decision. That observability is part of the product, not an afterthought.
The educational and social stakes are straightforward. DaemonCraft supports curiosity, creative persistence and collaborative problem solving inside a world children already inhabit willingly. It opens a path toward safer, more meaningful screen time in contexts where tutoring, extracurricular support and expensive proprietary tools remain out of reach. A locally operated companion can be shared across family, school or community infrastructure while keeping the experience more accountable and more adaptable to the people who use it.
The horizon is equally clear. More body primitives can move into gAndy as the training and evaluation loop matures. More bots can inhabit shared worlds. The same daemon architecture can later travel beyond Minecraft into other tools, environments and educational interfaces. Minecraft is the first body because it makes action, feedback and co-creation visible. From there, the larger idea becomes tangible: an AI companion that helps children build, remember, learn and create from inside the worlds where their attention already lives.