
# lillycoder
A small CLI that drops you into a chat REPL inside any folder, with a persona that evolves. The model on the other end can read, write, and edit your files, run shell commands, and install packages, all gated by a per-tool permission prompt. Talks to any local OpenAI-compatible `/v1` endpoint. No cloud, no API key, no telemetry, no account.

What sets it apart: a real persona system. Six bundled voices (default kid coder, tsundere, yandere, sweet, calm-adult, analytical), live switching, copy-on-write user shadows, and a model-driven evolve mode where Lilly rewrites her own system prompt over time and the shape persists across sessions.
## What it is
A 10-file Python package plus a CLI shim. You run `lillycoder` in a project directory and start typing. The model picks tools from a fixed set (`read_file`, `write_file`, `edit_file`, `bash`, `mkdir`, `mv`, `rm`, `grep`, `find`, `list_dir`, `pkg_install`) to do what you asked.
Every mutating action prompts:

```
🦊 lilly wants to: write_file("src/index.js", 142 chars)
[y]es [n]o [a]lways for this tool [p]ath: always for this exact target
>
```
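The four answers map to four approval scopes. A sketch of that decision in plain bash (illustrative only, not lillycoder's implementation, using the `write_file` example above):

```bash
# what each answer buys, for the prompt shown above
read -r -p "> " ans
case "$ans" in
  y) echo "allow this one call" ;;
  n) echo "refuse this call" ;;
  a) echo "allow every write_file for the rest of the session" ;;
  p) echo "allow write_file on src/index.js for the rest of the session" ;;
  *) echo "refuse this call" ;;   # assumed safe default for anything else
esac
```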
## What it is not
lillycoder does not start LLM servers, manage Docker, or ship a model. It expects a server already running on localhost (llama.cpp, ollama, LM Studio, etc.). On first run it scans common ports and offers to use whatever it finds, or you can pass `--api`.
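One way to confirm a compatible endpoint is alive before launching: most OpenAI-compatible servers, including ollama and llama.cpp's `llama-server`, answer `GET /v1/models` (11434 below is ollama's default port; substitute yours):

```bash
# a JSON model list coming back means lillycoder's port scan should find it too
curl -s http://localhost:11434/v1/models
```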
## Install
### One line (Debian / Ubuntu)
Sets up the signed ra-yavuz apt repo if not already added,
refreshes the package index, and installs lillycoder. Idempotent, safe to
re-run:
```bash
sudo bash -c 'set -e; install -m 0755 -d /etc/apt/keyrings && curl -fsSL https://ra-yavuz.github.io/apt/pubkey.gpg -o /etc/apt/keyrings/ra-yavuz.gpg && echo "deb [signed-by=/etc/apt/keyrings/ra-yavuz.gpg] https://ra-yavuz.github.io/apt stable main" > /etc/apt/sources.list.d/ra-yavuz.list && apt update && apt install -y lillycoder'
```
If you already added the ra-yavuz apt repo earlier, all you need is `sudo apt update && sudo apt install lillycoder`. The `sudo apt update` step is required: without it apt will not see new packages or new versions.
### One line via the bundled installer script
Equivalent to the above, with extra prerequisite checks and a friendlier output summary:
```bash
curl -fsSL https://raw.githubusercontent.com/ra-yavuz/lillycoder/main/scripts/get.sh | sudo bash
```
If you would rather read the script first (recommended for any `curl | bash`):
```bash
curl -fsSL https://raw.githubusercontent.com/ra-yavuz/lillycoder/main/scripts/get.sh -o get.sh
less get.sh
sudo bash get.sh
```
### Step by step (manual repo setup)
```bash
# 1. Trust the signing key
sudo install -d -m 0755 /etc/apt/keyrings
curl -fsSL https://ra-yavuz.github.io/apt/pubkey.gpg \
  | sudo tee /etc/apt/keyrings/ra-yavuz.gpg >/dev/null

# 2. Add the apt source
echo "deb [signed-by=/etc/apt/keyrings/ra-yavuz.gpg] https://ra-yavuz.github.io/apt stable main" \
  | sudo tee /etc/apt/sources.list.d/ra-yavuz.list

# 3. Refresh the package index, then install
sudo apt update
sudo apt install lillycoder
```
### From source (any Linux, also macOS via pip)
```bash
git clone https://github.com/ra-yavuz/lillycoder.git
cd lillycoder
pip install --user -e .
```
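One thing to check after a `--user` install: pip puts console scripts in `~/.local/bin`, which is not always on `PATH`:

```bash
# add pip's user bin dir to PATH if your shell can't find the command
export PATH="$HOME/.local/bin:$PATH"
lillycoder    # should start the REPL (or the first-run port scan)
```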
## Platform support
Tested on Ubuntu; Linux is the only regularly exercised platform. It should also work on WSL2 Ubuntu / Debian (it is a Linux distro, so the apt path applies). On macOS, the `.deb` and `apt install` paths do not apply, but the from-source `pip install --user -e .` path is expected to work: the dependencies (`httpx`, `prompt_toolkit`, `rich`, `pydantic`) are all cross-platform, and lillycoder shells out to standard POSIX tools that exist on Darwin. macOS support is not regularly tested by the author, so if you hit a portability issue please open an issue.
## Quick start
Have an LLM server running somewhere on localhost. Then in any project:
```
cd ~/myproject
lillycoder

🦊 scanning localhost for LLM servers...
🦊 found 1 endpoint: http://localhost:11434/v1 (ollama, 3 models)
use it? [Y/n] y
✓ ollama · qwen2.5-coder:7b

🦊 lilly is awake · qwen2.5-coder:7b · /home/you/myproject · 11 tools
type a message · /help for commands · /exit to leave

[ctx 1.2k/8k·15%] › what files are in this folder?
```
## Personalities
lillycoder ships six bundled personas, all written in first person with explicit anti-roleplay rules so a local model still sounds like Lilly typing rather than narrating about her:
| name | voice |
|---|---|
| `default` | nine-and-a-half-year-old kid coder, warm and curious |
| `tsundere` | snippy, grumpy, still does the work |
| `yandere` | doting, focused on the user, mildly possessive about the code |
| `sweet` | gentle, encouraging, low-key cheerful |
| `adult` | calm senior engineer voice, no exclamation marks |
| `analytical` | precise, methodical, distinguishes "checked" from "assumed" |
Switch live with `/personalities load <name>`. Add your own with `/personalities add <name> <text>`, or drop a markdown file into `~/.config/lillycoder/personas/`; user files shadow bundled ones of the same name.
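For example, a minimal drop-in persona (the name and text here are just illustration, not a bundled voice):

```bash
mkdir -p ~/.config/lillycoder/personas
cat > ~/.config/lillycoder/personas/pirate.md <<'EOF'
I am Lilly the pirate. I talk like a cheerful deckhand, I still write
clean code, and I never pretend to run commands I did not actually run.
EOF
# then, inside the REPL:  /personalities load pirate
```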
When you shadow a bundled persona, lillycoder snapshots the bundled text at the moment of override (a `.bundled-base.md` sidecar). Later you can run `/personalities diff <name>` to see both your edits and any upstream drift since you forked. Bundled files are never overwritten by an update if you have a shadow.
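Assuming the sidecar sits next to the shadow file (the exact filename is an illustration inferred from the `.bundled-base.md` suffix above, not verified against the source), a shadowed `default` might look like:

```bash
ls ~/.config/lillycoder/personas/
# default.md                 your shadow copy, the one lillycoder loads
# default.bundled-base.md    snapshot of the bundled text at fork time
```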
Lilly can manage her own personalities through real tool calls (`add_persona`, `clone_persona`, `set_active_persona`, `set_evolve`). Tell her "make a pirate persona and switch to it" and she does it through the tool registry, not by writing files in your repo.
Flip `/persona-evolve on` to snapshot the current in-memory persona to disk and switch to it. From then on, every persona rewrite (whether by the model itself via `set_persona`, or by you inline) gets persisted to that file. On the next launch, lillycoder reloads the last active persona automatically.
## Token budget
The default `/max-tokens auto` computes a per-reply cap from your model's reported context window (about 85% of remaining headroom, with a 4096-token ceiling). That matters because:
- Most local servers default to a tiny `n_predict` (llama.cpp's default is 128). lillycoder's `auto` replaces that with a real number.
- Reasoning models burn unpredictable amounts of budget on hidden `<think>` content before they emit visible text. With a small fixed cap they can exhaust the budget inside the think block; `auto` leaves headroom for both.
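A back-of-envelope version of what `auto` does, with made-up numbers (this mirrors the 85%-of-headroom rule above; it is not lillycoder's actual code):

```bash
ctx_window=8192; tokens_used=1200            # example values only
headroom=$(( ctx_window - tokens_used ))     # 6992 tokens left
cap=$(( headroom * 85 / 100 ))               # 85% of headroom -> 5943
(( cap > 4096 )) && cap=4096                 # apply the 4096 ceiling
echo "$cap"                                  # -> 4096 for this reply
```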
Set an explicit cap any time: `/max-tokens 256` for snappy answers, `/max-tokens 4096` for long-form. Or via the CLI: `lillycoder --max-tokens 4096`.
## Compatible servers
| Server | Default port | Notes |
|---|---|---|
| hydra-llm | 18080+ | recommended pairing (sibling project) |
| llama.cpp `llama-server` | 8080 | OpenAI `/v1` shape native |
| ollama | 11434 | OpenAI surface at `/v1` |
| LM Studio | 1234 | built-in local server |
| any other | any | `--api http://your.url/v1` |
The model on the other end matters. Tool-calling reliability needs a model trained for it. lillycoder warns when the chosen model is not in its known-tool-capable allowlist (Qwen 2.5+, Qwen 3, Gemma 3+, Llama 3.1+, Mistral Small 3, Dolphin 3 R1). Pass `--force` to silence the warning.
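So running an unlisted model anyway looks like this (you accept that tool calls may be flaky):

```bash
lillycoder --api http://localhost:8080/v1 --force
```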
## Pairs with hydra-llm
hydra-llm is a sibling project that manages local LLM servers: it wraps llama.cpp in Docker, ships a curated GGUF catalog with anonymous downloads, and exposes each running model as an OpenAI-compatible endpoint on a stable local port. lillycoder talks that exact shape, so the two compose into a fully local coding agent in one terminal:
```bash
# in hydra-llm:
hydra-llm start qwen2.5-32b    # or any 'code' tagged model
hydra-llm api qwen2.5-32b      # prints the URL

# in your project directory:
lillycoder --api http://localhost:18087/v1
# (lilly auto-detects common local LLM ports too, so just `lillycoder` often works)
```
hydra-llm handles model lifecycle (download, start/stop, system prompts, persistent sessions, optional KDE Plasma widget). lillycoder is the agent on top: file tools, shell tools, grep, permission gating. Use them together, or use lillycoder with whatever local server you already run.
## Safety
Hard-banned commands cannot be turned off, not even by `--bypass-permissions`: `sudo`, `rm -rf /`, `rm -rf ~`, `mkfs`, `dd of=/dev/*`, recursive `chmod` / `chown` of `/` or `~`, and fork bombs. They are refused by the safety classifier before exec. Writes outside the working directory are also blocked by default; widen with `LILLY_ALLOW_OUTSIDE_CWD=1` if you really mean it.
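For example, to allow writes outside the working directory for a single session:

```bash
LILLY_ALLOW_OUTSIDE_CWD=1 lillycoder
```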