Portfolio · An AI music teacher

Tom. Teacher of Music. An AI music teacher who has ears.

I had a practice tool that listened to me play — highlighted where the timing drifted, where a note was wrong, where I was getting stronger. When it was removed, so was my progression. I wanted a teacher who could see my score, hear me play, and know me well enough to push where I was ready and back off where I wasn't. No product does this. So I'm building one.

§ Two — Today

Tom today

Tom is running. I talk to him every day.

Beat 01

Tom reads the manual so I don't have to. My Roland Fantom 08 keyboard ships with 190 pages of reference documentation and 354 figures — switches, parameters, patch diagrams, MIDI tables. I ingested the whole manual into Tom's knowledge base: chunked by heading, images captioned by GPT-4o-mini and kept inline with the text they belong to. Now when I need to know how the split point works or which patch is the Rhodes, Tom tells me — citing the page and showing the figure.

The pattern is general: ingest a vendor's technical documentation, emit an expert assistant for the product's user. It applies to any domain where the knowledge lives in dense, illustrated documents people don't read.
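The chunking step can be sketched as a small function. This is a minimal, hypothetical sketch — the heading pattern, page format, and figure placeholder are assumptions, not the real pipeline; in Tom, the placeholder would carry a GPT-4o-mini caption and each chunk would be embedded into Qdrant.

```python
import re

# Assumed heading style, e.g. "1.2 Split Point"; the real manual may differ.
HEADING = re.compile(r"^\d+(?:\.\d+)*\s+\S")

def chunk_by_heading(pages):
    """pages: list of page texts. Returns chunks of {heading, page, body},
    so each chunk can cite its page and keep figure captions inline."""
    chunks, current = [], None
    for page_no, text in enumerate(pages, start=1):
        for line in text.splitlines():
            if HEADING.match(line):
                current = {"heading": line.strip(), "page": page_no, "body": []}
                chunks.append(current)
            elif current is not None:
                current["body"].append(line)
    for c in chunks:
        c["body"] = " ".join(c["body"]).strip()
    return chunks

pages = [
    "1.1 Front Panel\nThe TONE buttons select a patch.\n[figure: panel layout]",
    "1.2 Split Point\nHold SPLIT and press a key to set the split.",
]
chunks = chunk_by_heading(pages)
print(len(chunks), "|", chunks[1]["heading"], "| page", chunks[1]["page"])
# → 2 | 1.2 Split Point | page 2
```

Heading-scoped chunks are what make the citations work: the page number travels with the text, so an answer can point back to where it came from.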

Beat 02

Tom lives across surfaces. Telegram for daily check-ins and quick exchanges; Claude Desktop for longer teaching conversations. Both surfaces share the same memory — when I mention a piece in Telegram on Monday, Tom knows about it in Claude Desktop on Sunday. Every incoming message is classified — coach mode for practice check-ins, teacher mode for deeper reasoning — and routed to the right model. Haiku is cheap and responsive; Sonnet is expensive and thoughtful. The right one picks the message up automatically.
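The routing decision can be sketched in a few lines. The keyword heuristic and model labels below are illustrative assumptions — the real classifier could itself be a small LLM call — but the shape is the same: classify, then route to the cheap model or the thoughtful one.

```python
# Hypothetical sketch of the message router. Model names are placeholders.
COACH_MODEL = "haiku"      # cheap, fast: practice check-ins
TEACHER_MODEL = "sonnet"   # expensive, thoughtful: deeper teaching

# Assumed cue words for coach mode; the real classifier is not keyword-based.
COACH_CUES = {"practised", "practiced", "session", "logged", "bpm"}

def classify(message: str) -> str:
    words = set(message.lower().split())
    return "coach" if words & COACH_CUES else "teacher"

def route(message: str) -> str:
    return COACH_MODEL if classify(message) == "coach" else TEACHER_MODEL

print(route("Logged a 20-minute session at 120 bpm"))     # → haiku
print(route("Why does this voicing sound so tense here?")) # → sonnet
```

The point of the split is economic as much as architectural: most daily traffic is check-ins, so most tokens go to the cheap model.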

Tom system architecture. Telegram messages flow into a central classifier-and-router that picks between a Coach model (Haiku) for quick exchanges and a Teacher model (Sonnet) for deeper reasoning. Claude Desktop bypasses the classifier, reading and writing directly to Tom's services through MCP servers; a future practice app will follow the same MCP route. Three memory stores — episodic memory (mem0, semantic), structured state (JSON in Qdrant), and reference knowledge (the FANTOM manual as vectors) — are reachable from both the classified path and the MCP path. Below them, an ingestion pipeline takes source documents through captioning (GPT-4o-mini) and chunking by heading into the reference store.
Telegram routes through the classifier; Claude Desktop reaches the same memory directly via MCP servers. Two paths, one shared memory.

Beat 03

Memory is not one store. Conversations and observations ("triplets now solid at 230bpm") are episodic — they live in mem0, semantically searchable. The weekly plan and the daily check-in state machine are structured — JSON in Qdrant with deterministic reads and writes, because "did I log my guitar session yet today" isn't a semantic question. The FANTOM reference material is its own vector collection. Three stores, each chosen for what the data actually is.
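The structured store is the easiest to show. This is a hypothetical sketch — a dict stands in for the Qdrant collection, and the state names and key format are assumptions — but it captures the design call: the daily check-in is a state machine read and written under a deterministic key, never via semantic search.

```python
import json
from datetime import date

store = {}  # key -> JSON payload; stands in for a Qdrant point fetched by fixed ID

def state_key(day: date) -> str:
    return f"checkin:{day.isoformat()}"   # deterministic key, not a semantic query

def read_state(day: date) -> dict:
    raw = store.get(state_key(day))
    return json.loads(raw) if raw else {"state": "pending", "sessions": []}

def advance(day: date, event: str) -> dict:
    """Tiny check-in state machine: pending -> checked_in -> session_logged."""
    s = read_state(day)
    if event == "check_in" and s["state"] == "pending":
        s["state"] = "checked_in"
    elif event == "log_session" and s["state"] != "pending":
        s["state"] = "session_logged"
        s["sessions"].append({"event": event})
    store[state_key(day)] = json.dumps(s)
    return s

today = date(2025, 1, 6)
advance(today, "check_in")
advance(today, "log_session")
print(read_state(today)["state"])  # → session_logged
```

"Did I log my guitar session yet today" is answered by one exact-key read; no embedding ever gets near it.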

Tom runs daily, and adapts to the pieces I'm actually playing — Paradise, For Whom The Bell Tolls, Slow Dancing in a Burning Room. Total running cost: about £0.70 a month.

§ Three — Tomorrow

Tom tomorrow

What Tom is today is a foundation. What he becomes is a teacher who can see the score, hear the performance, and close the loop between the two.

AppFactory

This is a large build — larger than I would attempt alone. AppFactory is what makes it feasible: the agent system that turns architecture decisions into shipped software. Tom is the first real demonstration of what one engineer can build with AppFactory behind them.

The stack

Six modalities cohering through one teacher.

  • Text Conversations across Telegram and Claude Desktop, memory that persists. Live today
  • Vision Captioned manual imagery inside Tom's knowledge base. Live today
  • Time-series Session logs, BPM progression, a model of how I actually improve over time. Live today
  • Structured score MusicXML ingested from Rockschool PDFs via OMR, rendered in-browser with OpenSheetMusicDisplay. Partially live
  • MIDI Piano performance captured natively over USB, diffed against the canonical MIDI of the piece. Architected
  • Audio Guitar recorded through the browser, pitch and timing extracted with basic-pitch and DTW. Architected
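The DTW step in the audio path can be sketched directly. This is a toy version under stated assumptions — bare MIDI numbers as the feature, an absolute-difference cost — where the real pipeline would align richer frames extracted by basic-pitch.

```python
# Classic dynamic time warping: distance between a performed pitch sequence
# and the score's pitch sequence, tolerant of stretches and repeats.
def dtw(perf, score, cost=lambda a, b: abs(a - b)):
    n, m = len(perf), len(score)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = cost(perf[i - 1], score[j - 1])
            D[i][j] = c + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

score = [64, 67, 71, 72]      # E4 G4 B4 C5 as MIDI note numbers
take  = [64, 67, 67, 71, 72]  # a doubled note from a sloppy pick
print(dtw(take, score))       # → 0.0  (the repeat absorbs into the warp)
```

The same alignment serves both instruments; only the front end differs — MIDI arrives already discrete, audio has to be inferred into this form first.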

Asymmetry

One subtle call: piano and guitar need different pipelines. Piano emits MIDI natively — a clean, millisecond-accurate signal. Guitar emits audio through a pickup — messier, polyphonic, needs inference. Same teacher, two capture paths. The architecture mirrors the instruments rather than flattening them into one pipeline.

Score-Match

At the end of that pipeline: a score-aligned playback diff. Record a take, compare it to the score, colour each note as I played it — green correct, amber timing off, red wrong note, grey missed. The capability whose removal ended my RSL journey — rebuilt, and this time it lives inside a teacher who knows me.
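The colouring logic itself is small once notes are aligned. A minimal sketch, assuming notes as (MIDI pitch, onset-seconds) pairs and made-up timing thresholds — the real diff would run on the aligned output of the capture pipelines:

```python
TIMING_OK = 0.05    # seconds: within this, the note is green (assumed threshold)
TIMING_NEAR = 0.15  # within this, amber — timing off (assumed threshold)

def colour_note(score_note, performance):
    """Colour one score note against the performed notes near its onset."""
    pitch, onset = score_note
    window = [(p, t) for p, t in performance if abs(t - onset) <= TIMING_NEAR]
    if not window:
        return "grey"                       # missed entirely
    nearest = min(window, key=lambda n: abs(n[1] - onset))
    if nearest[0] != pitch:
        return "red"                        # wrong note
    return "green" if abs(nearest[1] - onset) <= TIMING_OK else "amber"

performance = [(60, 0.01), (64, 0.62), (67, 1.10)]       # what I played
score = [(60, 0.0), (64, 0.5), (67, 1.0), (72, 1.5)]     # what the score says
print([colour_note(n, performance) for n in score])
# → ['green', 'amber', 'amber', 'grey']
```

Rendered over the score in the browser, that list is the whole feature: every note wears its verdict.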

Score-aligned playback diff — rebuilt, inside a teacher who knows you.