Graph Memory Blog

Deploying Graph Memory with Docker

2026-03-28T00:00:00.000Z

Graph Memory ships as a multi-platform Docker image (amd64 + arm64) on GitHub Container Registry. This post walks through a complete production deployment: Docker Compose setup, volume configuration, authentication, Redis caching, and health monitoring.

Quick start

The fastest way to get running:

docker run -d \
  --name graph-memory \
  -p 3000:3000 \
  -v "$(pwd)/graph-memory.yaml":/data/config/graph-memory.yaml:ro \
  -v /path/to/my-app:/data/projects/my-app:ro \
  -v graph-memory-models:/data/models \
  ghcr.io/graph-memory/graphmemory-server

Three volume mounts: the config file, your project directory, and a named volume for the embedding model cache. That's all you need.

Docker Compose for production

Here's a complete docker-compose.yml with Redis for embedding cache:

services:
  graphmemory:
    image: ghcr.io/graph-memory/graphmemory-server:latest
    restart: unless-stopped
    ports:
      - "127.0.0.1:3000:3000"
    volumes:
      - ./graph-memory.yaml:/data/config/graph-memory.yaml:ro
      - /srv/projects/my-app:/data/projects/my-app
      - models:/data/models
    environment:
      - NODE_ENV=production
      - LOG_JSON=1
      - LOG_LEVEL=info
    depends_on:
      redis:
        condition: service_healthy

  redis:
    image: redis:7-alpine
    restart: unless-stopped
    volumes:
      - redis-data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 3s
      retries: 3

volumes:
  models:
  redis-data:

A few things to note:

Bind to localhost (127.0.0.1:3000:3000). Don't expose Graph Memory directly to the internet. Put a reverse proxy in front.
LOG_JSON=1 enables structured JSON logging, useful for log aggregation services.
LOG_LEVEL controls verbosity: debug, info, warn, or error.
Redis health check ensures Graph Memory doesn't start until Redis is ready.

The config file

Your graph-memory.yaml needs container-relative paths:

server:
  host: "0.0.0.0"              # bind to all interfaces inside container
  port: 3000
  modelsDir: "/data/models"
  jwtSecret: "your-secret-here-at-least-32-characters-long"
  redis:
    enabled: true
    url: "redis://redis:6379"  # service name from docker-compose

users:
  admin:
    passwordHash: "$scrypt$..."    # generate with: graphmemory users add
    apiKey: "gm_..."               # for programmatic MCP access

projects:
  my-app:
    projectDir: "/data/projects/my-app"

Important: always set host: "0.0.0.0" inside the container. The default 127.0.0.1 would only accept connections from within the container itself.

Volume mounts explained

Container path	Purpose	Mount type
`/data/config/graph-memory.yaml`	Configuration file	Bind mount, read-only
`/data/projects/`	Project source code	Bind mount
`/data/models`	Embedding model cache (~560 MB)	Named volume

Model cache

The default embedding model (Xenova/bge-m3) downloads on first startup. Use a named volume for /data/models so you don't re-download 560 MB every time the container is recreated (for example after docker compose down or an image upgrade).

Project directory access

Mount project directories as read-only (:ro) if you only need docs, code, and file indexing. If you use knowledge, tasks, or skills, remove :ro -- the file mirror needs write access to create .notes/, .tasks/, and .skills/ directories inside the project.

Production checklist

1. Set a JWT secret

The jwtSecret must be at least 32 characters. It signs authentication tokens for the Web UI and API access. Without it, anyone with network access can read and modify your graphs.

server:
  jwtSecret: "generate-a-random-string-at-least-32-chars"
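One way to produce a suitable value is Node's built-in crypto module. This is a minimal sketch, not part of Graph Memory itself; the helper name generateJwtSecret is hypothetical.

```typescript
import { randomBytes } from "node:crypto";

// Hypothetical helper: returns a random, URL-safe secret.
// 32 random bytes encode to 43 base64url characters, which clears the 32-character minimum.
export function generateJwtSecret(byteLength = 32): string {
  return randomBytes(byteLength).toString("base64url");
}

// Print a fresh secret to paste into graph-memory.yaml.
console.log(generateJwtSecret());
```

Generate the value once and keep it stable: changing the secret later should invalidate any tokens signed with the old one.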

2. Configure users

Add users with password hashes (for Web UI login) and/or API keys (for programmatic MCP access):

# Generate a user interactively
docker compose run --rm graphmemory users add --config /data/config/graph-memory.yaml

Or set API keys directly in the config:

users:
  ci-bot:
    apiKey: "gm_your-api-key-here"
    defaultAccess: read

3. Enable Redis

Redis serves as an embedding cache. Without it, embeddings are computed fresh every time a node is created or updated. With Redis, repeated embeddings of the same content are cached, which speeds up re-indexing significantly.

server:
  redis:
    enabled: true
    url: "redis://redis:6379"

4. Set up a reverse proxy

Graph Memory listens on HTTP. For production, put nginx, Caddy, or Traefik in front for TLS termination:

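server {
    listen 443 ssl;
    server_name memory.example.com;

    # Placeholder paths: point these at your own certificate and key
    ssl_certificate     /etc/ssl/certs/memory.example.com.crt;
    ssl_certificate_key /etc/ssl/private/memory.example.com.key;

    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
    }
}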

The Upgrade and Connection headers are required for WebSocket support (real-time UI updates).

Health check

The Docker image includes a built-in health check that hits the /api/auth/status endpoint every 30 seconds:

HEALTHCHECK --interval=30s --timeout=5s --start-period=30s --retries=3 \
  CMD node -e "fetch('http://localhost:3000/api/auth/status').then(r=>{if(!r.ok)throw r.status}).catch(()=>process.exit(1))"

The 30-second start period is a grace window for first boot, when the embedding model may still be downloading: failed probes during that window do not count toward the three retries. Monitor health with:

docker inspect --format='{{.State.Health.Status}}' graph-memory

Graceful shutdown

Graph Memory handles SIGTERM and SIGINT signals. On shutdown, it:

Stops accepting new connections
Drains all pending mutation queues
Closes file watchers and mirror watchers
Saves all dirty graphs to disk
Force exits after 5 seconds if graceful shutdown hangs

This means docker compose down and docker stop both result in clean shutdowns with no data loss.
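The shutdown sequence above follows a common pattern: run the cleanup steps in order, and give up after a deadline. Here is a minimal sketch of that pattern. The names runShutdown and Step are hypothetical, and the real server's internals may look quite different.

```typescript
// Hypothetical shape for one cleanup step (drain queues, close watchers, save graphs...).
type Step = { name: string; run: () => Promise<void> };

// Runs the steps one after another. Resolves "clean" if they all finish before the
// deadline, or "forced" if the deadline fires first.
export function runShutdown(steps: Step[], deadlineMs = 5000): Promise<"clean" | "forced"> {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => resolve("forced"), deadlineMs);
    (async () => {
      for (const step of steps) await step.run();
    })().then(
      () => {
        clearTimeout(timer);
        resolve("clean");
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}
```

A caller would typically register this once for SIGTERM and SIGINT with process.once, then call process.exit with a non-zero code when the result is "forced".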

Multiple projects

Mount each project directory and list them in the config:

projects:
  frontend:
    projectDir: "/data/projects/frontend"
  backend:
    projectDir: "/data/projects/backend"
  docs:
    projectDir: "/data/projects/docs"

# docker-compose.yml
volumes:
  - /srv/code/frontend:/data/projects/frontend
  - /srv/code/backend:/data/projects/backend
  - /srv/code/docs:/data/projects/docs:ro

Each project gets its own MCP endpoint: /mcp/frontend, /mcp/backend, /mcp/docs.

Other Docker commands

# Force re-index all projects
docker compose run --rm graphmemory \
  serve --config /data/config/graph-memory.yaml --reindex

# One-shot index (index and exit, no server)
docker compose run --rm graphmemory \
  index --config /data/config/graph-memory.yaml

# Add a user interactively
docker compose run --rm graphmemory \
  users add --config /data/config/graph-memory.yaml

The Dockerfile

Graph Memory uses a multi-stage build. The first two stages install all dependencies and build the TypeScript server and React UI. The runtime stage copies only the compiled output and production dependencies. The image runs as a non-root app user.

Stage 1 (deps):    npm ci for server + UI
Stage 2 (build):   tsc → dist/, vite → ui/dist/
Stage 3 (runtime): node:24-slim + production deps + compiled output

The base image is node:24-slim -- minimal Debian with Node.js, no extra packages. Multi-arch builds are handled by GitHub Actions with QEMU + Buildx, producing images for both linux/amd64 and linux/arm64.

That's a complete production setup. Config file, Docker Compose, Redis, authentication, reverse proxy, and health monitoring. The server handles the rest -- indexing, embedding, real-time sync, and graceful shutdown.

Full Docker documentation | Configuration reference

File Mirror — Edit AI Memory in Your IDE

2026-03-27T00:00:00.000Z

Graph Memory stores notes, tasks, and skills as nodes in a graph. But graphs aren't git-friendly. You can't diff a graph, review it in a PR, or edit it in VS Code. File mirror solves this by maintaining a bidirectional sync between the graph and plain markdown files on disk.

How it works

When you create a note, task, or skill -- via MCP tool, REST API, or the Web UI -- Graph Memory writes it to disk as a directory with three files:

.notes/
  auth-architecture/
    events.jsonl        # append-only event log (source of truth)
    content.md          # human-editable content (plain markdown)
    note.md             # generated snapshot (gitignored)
    attachments/        # optional file attachments

.tasks/
  fix-login-bug/
    events.jsonl
    description.md
    task.md             # generated snapshot (gitignored)

.skills/
  deploy-to-staging/
    events.jsonl
    description.md
    skill.md            # generated snapshot (gitignored)

Each entity gets its own directory named by its slug ID. Inside, the events.jsonl file is the source of truth -- an append-only log of every create, update, and relation change. The content.md (or description.md) file holds the human-readable body text. The snapshot file (note.md, task.md, skill.md) is a generated read-only view with YAML frontmatter -- it's gitignored because it gets regenerated from events + content.

What a mirrored file looks like

Here's what a task looks like on disk. The description.md is plain markdown:

Implement rate limiting on the /api/auth endpoints.
Use a sliding window counter with Redis backing.
Allow 10 requests per minute per IP.

The generated task.md snapshot combines frontmatter with the full content:

---
id: rate-limit-auth
status: in_progress
priority: high
order: 0
tags:
  - security
  - api
assignee: "alice"
dueDate: "2026-04-15T00:00:00.000Z"
estimate: 4
completedAt: null
createdAt: "2026-03-28T10:00:00.000Z"
updatedAt: "2026-03-30T14:30:00.000Z"
version: 3
relations:
  - to: auth-service-hardening
    kind: blocks
  - to: "@code::src/middleware/rate-limit.ts::RateLimiter"
    kind: relates_to
    graph: code
---
# Rate Limit Auth Endpoints

Implement rate limiting on the /api/auth endpoints.
Use a sliding window counter with Redis backing.
Allow 10 requests per minute per IP.

The frontmatter contains all structural metadata: status, priority, tags, timestamps, relations. Cross-graph links show up as relations with a graph field.

Bidirectional sync

The sync works in both directions:

Graph to file: When a mutation happens in the graph (via MCP tool, REST, or UI), the graph manager calls mirrorNoteCreate, mirrorTaskUpdate, etc. These functions use atomic writes (write to a temp file, then rename it over the target) so that a concurrent reader never sees a partially written file. After writing, the MirrorWriteTracker records the file's mtime so the watcher knows to ignore its own writes.

File to graph: A chokidar watcher monitors .notes/, .tasks/, and .skills/ at depth 3 (to catch attachments). When a file changes, the watcher:

Checks MirrorWriteTracker -- if this was our own write, skip it (prevents feedback loops)
Classifies the file (events.jsonl, content.md, snapshot, or attachment)
Enqueues the import through the PromiseQueue (same queue as MCP mutations)
Parses the directory and calls importFromFile on the graph manager

The MirrorWriteTracker uses mtime comparison with a tolerance window to reliably detect our own writes vs external edits. It evicts stale entries to prevent unbounded memory growth.
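To make the write-then-ignore mechanism concrete, here is a small sketch of an atomic write paired with an mtime tracker. It is an illustration under assumptions, not the actual MirrorWriteTracker: the class name WriteTracker, the 50 ms tolerance, and the temp-file naming are all made up for the example, and stale-entry eviction is left out.

```typescript
import { mkdtempSync, renameSync, statSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical tracker: remembers the mtime of files this process wrote.
export class WriteTracker {
  private writes = new Map<string, number>();
  constructor(private toleranceMs = 50) {}

  record(path: string, mtimeMs: number): void {
    this.writes.set(path, mtimeMs);
  }

  // True when the observed mtime matches a recorded write within the tolerance window.
  isOwnWrite(path: string, mtimeMs: number): boolean {
    const recorded = this.writes.get(path);
    return recorded !== undefined && Math.abs(recorded - mtimeMs) <= this.toleranceMs;
  }
}

// Atomic write: write a temp file next to the target, then rename it into place.
export function atomicWrite(path: string, data: string, tracker: WriteTracker): void {
  const tmp = `${path}.${process.pid}.tmp`;
  writeFileSync(tmp, data, "utf8");
  renameSync(tmp, path);
  tracker.record(path, statSync(path).mtimeMs);
}

// Usage: write into a throwaway directory, then classify the change as a watcher would.
const dir = mkdtempSync(join(tmpdir(), "mirror-demo-"));
const file = join(dir, "content.md");
const tracker = new WriteTracker();
atomicWrite(file, "# Auth architecture\n", tracker);
console.log(tracker.isOwnWrite(file, statSync(file).mtimeMs)); // true
```

The temp file lives in the same directory as the target so the rename stays on one filesystem, which is what makes it atomic.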

Editing in your IDE

The most immediate benefit: open .tasks/fix-login-bug/description.md in your editor, change the description, save. The watcher picks it up, re-parses the directory, and updates the graph. The Web UI updates in real time via WebSocket.

You can also edit the snapshot files directly. If you change the status field in task.md from todo to in_progress, the watcher detects the delta against the current graph state, appends an update event to events.jsonl, writes the new description to description.md, and re-imports everything. This works for any frontmatter field: status, priority, tags, due dates.

Git workflow

The file structure is designed for git. The .gitignore inside .notes/, .tasks/, and .skills/ excludes the generated snapshot files (*/note.md, */task.md, */skill.md), so only the source-of-truth files get committed:

events.jsonl -- full audit trail of every change
content.md / description.md -- human-readable content
attachments/ -- associated files

This means you can:

Review AI-generated tasks in a PR. Your AI assistant creates tasks via MCP tools, the files appear in the diff, and teammates review them like any other code change.
Track decisions over time. The events.jsonl gives you a complete history of every field change with timestamps.
Merge across branches. Since events.jsonl is append-only, git merges usually succeed without conflicts. On the next server startup, scanMirrorDirs detects any files newer than the graph and re-imports them.
Collaborate across machines. Pull, start the server, and the mirror scan picks up everything your teammates added.

Startup scan

When the server starts, scanMirrorDirs walks all three directories and compares each entity's file mtime against the graph's updatedAt timestamp. If the file is newer (e.g., after a git pull brought in new events), it re-imports the entity. This handles the case where files changed while the server was down.

Startup:
  for each .notes/{id}/ directory:
    if events.jsonl mtime > graph node updatedAt:
      parseNoteDir(entityDir) → importFromFile()
  (same for .tasks/ and .skills/)

Conflict resolution

The system avoids conflicts by design:

Structural changes (status, priority, tags) go through the event log. The graph manager replays all events on import, so the last event wins.
Content changes are file-level. If you edit content.md while the server is running, the watcher picks it up immediately. If you edit it while the server is down, the startup scan catches it.
Concurrent edits from MCP and file system are serialized through the same PromiseQueue. There's no race condition because both paths go through enqueue().

The one edge case: if you edit content.md in your IDE at the exact same moment an MCP tool updates it, the queue serializes them. Whichever enqueues second overwrites the first. In practice, this doesn't happen -- humans and AI rarely edit the same note body simultaneously.
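The "last event wins" rule for structural fields amounts to folding the event log in order. The sketch below shows the idea with a made-up event shape; the real events.jsonl schema is not shown in this post, so the TaskEvent type and its field names are assumptions.

```typescript
// Hypothetical event shape; the real events.jsonl schema may differ.
type TaskEvent = { ts: string; fields: Record<string, unknown> };

// Parses JSONL text and folds events in order, so later events overwrite earlier fields.
export function replayEvents(jsonl: string): Record<string, unknown> {
  return jsonl
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as TaskEvent)
    .reduce<Record<string, unknown>>((state, ev) => ({ ...state, ...ev.fields }), {});
}

const log = [
  '{"ts":"2026-03-28T10:00:00.000Z","fields":{"status":"todo","priority":"high"}}',
  '{"ts":"2026-03-30T14:30:00.000Z","fields":{"status":"in_progress"}}',
].join("\n");

console.log(replayEvents(log)); // { status: 'in_progress', priority: 'high' }
```

Because the fold depends only on line order, two branches that each append events can merge cleanly and still replay to a single deterministic state.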

File mirror makes AI memory tangible. It's not locked in a database or hidden behind an API. It's markdown files in your project, editable in your IDE, reviewable in PRs, trackable in git history.

Get started with Graph Memory or read the full docs on file mirror.

From 0 to 70 MCP Tools — The Architecture of Graph Memory

2026-03-26T00:00:00.000Z

Graph Memory exposes 70 MCP tools, a REST API, and a WebSocket event stream from a single Node.js process. This post breaks down the architecture that makes it work: Graphology for storage, tree-sitter for AST parsing, serial queues for mutation safety, and hybrid search for retrieval.

Graphology: the storage layer

Every graph is a Graphology DirectedGraph instance. Six of them run per project:

Graph	Node type	Edge semantics
DocGraph	Markdown heading chunks	parent-child (heading hierarchy), cross-file links
CodeGraph	Functions, classes, imports	calls, imports, exports, contains
KnowledgeGraph	User-created notes	typed relations, cross-graph proxy links
TaskGraph	Tasks and epics	blocks, depends_on, parent/child
SkillGraph	Reusable procedures	relates_to, cross-graph links
FileIndexGraph	Every project file	directory containment, language tagging

Graphology gives us constant-time node/edge lookup, iteration, and serialization to JSON. Each node carries an embedding array (from the embedding model) alongside its domain attributes. The entire graph lives in memory and serializes to disk as compressed JSON on shutdown and at periodic auto-save intervals.

Cross-graph connections use proxy nodes. When a note links to a code symbol, the KnowledgeGraph creates a proxy node like @code::src/auth.ts::AuthService and connects it with a typed edge. The proxy stores a proxyFor attribute pointing to the real node in the CodeGraph. Orphaned proxies are cleaned up automatically when the target node disappears.
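A proxy ID such as @code::src/auth.ts::AuthService can be taken apart with a few lines of string handling. This sketch assumes the format is always "@graph::file::symbol" with an optional symbol part; that is inferred from the single example above, and parseProxyId is a hypothetical helper.

```typescript
// Assumed format: "@<graph>::<file path>::<symbol>", inferred from the example ID.
export interface ProxyTarget {
  graph: string;
  file: string;
  symbol?: string;
}

// Returns the parsed target, or null when the ID is not a proxy ID.
export function parseProxyId(id: string): ProxyTarget | null {
  if (!id.startsWith("@")) return null;
  const [graph, file, symbol] = id.slice(1).split("::");
  if (!graph || !file) return null;
  return symbol ? { graph, file, symbol } : { graph, file };
}

console.log(parseProxyId("@code::src/auth.ts::AuthService"));
```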

tree-sitter WASM: code understanding

Graph Memory uses web-tree-sitter (the WASM build of tree-sitter) to parse TypeScript, JavaScript, TSX, and JSX into ASTs. From the AST, it extracts:

Function and method declarations (name, parameters, return type, body span)
Class declarations with their members
Import/export relationships
Call expressions connecting symbols to each other

The WASM approach was a deliberate choice over native tree-sitter bindings. Native bindings require platform-specific compilation and break in Docker multi-arch builds. WASM runs identically on amd64 and arm64 with no native dependencies.

PromiseQueue: mutation serialization

Every write operation in Graph Memory passes through a PromiseQueue -- a simple serial async queue that executes functions one at a time, in order:

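export class PromiseQueue {
  private queue: Array<() => Promise<void>> = [];
  private running = false;

  // Wraps fn in a task that settles the caller's promise, then starts draining if idle.
  enqueue<T>(fn: () => Promise<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      this.queue.push(async () => {
        try { resolve(await fn()); } catch (e) { reject(e as Error); }
      });
      if (!this.running) this.drain();
    });
  }

  // Not shown in the original excerpt: a minimal drain loop that runs tasks one at a time.
  private async drain(): Promise<void> {
    this.running = true;
    let task: (() => Promise<void>) | undefined;
    while ((task = this.queue.shift())) await task();
    this.running = false;
  }
}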

This solves a real problem. Multiple MCP clients can connect simultaneously, and the REST API accepts concurrent requests. Without serialization, two clients creating notes at the same time could interleave at await points and corrupt the graph. The queue ensures mutations execute sequentially while reads can happen freely: Node.js is single-threaded, so a synchronous Graphology read runs only while a mutation is paused at an await point and never in the middle of a graph operation.

The MCP server uses a proxy pattern to wrap mutation tool handlers. createMutationServer intercepts registerTool calls and wraps each handler in queue.enqueue(). Read-only tools bypass the queue entirely.
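The interception idea can be sketched with a JavaScript Proxy. Everything here is illustrative: ToolServer is a stand-in interface rather than the MCP SDK's server type, SerialQueue is a simplified queue, and withMutationQueue is a hypothetical name, not the real createMutationServer.

```typescript
type Handler = (args: unknown) => Promise<unknown>;

// Stand-in for the part of the server API that matters here.
interface ToolServer {
  registerTool(name: string, handler: Handler): void;
}

// Minimal serial queue: each task starts only after the previous one settles.
class SerialQueue {
  private tail: Promise<unknown> = Promise.resolve();

  enqueue<T>(fn: () => Promise<T>): Promise<T> {
    const run = this.tail.then(fn);
    this.tail = run.catch(() => undefined); // keep the chain alive after failures
    return run;
  }
}

// Returns a view of `server` whose registerTool wraps every handler in queue.enqueue().
function withMutationQueue(server: ToolServer, queue: SerialQueue): ToolServer {
  return new Proxy(server, {
    get(target, prop, receiver) {
      if (prop !== "registerTool") return Reflect.get(target, prop, receiver);
      return (name: string, handler: Handler) =>
        target.registerTool(name, (args) => queue.enqueue(() => handler(args)));
    },
  });
}
```

Mutation tools would register through the wrapped view and read-only tools through the original server, so only the former pay the queueing cost.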

Hybrid search: BM25 + vector + RRF + BFS

Search in Graph Memory combines two rankers and a fusion step:

Vector cosine similarity -- every node's content is embedded via BGE-M3 (ONNX runtime). Query embeddings use a separate embedQuery function with instruction prefixes optimized for retrieval.
BM25 keyword search -- a custom BM25 index tokenizes content with camelCase splitting, stop-word removal, and term frequency normalization. This catches exact matches that vector search misses ("getUserById" as a query matches the function name precisely).
Reciprocal Rank Fusion (RRF) -- the vector and BM25 result lists are fused using RRF scoring (1 / (k + rank)), which combines both rankings without needing score normalization.
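RRF is short enough to show in full. The sketch below is a generic implementation of the 1 / (k + rank) formula, not Graph Memory's code: the function name rrfFuse is hypothetical, and k = 60 is a commonly used constant chosen here as an assumption.

```typescript
// Reciprocal Rank Fusion: score(id) = sum over lists of 1 / (k + rank), with rank starting at 1.
export function rrfFuse(lists: string[][], k = 60): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

const vectorHits = ["auth-note", "jwt-note", "session-note"];
const keywordHits = ["jwt-note", "login-bug", "auth-note"];

console.log(rrfFuse([vectorHits, keywordHits]).map((r) => r.id));
// [ 'jwt-note', 'auth-note', 'login-bug', 'session-note' ]
```

Only ranks enter the formula, which is why cosine similarities and BM25 scores can be fused without putting them on a common scale first.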

After fusion, the top-K seeds are expanded via BFS graph traversal. If a note scores highly, its linked notes get a decayed score boost. This means searching for "authentication" surfaces not just the auth note itself, but related notes about JWT tokens, session management, and security decisions.
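The expansion step can be sketched as a bounded breadth-first walk. This is an illustration only: the function name expandScores, the decay factor of 0.5, the depth limit, and the rule of keeping the highest score per node are all assumptions, since the post does not give the actual parameters.

```typescript
// Spreads seed scores to linked nodes. Each hop multiplies the score by `decay`,
// and a node keeps the highest score that reaches it.
export function expandScores(
  seeds: Map<string, number>,
  neighbors: Map<string, string[]>,
  decay = 0.5,
  maxDepth = 1,
): Map<string, number> {
  const scores = new Map(seeds);
  let frontier: Array<[string, number]> = Array.from(seeds.entries());
  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next: Array<[string, number]> = [];
    for (const [id, score] of frontier) {
      for (const neighbor of neighbors.get(id) ?? []) {
        const boosted = score * decay;
        if (boosted > (scores.get(neighbor) ?? 0)) {
          scores.set(neighbor, boosted);
          next.push([neighbor, boosted]);
        }
      }
    }
    frontier = next;
  }
  return scores;
}

const links = new Map([
  ["auth-note", ["jwt-note", "session-note"]],
  ["jwt-note", ["token-rotation"]],
]);
console.log(expandScores(new Map([["auth-note", 1]]), links, 0.5, 2));
```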

The search mode is configurable per query: hybrid (default), vector, or keyword.

How a tool call flows

Here's the path of a notes_create MCP tool call:

1. MCP client sends JSON-RPC request
2. StreamableHTTPServerTransport routes to session's McpServer
3. McpServer dispatches to registered tool handler
4. createMutationServer wraps handler → queue.enqueue()
5. PromiseQueue executes when it's this request's turn:
   a. KnowledgeGraphManager.createNote()
   b. Generate slug ID, validate input
   c. embedFn(title + content) → embedding vector
   d. graph.addNode(id, { title, content, embedding, ... })
   e. BM25 index updated
   f. ctx.markDirty() → flags project for auto-save
   g. mirrorNoteCreate() → writes .notes/{id}/events.jsonl + content.md
   h. ctx.emit('note:created', { id, title, ... })
6. EventEmitter fires → WebSocket server broadcasts to connected UI clients
7. Tool returns { id, title } to MCP client

Every mutation follows this pattern. The graph manager encapsulates the full lifecycle: validate, embed, mutate graph, update search index, mark dirty, mirror to disk, emit event.

Key design decisions

CommonJS, not ESM. The project uses module: "CommonJS" in tsconfig. Several dependencies (Graphology, ONNX Runtime) have better CommonJS support, and the WASM loading for tree-sitter is simpler in CJS context.

Web-tree-sitter over native. Native tree-sitter bindings are faster but require platform-specific compilation. The Docker image supports both amd64 and arm64 -- WASM handles this transparently.

File mirror with bidirectional sync. Every note, task, and skill is mirrored as markdown files with YAML frontmatter. A chokidar watcher detects external edits and imports them back into the graph. This makes AI memory editable in any IDE and committable to git.

Three serial indexing queues. Docs, code, and file index run as independent sequential queues. They process concurrently with each other but each queue is serial internally. This prevents file-level race conditions while keeping indexing fast.

EventEmitter for real-time sync. The ProjectManager extends EventEmitter. Every graph mutation emits an event (note:created, task:updated, etc.) that the WebSocket server broadcasts to connected clients. The Web UI updates in real time without polling.

Numbers

At the time of writing, Graph Memory registers 70 MCP tools in eight groups:

Docs: 10 tools (search, list, get, explain, cross-references)
Code: 5 tools (list files, get symbols, search)
Knowledge: 12 tools (CRUD notes + relations + attachments)
Tasks: 17 tools (CRUD + bulk ops + epics)
Skills: 14 tools (CRUD + recall + usage tracking)
Files: 3 tools (list, search, get info)
Context: 1 tool (project/workspace info)
Epics: 8 tools (CRUD + link/unlink tasks)

Each tool is a thin adapter -- typically under 50 lines -- that validates input with Zod, calls the graph manager, and formats the response. The real logic lives in the managers.

The architecture is intentionally straightforward. Graphology handles graph storage, PromiseQueue handles concurrency, EventEmitter handles real-time sync, and the graph managers tie it all together. No database server, no message broker, no external dependencies beyond the embedding model.

Explore the source on GitHub or get started in under a minute.

Why We Chose Local Embeddings Over API Calls

2026-03-25T00:00:00.000Z

Graph Memory generates vector embeddings for every node in every graph — doc chunks, code symbols, files, notes, tasks, skills. We run these embeddings locally using ONNX Runtime, not through an API like OpenAI's. This was a deliberate choice with real trade-offs.

How it works

Graph Memory uses the @huggingface/transformers library to run ONNX models directly in Node.js. The default model is Xenova/bge-m3, a multilingual embedding model quantized to 8-bit (q8 dtype) for smaller size and faster inference.

For code-specific embeddings, we use jinaai/jina-embeddings-v2-base-code — a model trained specifically on source code that understands programming language semantics better than general-purpose models.

Here's the core of the embedding pipeline:

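import { pipeline } from '@huggingface/transformers';

// Build the ONNX feature-extraction pipeline; model.name is the model ID (e.g. "Xenova/bge-m3").
const pipe = await pipeline('feature-extraction', model.name, {
  dtype: 'q8',                    // 8-bit quantized weights
  session_options: {
    enableCpuMemArena: false,
    enableMemPattern: false,
    executionMode: 'sequential',
  },
});

// CLS pooling plus normalization yields one unit-length vector per input text.
const tensor = await pipe._call(text, {
  pooling: 'cls',
  normalize: true,
});
const vector = Array.from(tensor.data as Float32Array);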

Models are registered for lazy loading — the ONNX pipeline isn't created until the first embedding is actually needed. This keeps startup fast and memory usage low when not all graphs are actively queried.

The download cost

The first time you run Graph Memory, it downloads the model weights. For Xenova/bge-m3 at q8 quantization, that's roughly 560 MB. The models are cached in a local directory (configurable via modelsDir in your config), so subsequent starts are fast.

This is the biggest UX friction point. A 560 MB download on first run is noticeable. But it's a one-time cost, and after that the model loads from disk in seconds.

Why not use an API?

We considered using OpenAI's embedding API (text-embedding-3-small or text-embedding-3-large). Here's why we went local:

Privacy. Graph Memory indexes your entire codebase — every function, every doc, every file path. Sending all of that to an external API means your code leaves your machine. For many teams, that's a non-starter. With local embeddings, nothing leaves your machine. Ever.

Cost. OpenAI's text-embedding-3-small costs $0.02 per million tokens. Sounds cheap until you're indexing a large codebase. A project with 10,000 code symbols and 500 doc chunks, each embedded with surrounding context, can easily hit millions of tokens. And you pay again every time you re-index. With local embeddings, the cost is $0 — you're just using your own CPU.

Offline work. Local embeddings work without internet. Index your project on a plane. Search your code graph in a coffee shop with bad wifi. API embeddings fail when the network fails.

Latency consistency. API calls have variable latency: 50ms on a good day, 500ms+ when the service is busy. Local embeddings on a modern CPU take 5-20ms per text after the model is loaded. No network round trips, no rate limits, no retry logic needed.

The trade-offs

Local isn't free. Here's what you give up:

First-load latency. Loading the ONNX model takes a few seconds. The first embedding call pays this cost. We mitigate this with lazy loading — models only load when first needed — and pipeline deduplication, so if two graphs use the same model, they share one pipeline.

CPU usage during indexing. Initial indexing of a large codebase is CPU-intensive. We run indexing in three sequential phases (docs, then files, then code) to avoid loading multiple models simultaneously and keep memory usage predictable.

Model quality. The largest commercial embedding models (like OpenAI's text-embedding-3-large at 3072 dimensions) may produce marginally better embeddings than a quantized open-source model. In practice, we haven't found this to matter for code search — the hybrid search approach (BM25 + vector + graph expansion) compensates for any quality gap in the embeddings alone.

Caching

Every embedding result is cached in an LRU cache (default: 10,000 entries per model). If you search for the same query twice, the second search skips the model entirely.
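An LRU cache of this kind fits in a few lines on top of a Map, which iterates in insertion order. This is a generic sketch, not the cache Graph Memory ships; the class name LruCache is hypothetical, and only the 10,000-entry default comes from the text above.

```typescript
// Least-recently-used cache built on Map's insertion-order iteration.
export class LruCache<V> {
  private entries = new Map<string, V>();
  constructor(private capacity = 10_000) {}

  get(key: string): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next();
      if (!oldest.done) this.entries.delete(oldest.value);
    }
  }
}
```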

For production deployments, Graph Memory supports Redis-backed embedding caches. The cache is keyed by a SHA-256 hash of the input text, and vectors are stored as base64-encoded float32 arrays. You can configure TTL per cache:

server:
  redis:
    enabled: true
    url: redis://localhost:6379
    embeddingCacheTtl: 7d

The Redis cache is shared across server restarts, so re-indexing after a restart can skip embeddings that haven't changed.
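The key and value encoding described above can be sketched with Node built-ins. Treat this as one possible encoding: the "emb:" key prefix and the function names are assumptions, and the actual Redis key layout is not documented in this post.

```typescript
import { createHash } from "node:crypto";

// Cache key: hex SHA-256 of the input text. The "emb:" prefix is an assumption.
export function cacheKey(text: string): string {
  return "emb:" + createHash("sha256").update(text, "utf8").digest("hex");
}

// Encodes a float32 vector as base64 so it can be stored as a Redis string value.
export function encodeVector(vector: Float32Array): string {
  return Buffer.from(vector.buffer, vector.byteOffset, vector.byteLength).toString("base64");
}

// Decodes base64 back into a Float32Array. The bytes are copied first so the
// resulting view starts at offset 0 and is correctly aligned.
export function decodeVector(encoded: string): Float32Array {
  const bytes = new Uint8Array(Buffer.from(encoded, "base64"));
  return new Float32Array(bytes.buffer, 0, bytes.length / 4);
}

const encoded = encodeVector(new Float32Array([0.25, -1, 3]));
console.log(cacheKey("hello world"), encoded, decodeVector(encoded));
```

Storing raw float32 bytes as base64 costs about 5.3 characters per dimension, considerably less than a JSON array of decimal numbers.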

Remote embeddings as an escape hatch

If you do want API-based embeddings — maybe you need a specific model, or you're running on a machine without enough RAM for ONNX — Graph Memory supports remote embedding endpoints:

server:
  embedding:
    remote: "https://your-embedding-api.com/embed"
    remoteApiKey: "sk-..."
    remoteModel: "text-embedding-3-small"

The remote endpoint receives a POST with { texts: string[] } and returns { embeddings: number[][] }. Retries with exponential backoff are built in for 5xx errors.
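If you are building an endpoint or a test client for this contract, the request and retry behavior might look like the sketch below. It is not Graph Memory's client: embedRemote, the retry count, and the 200 ms base delay are assumptions, the HTTP function is injected so the example has no dependencies, and authentication headers are left out because the post does not say how remoteApiKey is sent.

```typescript
type HttpResponse = { ok: boolean; status: number; json(): Promise<unknown> };
type HttpPost = (
  url: string,
  init: { method: "POST"; headers: Record<string, string>; body: string },
) => Promise<HttpResponse>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// POSTs { texts } and returns the embeddings array from the response.
// Retries only on 5xx responses, doubling the delay each time.
export async function embedRemote(
  post: HttpPost,
  url: string,
  texts: string[],
  maxRetries = 3,
  baseDelayMs = 200,
): Promise<number[][]> {
  for (let attempt = 0; ; attempt++) {
    const res = await post(url, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ texts }),
    });
    if (res.ok) {
      const data = (await res.json()) as { embeddings: number[][] };
      return data.embeddings;
    }
    if (res.status < 500 || attempt >= maxRetries) {
      throw new Error(`Embedding request failed with status ${res.status}`);
    }
    await sleep(baseDelayMs * 2 ** attempt); // 200 ms, 400 ms, 800 ms, ...
  }
}
```

On Node 18 or later the global fetch can be passed as the post argument; in tests a stub makes the retry path easy to exercise.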

This gives you the best of both worlds: local by default, remote when you need it.

The result

For most users, local embeddings are the right default. Your code stays on your machine, indexing costs nothing, and search works offline. The initial model download is a one-time cost that pays for itself immediately.

The hybrid search architecture means embedding quality isn't the whole story anyway — BM25 keyword matching catches what vectors miss, and graph expansion surfaces related nodes that no embedding model would connect. Local embeddings are one piece of a system designed to be greater than the sum of its parts.

How We Use Graph Memory to Develop Graph Memory

2026-03-24T00:00:00.000Z

We build Graph Memory with Graph Memory. Not as a marketing exercise — it's genuinely the fastest way for us to work. Here's a concrete example from a recent development session where we shipped six features in one sitting.

The setup

Graph Memory runs against its own codebase. Claude Code connects to it via MCP. Every conversation has access to the full code graph, docs graph, and — critically — the task and knowledge graphs where we track work and decisions.

The six features we shipped: WebSocket event fixes, sidebar color improvements, WebSocket connection indicator, Pino structured logging, unified filter components, and task grouping in the UI. Here's how Graph Memory's own tools drove the process.

Step 1: Create tasks

We started by breaking work into tasks using tasks_create:

tasks_create({
  title: "Fix WebSocket event broadcasting",
  description: "WS events not reaching all connected clients...",
  priority: "high",
  status: "todo",
  tags: ["bug", "websocket"]
})

Each task got a priority, tags, and a description with enough context for any AI session to pick it up later. Six tasks, six clear scopes.

The tasks immediately appeared as markdown files in .tasks/ — visible in the IDE sidebar, editable in any text editor. This is the file mirror at work: every task, note, and skill has a corresponding markdown file that syncs bidirectionally with the graph.

Step 2: Plan in notes

Before writing code, we captured design decisions as notes:

notes_create({
  title: "Pino logger migration plan",
  content: "Replace console.log/warn/error with Pino structured logging...",
  tags: ["architecture", "decision"]
})

Then linked the note to the relevant task:

tasks_create_link({
  taskId: "pino-logger-migration",
  toId: "pino-logger-migration-plan",
  kind: "planned_by",
  targetGraph: "knowledge"
})

Now the task knows about the plan, and the plan links back to the task. Any AI session that looks at either one finds the other.

Step 3: Implement with full context

When working on the Pino logger task, Claude Code already had context from the task description and the linked planning note. But it also had the code graph — it could search for every console.log call, find the existing logging patterns, and understand the module structure.

The workflow was: pick a task, read its linked notes for decisions, search the code graph for relevant symbols, implement, then update the task.

Step 4: Track progress

As each feature landed, we moved tasks through the kanban:

tasks_move({
  taskId: "fix-websocket-event-broadcasting",
  status: "done"
})

The task graph maintained the full history — when each task was created, when it moved to in_progress, when it was completed. The .tasks/ files updated automatically.

Why this works

Three things make this workflow effective:

Persistent context. Notes and tasks survive across AI sessions. When you start a new conversation, the AI can search for existing decisions instead of re-discovering them. "What did we decide about the logger?" returns the actual planning note.

Cross-graph links. Tasks link to notes (decisions), notes link to code (implementation), code links to docs (explanation). The AI navigates these connections to build complete context for any piece of work.

File mirror. The .tasks/ and .notes/ directories make graph data visible in your IDE. You can scan task status in the file explorer, edit a note in your editor, or review decisions during code review — all without opening the web UI or making API calls. Changes sync back to the graph automatically.

The meta observation

The best test of a developer tool is whether the developers building it actually want to use it. We don't use Graph Memory on our own project because we should — we use it because going back to ad-hoc context management feels broken once you've had structured graph memory.

Every decision is searchable. Every task links to its context. Every AI session starts with full project knowledge instead of a blank slate.

Want to try this workflow on your own project? Get started in under 5 minutes.

Getting Started: From Zero to Semantic Search in 5 Minutes

2026-03-23T00:00:00.000Z

You have a codebase. You want your AI assistant to actually understand it — not just grep through it, but know the structure, remember decisions, and track work. Here's how to get there in 5 minutes.

Minute 1: Install and serve

npm install -g @graphmemory/server
cd /path/to/your-project
graphmemory serve

No config file needed. Graph Memory uses your current directory as the project. On first run it downloads the embedding model (~560 MB, cached after that), then indexes your project in three phases: docs, files, code. The server starts on http://localhost:3000.

You'll see output like:

INFO  Registered model (lazy)         model="Xenova/bge-m3"
INFO  Starting indexing phase         phase="1/3 docs"
INFO  Starting indexing phase         phase="2/3 files"
INFO  Starting indexing phase         phase="3/3 code"
INFO  Indexed docs                    nodes=142 edges=89
INFO  Indexed code                    nodes=387 edges=512
INFO  Indexed files                   nodes=1203 edges=1202

Minute 2: Connect Claude Code

claude mcp add --transport http --scope project graph-memory http://localhost:3000/mcp/your-project

Replace your-project with your directory name. If your project lives at /home/dev/my-app, the project ID is my-app.
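As a rough illustration of the naming rule above, the helper below derives a project ID from a directory path and builds the MCP endpoint URL. Both function names are hypothetical and not part of Graph Memory; the sketch assumes the ID is simply the last path segment, as in the /home/dev/my-app example.

```typescript
// Hypothetical helpers; Graph Memory's own ID derivation may differ.

// Return the last non-empty segment of a POSIX-style path.
function projectIdFromPath(dir: string): string {
  const segments = dir.split("/").filter((s) => s.length > 0);
  const last = segments.pop();
  if (last === undefined) {
    throw new Error(`cannot derive a project ID from "${dir}"`);
  }
  return last;
}

// Build the MCP endpoint URL in the shape used by the `claude mcp add` command.
function mcpUrl(dir: string, base: string = "http://localhost:3000"): string {
  return `${base}/mcp/${encodeURIComponent(projectIdFromPath(dir))}`;
}

console.log(mcpUrl("/home/dev/my-app")); // http://localhost:3000/mcp/my-app
```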

For Cursor or Windsurf, add to .mcp.json:

{
  "mcpServers": {
    "graph-memory": {
      "type": "http",
      "url": "http://localhost:3000/mcp/my-app"
    }
  }
}

Your AI assistant now has access to 70 MCP tools across six graphs.

Minute 3: Search your code

Ask your assistant something about your codebase. Behind the scenes, it calls docs_search or code_search:

"How does authentication work in this project?"

Graph Memory returns results from multiple graphs — the auth module's functions and classes from the Code Graph, the authentication docs from the Docs Graph, any related files from the File Index. Results are ranked using hybrid search: BM25 keyword matching plus vector cosine similarity, fused with Reciprocal Rank Fusion.
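For readers unfamiliar with Reciprocal Rank Fusion, here is a minimal sketch of the general technique, not Graph Memory's actual implementation. Each result list contributes 1 / (k + rank) to an item's score; the constant k = 60 is a commonly used default and an assumption here, as are the result IDs.

```typescript
// Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d)),
// with rank starting at 1. k = 60 is a common default, assumed here.
function reciprocalRankFusion(rankings: string[][], k: number = 60): [string, number][] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries()).sort((a, b) => b[1] - a[1]);
}

// Toy rankings with hypothetical IDs: one from keyword search, one from vector search.
const bm25Ranking = ["auth.ts::login", "docs/auth.md", "auth.ts::verifyToken"];
const vectorRanking = ["auth.ts::verifyToken", "auth.ts::login", "notes/auth-architecture"];

const fused = reciprocalRankFusion([bm25Ranking, vectorRanking]);
// "auth.ts::login" comes first because it sits near the top of both lists.
console.log(fused.map(([id]) => id));
```

Because RRF only looks at ranks, it can combine BM25 scores and cosine similarities without normalizing them onto a shared scale.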

You can also search directly with specific tools:

"Search the code graph for functions related to token validation"

This calls code_search with your query, returning matching symbols with their signatures, file locations, and relationships.

Minute 4: Create a note

Your assistant can store knowledge that persists across conversations:

"Create a note about our auth architecture: we use JWT tokens with scrypt password hashing, tokens expire after 24 hours, and refresh tokens are stored in HttpOnly cookies"

This calls notes_create with a title and content. The note is automatically embedded for semantic search. Next time any AI session asks about auth, this note shows up in search results.

The note also appears as a markdown file in .notes/ inside your project directory. You can edit it directly in your IDE — changes sync back to the graph automatically.

Minute 5: Link notes to code

Here's where graphs beat flat search. Connect your note to the actual code it describes:

"Link the auth architecture note to the AuthService class in the code graph"

This calls notes_create_link with the note ID, the code symbol ID, and a relation kind like "references". Now when someone searches for the AuthService class, the architecture note surfaces too. When someone reads the note, they can navigate to the code.
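The two-way navigation described above can be pictured with a small in-memory index. This is only a sketch: the interface, the class, and the "note:" and "code:" ID prefixes are hypothetical and do not reflect how Graph Memory stores links internally.

```typescript
// Hypothetical in-memory model of a cross-graph link; field names are illustrative.
interface CrossGraphLink {
  fromId: string; // e.g. a note ID
  toId: string;   // e.g. a code symbol ID
  kind: string;   // relation kind such as "references"
}

class LinkIndex {
  private outgoing = new Map<string, CrossGraphLink[]>();
  private incoming = new Map<string, CrossGraphLink[]>();

  // Index the link under both endpoints so it can be followed either way.
  add(link: CrossGraphLink): void {
    this.push(this.outgoing, link.fromId, link);
    this.push(this.incoming, link.toId, link);
  }

  // IDs reachable from `id` by following a link in either direction.
  neighbors(id: string): string[] {
    const out = (this.outgoing.get(id) ?? []).map((l) => l.toId);
    const inc = (this.incoming.get(id) ?? []).map((l) => l.fromId);
    return out.concat(inc);
  }

  private push(map: Map<string, CrossGraphLink[]>, key: string, link: CrossGraphLink): void {
    const list = map.get(key) ?? [];
    list.push(link);
    map.set(key, list);
  }
}

const links = new LinkIndex();
links.add({ fromId: "note:auth-architecture", toId: "code:AuthService", kind: "references" });

console.log(links.neighbors("code:AuthService"));       // [ 'note:auth-architecture' ]
console.log(links.neighbors("note:auth-architecture")); // [ 'code:AuthService' ]
```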

You can create links across all six graphs: notes to code symbols, tasks to doc sections, skills to files they modify.

What you have now

After 5 minutes:

Docs Graph — your markdown files parsed into heading-based chunks with cross-file links
Code Graph — AST-parsed functions, classes, imports, and their call relationships
File Index — every file in your project with metadata and directory hierarchy
Knowledge Graph — your notes, searchable and linked to code
Task Graph — ready for kanban workflow with priorities and assignees
Skill Graph — ready to store reusable procedures and recipes

All searchable with hybrid BM25 + vector search. All interconnected through typed edges. All accessible through 70 MCP tools.
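To give a feel for what "heading-based chunks" means for the Docs Graph, here is a simplified sketch that splits markdown at heading lines. It is an illustration under stated assumptions, not the parser Graph Memory uses: the function and type names are made up, and it ignores edge cases such as "#" lines inside fenced code blocks.

```typescript
interface DocChunk {
  heading: string;
  level: number; // 1 for "#", 2 for "##", ...
  body: string;
}

// Simplified: treats every line starting with 1-6 "#" characters as a heading,
// so "#" lines inside fenced code blocks would be misread.
function chunkByHeadings(markdown: string): DocChunk[] {
  const chunks: DocChunk[] = [];
  let current: DocChunk | undefined;
  for (const line of markdown.split("\n")) {
    const match = /^(#{1,6})\s+(.+)$/.exec(line);
    if (match) {
      const [, hashes = "", title = ""] = match;
      current = { heading: title.trim(), level: hashes.length, body: "" };
      chunks.push(current);
    } else if (current) {
      // Text before the first heading is dropped in this sketch.
      current.body += line + "\n";
    }
  }
  for (const chunk of chunks) {
    chunk.body = chunk.body.trim();
  }
  return chunks;
}

const sample = "# Auth\nIntro text\n\n## Tokens\nJWT details\n";
console.log(chunkByHeadings(sample));
```

Each chunk keeps its heading level, which is the information a graph needs to rebuild the document's hierarchy.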

Going further

Create a graph-memory.yaml to customize your setup — configure multiple projects, set up workspaces with shared knowledge, enable Redis caching, or add user authentication:

graphmemory serve --config graph-memory.yaml

For the full configuration reference, see the Configuration docs.

Graph Memory vs RAG: Structured Graphs vs Text Chunks

2026-03-22T00:00:00.000Z

If you're building AI-powered developer tools, you've probably considered RAG (Retrieval-Augmented Generation). Graph Memory takes a different approach. Here's how they compare and when to use each.

How RAG works

Traditional RAG splits your codebase into text chunks, embeds them into vectors, and retrieves the most similar chunks for a given query. It's simple, well-understood, and works reasonably well for many use cases.
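The retrieval step of that pipeline can be sketched in a few lines. The example below assumes the chunks are already embedded; the three-dimensional vectors and chunk texts are toy values chosen for illustration, where a real system would get its vectors from an embedding model.

```typescript
interface Chunk {
  text: string;
  vector: number[];
}

// Cosine similarity between two vectors; returns 0 if either has zero length.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
  });
  for (const y of b) {
    normB += y * y;
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Return the k chunks most similar to the query vector.
function topK(query: number[], chunks: Chunk[], k: number): Chunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosine(query, chunk.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((entry) => entry.chunk);
}

// Toy data: hand-written vectors stand in for real embeddings.
const chunks: Chunk[] = [
  { text: "function verifyToken", vector: [0.9, 0.1, 0.0] },
  { text: "README installation steps", vector: [0.0, 0.2, 0.9] },
  { text: "auth middleware comment", vector: [0.7, 0.6, 0.1] },
];

const hits = topK([1, 0, 0], chunks, 2);
console.log(hits.map((c) => c.text)); // [ 'function verifyToken', 'auth middleware comment' ]
```

Note that the function definition and the comment are both just `text` plus a vector here; nothing in the data distinguishes them.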

But it has limitations:

No structure — a function definition is just text, no different from a comment
No relationships — RAG doesn't know that AuthService calls TokenManager
No cross-references — the doc explaining auth and the code implementing it are unrelated chunks
No persistence — you can't store decisions, track tasks, or build team knowledge

How Graph Memory works

Graph Memory builds six typed graphs from your project:

Graph	What it understands
Docs	Heading hierarchy, cross-file links, code blocks
Code	AST symbols, imports, call relationships
Knowledge	Notes, typed relations, cross-graph links
Tasks	Kanban status, priorities, assignees
Skills	Steps, triggers, usage frequency
Files	Directory structure, languages, metadata

Every entity is embedded for vector search, but it's also connected to related entities through typed edges. When you search for "authentication", you don't just get text chunks — you get the auth module's functions, the docs explaining the auth flow, notes about auth decisions, and tasks related to auth work.

Key differences

Aspect	RAG	Graph Memory
Data model	Flat text chunks	Typed nodes + edges in typed graphs
Code understanding	Text similarity	AST-parsed symbols + import graph
Relationships	None	Typed edges (calls, imports, blocks, relates_to)
Search	Vector similarity	Hybrid BM25 + vector + graph expansion
Persistence	Read-only index	Read-write (notes, tasks, skills)
Cross-domain	Separate indices	Cross-graph links (code ↔ docs ↔ notes)

When to use what

Use RAG when:

You need a quick, simple solution for text retrieval
Your content is mostly unstructured prose (blog posts, wikis)
You don't need to track relationships between entities

Use Graph Memory when:

You're working with codebases (structure matters)
You want AI to understand relationships (what calls what, what documents what)
You need persistent team memory (notes, decisions, procedures)
You want task tracking integrated with code context
You want your AI to build and maintain knowledge over time

The best of both worlds

Graph Memory actually includes vector search — every node is embedded and searchable via cosine similarity. But it adds BM25 keyword search and BFS graph expansion on top. You get the recall of RAG plus the precision of structured graphs.
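BFS graph expansion, in general terms, means starting from the search hits and pulling in nodes that are a few edges away. The sketch below shows that generic idea on a toy adjacency map; the node IDs, the function name, and the depth limit are assumptions, and Graph Memory's own expansion logic may differ.

```typescript
type Adjacency = Map<string, string[]>;

// Breadth-first expansion: start from the search hits and collect every node
// within `maxDepth` hops, recording the hop count for each.
function expand(graph: Adjacency, seeds: string[], maxDepth: number): Map<string, number> {
  const depth = new Map<string, number>();
  const queue: string[] = [];
  for (const seed of seeds) {
    if (!depth.has(seed)) {
      depth.set(seed, 0);
      queue.push(seed);
    }
  }
  while (true) {
    const node = queue.shift();
    if (node === undefined) break;
    const d = depth.get(node) ?? 0;
    if (d >= maxDepth) continue; // do not expand past the depth limit
    for (const next of graph.get(node) ?? []) {
      if (!depth.has(next)) {
        depth.set(next, d + 1);
        queue.push(next);
      }
    }
  }
  return depth;
}

// Toy graph with hypothetical node IDs spanning code, docs, and notes.
const graph: Adjacency = new Map([
  ["code:AuthService", ["code:TokenManager", "docs:auth-flow"]],
  ["code:TokenManager", ["code:CryptoUtil"]],
  ["docs:auth-flow", ["note:auth-architecture"]],
]);

const expanded = expand(graph, ["code:AuthService"], 1);
console.log(Array.from(expanded.keys()));
// [ 'code:AuthService', 'code:TokenManager', 'docs:auth-flow' ]
```

With a depth limit of 1, the hit brings along its direct neighbors but not nodes two hops away, which keeps the added context focused.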

The result: when your AI assistant asks "how does authentication work?", it gets:

The AuthService class and its methods (from Code Graph)
The authentication docs with step-by-step flow (from Docs Graph)
Team decisions about auth architecture (from Knowledge Graph)
Open tasks related to auth improvements (from Task Graph)
The "setup auth for new service" skill (from Skill Graph)

That's something flat RAG simply can't do.

Ready to try it? Get started in under 5 minutes →

Introducing Graph Memory: Semantic Code Memory for AI Assistants

2026-03-21T00:00:00.000Z

We're excited to announce Graph Memory — an MCP server that turns any project directory into a queryable semantic knowledge base for AI assistants.

The problem

AI coding assistants are powerful, but they lose context between conversations. They can't remember decisions your team made, don't know about your project's architecture patterns, and can't track tasks across sessions.

RAG (Retrieval-Augmented Generation) helps, but it treats your codebase as a bag of text chunks. It doesn't understand structure — that this function calls that one, that this doc explains that module, that this task blocks that feature.

The solution: structured graphs

Graph Memory builds six interconnected graphs from your project:

Docs Graph — markdown parsed into heading-based chunks with cross-file links
Code Graph — tree-sitter AST parsing extracts functions, classes, imports, and their relationships
Knowledge Graph — persistent notes and facts with typed relations
Task Graph — kanban workflow with priorities, assignees, and cross-graph context
Skill Graph — reusable recipes and procedures with triggers and usage tracking
File Index — every project file with metadata and directory hierarchy

These graphs are interconnected. A note can link to a code symbol. A task can reference a doc section. A skill can point to the files it modifies. Your AI assistant navigates these connections through 70 MCP tools.

Getting started

npm install -g @graphmemory/server
cd your-project
graphmemory serve

Connect your AI assistant:

# Claude Code
claude mcp add --transport http --scope project graph-memory http://localhost:3000/mcp/your-project

That's it. Your AI assistant now has a deep understanding of your codebase.

What's in v1.3

MCP Authentication — secure MCP sessions with API keys
Readonly Mode — protect graphs from mutations while keeping them searchable
AI Prompt Builder — generate optimized system prompts with 14 scenarios, 8 roles, and 6 interaction styles
Connect Dialog — one-click MCP client setup from the Web UI
Hybrid Search — BM25 keyword + vector cosine similarity with graph expansion
