Documentation
Rewind Memory
Persistent, bio-inspired memory for AI agents. Local-first and production-ready, with a 5-layer architecture on the Free tier and the full 7-layer stack on Pro. Memory type taxonomy, drift detection, recency weighting, and query-intent matching are included in all tiers. Ships as a Claude Code plugin or an OpenClaw integration.
Quick Start
Install Rewind via pip:

```shell
pip install rewind-memory
```

Then run the doctor to auto-diagnose your setup and build the L0 index, backfill your conversation history, and start the real-time watcher:

```shell
rewind doctor        # auto-diagnose, build L0 index, fix config
rewind ingest-chats  # backfill historical OpenClaw conversations
rewind watch         # real-time conversation indexing
```

Pro users get L5 semantic search automatically when Qdrant is available:

```shell
pip install rewind-memory-pro
rewind watch --qdrant-url http://localhost:6333 --embed-url http://localhost:8041/v1/embeddings
```

That is all you need to get started.
Architecture
Rewind is structured as seven independent memory layers (five on the Free tier), each modelled on a distinct region of the biological memory system. A central orchestrator (L2) handles fusion, ranking, and entity extraction across all layers.
Pro extends this local stack with cloud embeddings, graph extraction, and cross-encoder reranking.
| Layer | Name | Backend |
|---|---|---|
| L0 | Sensory Buffer | SQLite FTS5 + BM25 |
| L1 | Short-Term Memory | sqlite-vec |
| L2 | Orchestrator | In-process |
| L3 | Graph Memory | SQLite / Neo4j |
| L4 | Workspace | sqlite-vec |
| L5 | Communications | Qdrant |
| L6 | Documents | Qdrant + FTS5 |
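To make the orchestrator's fusion step concrete, here is a minimal illustrative sketch using reciprocal rank fusion, a common way to merge ranked lists whose scores are not directly comparable (e.g. L0's BM25 ranks and L1's vector ranks). This is an assumption for illustration, not Rewind's actual L2 algorithm.

```python
# Reciprocal-rank-fusion sketch (illustrative only; the real L2 fusion
# logic may differ). Each layer returns a ranked list of document IDs;
# RRF merges them without needing comparable raw scores.
from collections import defaultdict

def fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge per-layer rankings with reciprocal rank fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] += 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# "doc-a" is ranked first by both a keyword layer and a vector layer,
# so it comes out on top of the fused list.
fused = fuse([["doc-a", "doc-b", "doc-c"], ["doc-a", "doc-c"]])
print(fused[0])  # doc-a
```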
Tiers
| Feature | Free | Pro ($18/mo; $9/mo for the first 1,000) | Enterprise |
|---|---|---|---|
| Real-time conversation watcher | ✓ (L0 keyword) | ✓ (L0 + L5 semantic) | ✓ |
| Historical chat backfill | ✓ | ✓ | ✓ |
| Auto-diagnosis and repair | ✓ | ✓ | ✓ |
| Multi-channel awareness | — | ✓ (Telegram, WhatsApp, Slack, iMessage) | ✓ |
| Memory type taxonomy | ✓ (user/feedback/project/reference) | ✓ | ✓ |
| Recency weighting | ✓ (type-aware decay) | ✓ | ✓ |
| Query-intent matching | ✓ | ✓ | ✓ |
| Memory drift detection | ✓ | ✓ | ✓ |
| OpenClaw gateway autopatcher | ✓ | ✓ | ✓ |
| LLM relevance selection | — | ✓ (side-query) | ✓ |
| Cross-encoder reranking | — | ✓ (GPU) | ✓ |
| Memory extraction (post-turn) | — | ✓ (auto) | ✓ |
| Partial compaction | — | ✓ | ✓ |
| Embedding model | all-MiniLM-L6-v2 (768-dim, local) | NV-Embed-v2 (4096-dim, Modal cloud) | Custom |
| KG extraction | Heuristic (regex) or Ollama local | Graph-PReFLexOR on Modal T4 | Custom LLM |
| Batch extraction | — | Yes | Yes |
| Storage | Local SQLite | Local SQLite + Qdrant + Neo4j | Managed |
| API server | Self-hosted | Self-hosted + cloud relay | Managed |
| Support | Community | | SLA |
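The "recency weighting" row above refers to type-aware decay: different memory types lose relevance at different rates. A minimal sketch of the idea using exponential half-life decay; the half-life values here are entirely hypothetical, not Rewind's shipped constants.

```python
# Type-aware recency decay sketch. Half-lives (in days) are illustrative
# assumptions: reference material decays slowly, transient feedback fast.
HALF_LIFE_DAYS = {"user": 365, "feedback": 7, "project": 90, "reference": 730}

def recency_weight(mem_type: str, age_days: float) -> float:
    """Exponential decay: weight halves once per half-life for that type."""
    half_life = HALF_LIFE_DAYS.get(mem_type, 30)
    return 0.5 ** (age_days / half_life)

# A week-old feedback memory carries half the weight of a fresh one,
# while a week-old reference memory is barely discounted.
print(round(recency_weight("feedback", 7), 3))   # 0.5
print(round(recency_weight("reference", 7), 3))  # 0.993
```

The decayed weight would typically be multiplied into the retrieval score before fusion, so stale feedback drops out of results while long-lived reference material stays retrievable.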
Claude Code Plugin Setup
Install
```shell
pip install rewind-memory
git clone https://github.com/saraidefence/rewind-memory.git ~/.claude-plugins/rewind-memory
```

Activate

```shell
claude --plugin-dir ~/.claude-plugins/rewind-memory/plugin
```

Initialise

```shell
/rewind-setup
```

Available Commands
| Command | Description |
|---|---|
| rewind doctor | Auto-diagnose and fix common issues, build L0 index |
| rewind watch | Real-time session watcher with L0/L5 indexing |
| rewind ingest-chats | One-time historical conversation backfill |
| rewind watch-sessions | Real-time conversation capture from OpenClaw sessions |
| rewind serve | API server with background file watcher |
| rewind search <query> | Search all memory layers |
| rewind ingest <path> | Ingest files or directories into memory |
| rewind remember <text> | Store a manual note in memory |
| rewind health | Health check across all layers |
| rewind proxy | Memory-augmented LLM proxy server |
| rewind bench | Run LoCoMo benchmark |
| rewind migrate | Migrate backends (Pro) |
Pro Setup
Subscribe
Visit saraidefence.com/dashboard or use the CLI to open a Stripe Checkout page:
```shell
pip install git+https://github.com/saraidefence/rewind-memory-pro.git
```

Get Your API Key

After payment completes, the confirmation page displays your key. Copy it immediately — for security it is not stored in plaintext after this page.

```
rwnd_live_<32 hex chars>
```

Configure
Add the key to ~/.rewind/config.yaml:
```yaml
tier: pro
modal:
  auth_token: rwnd_live_<your-key>
embedding:
  provider: modal
  model: nvidia/NV-Embed-v2
  dim: 4096
kg:
  provider: modal
  model: graph-preflexor
```

Or use the CLI:

```shell
rewind config set tier pro
rewind config set modal.auth_token rwnd_live_<your-key>
```

Re-embed (if upgrading from Free)

If you have existing data, re-embed your chunks through NV-Embed-v2 to get 4096-dim vectors:

```shell
rewind migrate --reindex
```

Configuration Reference
Full path: ~/.rewind/config.yaml
```yaml
# Tier: free | pro | enterprise
tier: free

# Data storage root
data_dir: ~/.rewind/data

embedding:
  provider: local            # local | modal
  model: all-MiniLM-L6-v2    # or nvidia/NV-Embed-v2 for Pro
  dim: 768                   # 768 (free) | 4096 (pro)

kg:
  provider: heuristic        # heuristic | ollama | modal
  model: null                # e.g. saraidefence/graph-preflexor:latest

modal:
  auth_token: null           # rwnd_live_<key> — Pro/Enterprise only

# Optional: Neo4j backend for L3 (enterprise)
neo4j:
  uri: bolt://localhost:7687
  user: neo4j
  password: null
```

Config Files by Tier
| File | Purpose |
|---|---|
| configs/free.yaml | Default free tier |
| configs/pro.yaml | Pro cloud settings |
| configs/enterprise.yaml | Enterprise / self-managed |
CLI Reference
```
rewind serve                    API server + file watcher
rewind init                     Initialise data directory
rewind health                   Check layer status
rewind doctor                   Auto-diagnose and fix issues
rewind ingest <path>            Index files into memory
rewind ingest-chats             Backfill historical conversations
rewind watch                    Watch workspace for file changes
rewind watch-sessions           Real-time conversation capture
rewind search <query>           Search across all layers
rewind recall <query>           Alias for search
rewind remember <text>          Store a manual note
rewind bench                    Run LoCoMo benchmark
rewind config get <key>         Read a config value
rewind config set <key> <val>   Write a config value
rewind migrate --reindex        Re-embed chunks (768 to 4096 for Pro)
rewind export                   Export memory to JSON
```

Real-Time Conversation Capture
Capture conversations as they happen, with no manual backfill needed. watch-sessions uses watchdog to monitor OpenClaw session JSONL files and indexes new turns immediately.
```shell
# Watch all OpenClaw session files, index new turns into L0 + L3 + L5
rewind watch-sessions

# Custom session directory
rewind watch-sessions --session-dir /path/to/sessions

# With specific backends
rewind watch-sessions --qdrant-url http://localhost:6333 --embed-url http://localhost:8041/v1/embeddings
```

Closed-Loop Memory
The pre-turn gateway hook reads memory before each LLM turn. watch-sessions writes new conversations into memory after each turn. Together they form a closed loop: the agent remembers what it just discussed.
New turns are indexed into L0 (BM25 keyword search), L3 (knowledge graph with entity extraction and co-occurrence edges), and L5 (Qdrant semantic vectors, if available).
Requires: pip install 'watchdog>=3.0'
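For illustration, the core of what a session watcher must do, stripped of the watchdog wiring, is to read newly appended JSONL lines from a session file by tracking a byte offset, skipping any partial line still being written. The field names below ("role", "content") are assumptions for the example, not the actual OpenClaw session schema.

```python
import json
from pathlib import Path

def read_new_turns(path: Path, offset: int) -> tuple[list[dict], int]:
    """Return turns appended since `offset`, plus the new offset.

    Each line is assumed to be one JSON object (hypothetical schema).
    A trailing line without a newline is treated as still in flight
    and left for the next call.
    """
    turns: list[dict] = []
    with path.open("rb") as f:
        f.seek(offset)
        while True:
            line = f.readline()
            if not line or not line.endswith(b"\n"):
                break  # EOF, or a partial line still being written
            turns.append(json.loads(line))
            offset = f.tell()
    return turns, offset

# A watchdog on_modified handler would call this with the last saved
# offset, then index each returned turn into L0/L3/L5.
```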
OpenClaw Integration
Route OpenClaw's memory_search through Rewind's full stack with a single config change. Two integration methods are available.
Native Hook (recommended)
Creates a native OpenClaw hook that survives npm updates. No re-apply needed.
```shell
# Create the pre-turn memory hook
rewind-openclaw hook

# Verify installation
rewind-openclaw hook --verify

# Remove
rewind-openclaw hook --remove
```

Gateway Patch (legacy)
Patches the OpenClaw gateway directly. Works but needs re-applying after every npm update.
```shell
rewind-openclaw patch
rewind-openclaw patch --verify
rewind-openclaw patch --restore
```

Config Setup
```shell
# Route memory_search through Rewind
rewind-openclaw setup
```

Both methods fire on every inbound message, query Rewind's HybridRAG proxy, and prepend the top results directly to the message. The agent sees relevant memory before it starts thinking.
Memory Proxy
The memory proxy auto-injects relevant context into every LLM call. No MCP needed — just change your API URL. Works with any OpenAI-compatible tool.
```shell
# Ingest your project first
rewind ingest ./my-project/

# Start the memory proxy
rewind proxy --port 8080

# Point your tool at it
OPENAI_BASE_URL=http://localhost:8080/v1 cursor .
```

Supports OpenAI, Anthropic, NVIDIA, local models, and any OpenAI-compatible API. Use --upstream to change the target provider.
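Conceptually, the proxy's injection step retrieves memory for the latest user message and prepends it as a system message before forwarding the request upstream. A hypothetical sketch of that step (not the actual proxy code; `search_memory` is a stand-in for a real Rewind query):

```python
def inject_context(messages: list[dict], search_memory) -> list[dict]:
    """Prepend retrieved memory to an OpenAI-style message list.

    `search_memory` is a stand-in callable returning relevant snippets
    for a query string; the real proxy queries Rewind's layers instead.
    """
    # Use the most recent user message as the retrieval query.
    query = next(
        (m["content"] for m in reversed(messages) if m["role"] == "user"), ""
    )
    snippets = search_memory(query)
    if not snippets:
        return messages  # nothing relevant: pass the request through
    memory_block = "Relevant memory:\n" + "\n".join(f"- {s}" for s in snippets)
    return [{"role": "system", "content": memory_block}] + messages

# Example with a fake retriever standing in for Rewind:
fake = lambda q: ["User prefers tabs over spaces"] if "format" in q else []
out = inject_context([{"role": "user", "content": "format this file"}], fake)
print(out[0]["role"])  # system
```

Because the injection happens at the HTTP layer, any client that speaks the OpenAI API picks it up without code changes.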
MCP Tools
Rewind ships an MCP server exposing six memory tools. Works with Claude Code, Cursor, Windsurf, and any MCP-compatible client.
Setup
Add to your MCP client config (e.g. ~/.claude/settings.json):
```json
{
  "mcpServers": {
    "rewind": {
      "command": "rewind-mcp"
    }
  }
}
```

Available Tools
| Tool | Description |
|---|---|
| memory_search | Search across all memory layers with fused ranking |
| memory_store | Store content into the appropriate layer based on type |
| memory_extract | Extract structured memories from conversation text |
| memory_stats | Get layer health and statistics |
| memory_feedback | Submit retrieval feedback for learning |
| graph_traverse | Traverse the knowledge graph with spreading activation |
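The "spreading activation" behind graph_traverse can be pictured as energy propagating outward from seed entities, decaying at each hop until it falls below a cutoff. A minimal illustrative sketch over an adjacency map; the decay factor and threshold are hypothetical parameters, not Rewind's actual defaults.

```python
from collections import defaultdict

def spread_activation(graph: dict[str, list[str]], seeds: dict[str, float],
                      decay: float = 0.5, threshold: float = 0.1) -> dict[str, float]:
    """Propagate activation from seed nodes until it drops below threshold."""
    activation = defaultdict(float, seeds)
    frontier = dict(seeds)
    while frontier:
        next_frontier: dict[str, float] = {}
        for node, energy in frontier.items():
            passed = energy * decay
            if passed < threshold:
                continue  # too weak to spread further
            for neighbour in graph.get(node, []):
                # Only re-visit a node if we bring it more energy;
                # this also terminates cycles, since energy only decays.
                if passed > activation[neighbour]:
                    activation[neighbour] = passed
                    next_frontier[neighbour] = passed
        frontier = next_frontier
    return dict(activation)

graph = {"rewind": ["qdrant", "sqlite"], "qdrant": ["docker"]}
result = spread_activation(graph, {"rewind": 1.0})
print(result["docker"])  # 0.25
```

Entities two hops from the seed still surface, but with a quarter of the activation, which is what lets graph traversal pull in related-but-unmentioned context without flooding results.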
Self-Hosted / Docker
```shell
git clone https://github.com/saraidefence/rewind-memory.git
cd rewind-memory
docker compose -f docker/docker-compose.yml up -d
```

The API server starts on http://localhost:8080.
Environment Variables
```shell
STRIPE_SECRET_KEY=sk_live_...
STRIPE_WEBHOOK_SECRET=whsec_...
STRIPE_PRO_PRICE_ID=price_...
REWIND_BASE_URL=https://your-domain.com
REWIND_DATA_DIR=/data
```

Stripe Webhook

Register the following endpoint in your Stripe dashboard:

```
POST https://your-domain.com/stripe/webhook
```

Enable these events:

- checkout.session.completed
- customer.subscription.deleted
- invoice.payment_succeeded
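A webhook handler must verify the Stripe-Signature header before trusting the payload. In practice you would use the official stripe library's stripe.Webhook.construct_event; the underlying check, an HMAC-SHA256 over "{timestamp}.{payload}" keyed with your whsec_... secret, is sketched here for illustration only.

```python
import hashlib
import hmac

def verify_stripe_signature(payload: bytes, sig_header: str, secret: str) -> bool:
    """Check a Stripe-Signature header of the form "t=<ts>,v1=<hex>"."""
    parts = dict(item.split("=", 1) for item in sig_header.split(","))
    # Stripe signs the literal string "<timestamp>.<raw request body>".
    signed = f"{parts['t']}.".encode() + payload
    expected = hmac.new(secret.encode(), signed, hashlib.sha256).hexdigest()
    # Constant-time comparison to avoid timing attacks.
    return hmac.compare_digest(expected, parts["v1"])
```

In production, prefer stripe.Webhook.construct_event, which additionally enforces a timestamp tolerance to block replayed events.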
API Endpoints
Cloud services run on Modal. All endpoints listed below are Pro / Enterprise only.