Documentation
Rewind Memory
Persistent, bio-inspired memory for AI agents. Local-first and production-ready, with a 5-layer architecture on the Free tier and the full 7-layer stack on Pro. Memory type taxonomy, drift detection, recency weighting, and query-intent matching are included in all tiers. Ships as a Claude Code plugin or an OpenClaw integration.
Quick Start
Install Rewind via pip:

```shell
pip install rewind-memory
```

Then run the doctor to auto-diagnose your setup and build the L0 index, backfill your conversation history, and start the real-time watcher:

```shell
rewind doctor        # auto-diagnose, build L0 index, fix config
rewind ingest-chats  # backfill historical OpenClaw conversations
rewind watch         # real-time conversation indexing
```

Pro users get L5 semantic search automatically when Qdrant is available:

```shell
pip install rewind-memory-pro
rewind watch --qdrant-url http://localhost:6333 --embed-url http://localhost:8041/v1/embeddings
```

That is all you need to get started.
Architecture
Rewind is structured as seven independent memory layers (five on the Free tier), each modelled on a distinct region of the biological memory system. A central orchestrator (L2) handles fusion, ranking, and entity extraction across all layers.
Pro extends this local stack with cloud embeddings, graph extraction, and cross-encoder reranking.
| Layer | Name | Backend |
|---|---|---|
| L0 | Sensory Buffer | SQLite FTS5 + BM25 |
| L1 | Short-Term Memory | sqlite-vec |
| L2 | Orchestrator | In-process |
| L3 | Graph Memory | SQLite / Neo4j |
| L4 | Workspace | sqlite-vec |
| L5 | Communications | Qdrant |
| L6 | Documents | Qdrant + FTS5 |
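To make the orchestrator's fusion step concrete, here is a minimal illustrative sketch using reciprocal rank fusion, a common way to merge ranked lists whose scores are not directly comparable (e.g. L0's BM25 ranks and L1's vector ranks). This is an assumption for illustration, not Rewind's actual L2 algorithm.

```python
# Reciprocal-rank-fusion sketch (illustrative only; the real L2 fusion
# logic may differ). Each layer returns a ranked list of document IDs;
# RRF merges them without needing comparable raw scores.
from collections import defaultdict

def fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge per-layer rankings with reciprocal rank fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] += 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# "doc-a" is ranked first by both a keyword layer and a vector layer,
# so it comes out on top of the fused list.
fused = fuse([["doc-a", "doc-b", "doc-c"], ["doc-a", "doc-c"]])
print(fused[0])  # doc-a
```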
Tiers
| Feature | Free | Pro ($18/mo; $9/mo for the first 1,000) | Enterprise |
|---|---|---|---|
| Real-time conversation watcher | ✓ (L0 keyword) | ✓ (L0 + L5 semantic) | ✓ |
| Historical chat backfill | ✓ | ✓ | ✓ |
| Auto-diagnosis and repair | ✓ | ✓ | ✓ |
| Multi-channel awareness | — | ✓ (Telegram, WhatsApp, Slack, iMessage) | ✓ |
| Memory type taxonomy | ✓ (user/feedback/project/reference) | ✓ | ✓ |
| Recency weighting | ✓ (type-aware decay) | ✓ | ✓ |
| Query-intent matching | ✓ | ✓ | ✓ |
| Memory drift detection | ✓ | ✓ | ✓ |
| OpenClaw gateway autopatcher | ✓ | ✓ | ✓ |
| LLM relevance selection | — | ✓ (side-query) | ✓ |
| Cross-encoder reranking | — | ✓ (GPU) | ✓ |
| Memory extraction (post-turn) | — | ✓ (auto) | ✓ |
| Partial compaction | — | ✓ | ✓ |
| Embedding model | all-MiniLM-L6-v2 (768-dim, local) | NV-Embed-v2 (4096-dim, Modal cloud) | Custom |
| KG extraction | Heuristic (regex) or Ollama local | Graph-PReFLexOR on Modal T4 | Custom LLM |
| Batch extraction | — | Yes | Yes |
| Storage | Local SQLite | Local SQLite + Qdrant + Neo4j | Managed |
| API server | Self-hosted | Self-hosted + cloud relay | Managed |
| Support | Community | | SLA |
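The "recency weighting" row above refers to type-aware decay: different memory types lose relevance at different rates. A minimal sketch of the idea using exponential half-life decay; the half-life values here are entirely hypothetical, not Rewind's shipped constants.

```python
# Type-aware recency decay sketch. Half-lives (in days) are illustrative
# assumptions: reference material decays slowly, transient feedback fast.
HALF_LIFE_DAYS = {"user": 365, "feedback": 7, "project": 90, "reference": 730}

def recency_weight(mem_type: str, age_days: float) -> float:
    """Exponential decay: weight halves once per half-life for that type."""
    half_life = HALF_LIFE_DAYS.get(mem_type, 30)
    return 0.5 ** (age_days / half_life)

# A week-old feedback memory carries half the weight of a fresh one,
# while a week-old reference memory is barely discounted.
print(round(recency_weight("feedback", 7), 3))   # 0.5
print(round(recency_weight("reference", 7), 3))  # 0.993
```

The decayed weight would typically be multiplied into the retrieval score before fusion, so stale feedback drops out of results while long-lived reference material stays retrievable.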
Claude Code Plugin Setup
Install
```shell
pip install rewind-memory
git clone https://github.com/saraidefence/rewind-memory.git ~/.claude-plugins/rewind-memory
```

Activate

```shell
claude --plugin-dir ~/.claude-plugins/rewind-memory/plugin
```

Initialise

```shell
/rewind-setup
```

Available Commands
| Command | Description |
|---|---|
| rewind doctor | Auto-diagnose and fix common issues, build L0 index |
| rewind watch | Real-time session watcher with L0/L5 indexing |
| rewind ingest-chats | One-time historical conversation backfill |
| rewind watch-sessions | Real-time conversation capture from OpenClaw sessions |
| rewind serve | API server with background file watcher |
| rewind search <query> | Search all memory layers |
| rewind ingest <path> | Ingest files or directories into memory |
| rewind remember <text> | Store a manual note in memory |
| rewind health | Health check across all layers |
| rewind proxy | Memory-augmented LLM proxy server |
| rewind bench | Run LoCoMo benchmark |
| rewind migrate | Migrate backends (Pro) |
Pro Setup
Subscribe
Visit saraidefence.com/dashboard or use the CLI to open a Stripe Checkout page:
```shell
pip install git+https://github.com/saraidefence/rewind-memory-pro.git
```

Get Your API Key

After payment completes, the confirmation page displays your key. Copy it immediately — for security it is not stored in plaintext after this page.

```
rwnd_live_<32 hex chars>
```

Configure
Add the key to ~/.rewind/config.yaml:
```yaml
tier: pro
modal:
  auth_token: rwnd_live_<your-key>
embedding:
  provider: modal
  model: nvidia/NV-Embed-v2
  dim: 4096
kg:
  provider: modal
  model: graph-preflexor
```

Or use the CLI:

```shell
rewind config set tier pro
rewind config set modal.auth_token rwnd_live_<your-key>
```

Re-embed (if upgrading from Free)

If you have existing data, re-embed your chunks through NV-Embed-v2 to get 4096-dim vectors:

```shell
rewind migrate --reindex
```

Configuration Reference
Full path: ~/.rewind/config.yaml
```yaml
# Tier: free | pro | enterprise
tier: free

# Data storage root
data_dir: ~/.rewind/data

embedding:
  provider: local            # local | modal
  model: all-MiniLM-L6-v2    # or nvidia/NV-Embed-v2 for Pro
  dim: 768                   # 768 (free) | 4096 (pro)

kg:
  provider: heuristic        # heuristic | ollama | modal
  model: null                # e.g. saraidefence/graph-preflexor:latest

modal:
  auth_token: null           # rwnd_live_<key> — Pro/Enterprise only

# Optional: Neo4j backend for L3 (enterprise)
neo4j:
  uri: bolt://localhost:7687
  user: neo4j
  password: null
```

Config Files by Tier
| File | Purpose |
|---|---|
| configs/free.yaml | Default free tier |
| configs/pro.yaml | Pro cloud settings |
| configs/enterprise.yaml | Enterprise / self-managed |
CLI Reference
```
rewind serve                    API server + file watcher
rewind init                     Initialise data directory
rewind health                   Check layer status
rewind doctor                   Auto-diagnose and fix issues
rewind ingest <path>            Index files into memory
rewind ingest-chats             Backfill historical conversations
rewind watch                    Watch workspace for file changes
rewind watch-sessions           Real-time conversation capture
rewind search <query>           Search across all layers
rewind recall <query>           Alias for search
rewind remember <text>          Store a manual note
rewind bench                    Run LoCoMo benchmark
rewind config get <key>         Read a config value
rewind config set <key> <val>   Write a config value
rewind migrate --reindex        Re-embed chunks (768 to 4096 for Pro)
rewind export                   Export memory to JSON
```

Real-Time Conversation Capture
Capture conversations as they happen, with no manual backfill needed. watch-sessions uses watchdog to monitor OpenClaw session JSONL files and indexes new turns immediately.
```shell
# Watch all OpenClaw session files, index new turns into L0 + L3 + L5
rewind watch-sessions

# Custom session directory
rewind watch-sessions --session-dir /path/to/sessions

# With specific backends
rewind watch-sessions --qdrant-url http://localhost:6333 --embed-url http://localhost:8041/v1/embeddings
```

Closed-Loop Memory
The pre-turn gateway hook reads memory before each LLM turn. watch-sessions writes new conversations into memory after each turn. Together they form a closed loop: the agent remembers what it just discussed.
New turns are indexed into L0 (BM25 keyword search), L3 (knowledge graph with entity extraction and co-occurrence edges), and L5 (Qdrant semantic vectors, if available).
Requires: pip install 'watchdog>=3.0'
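For illustration, the core of what a session watcher must do, stripped of the watchdog wiring, is to read newly appended JSONL lines from a session file by tracking a byte offset, skipping any partial line still being written. The field names below ("role", "content") are assumptions for the example, not the actual OpenClaw session schema.

```python
import json
from pathlib import Path

def read_new_turns(path: Path, offset: int) -> tuple[list[dict], int]:
    """Return turns appended since `offset`, plus the new offset.

    Each line is assumed to be one JSON object (hypothetical schema).
    A trailing line without a newline is treated as still in flight
    and left for the next call.
    """
    turns: list[dict] = []
    with path.open("rb") as f:
        f.seek(offset)
        while True:
            line = f.readline()
            if not line or not line.endswith(b"\n"):
                break  # EOF, or a partial line still being written
            turns.append(json.loads(line))
            offset = f.tell()
    return turns, offset

# A watchdog on_modified handler would call this with the last saved
# offset, then index each returned turn into L0/L3/L5.
```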
OpenClaw Integration
Route OpenClaw's memory_search through Rewind's full stack with a single config change. Two integration methods are available.
Native Hook (recommended)
Creates a native OpenClaw hook that survives npm updates. No re-apply needed.
```shell
# Create the pre-turn memory hook
rewind-openclaw hook

# Verify installation
rewind-openclaw hook --verify

# Remove
rewind-openclaw hook --remove
```

Gateway Patch (legacy)
Patches the OpenClaw gateway directly. Works but needs re-applying after every npm update.
```shell
rewind-openclaw patch
rewind-openclaw patch --verify
rewind-openclaw patch --restore
```

Config Setup
```shell
# Route memory_search through Rewind
rewind-openclaw setup
```

Both methods fire on every inbound message, query Rewind's HybridRAG proxy, and prepend the top results directly to the message. The agent sees relevant memory before it starts thinking.
Memory Proxy
The memory proxy auto-injects relevant context into every LLM call. No MCP needed — just change your API URL. Works with any OpenAI-compatible tool.
```shell
# Ingest your project first
rewind ingest ./my-project/

# Start the memory proxy
rewind proxy --port 8080

# Point your tool at it
OPENAI_BASE_URL=http://localhost:8080/v1 cursor .
```

Supports OpenAI, Anthropic, NVIDIA, local models, and any OpenAI-compatible API. Use --upstream to change the target provider.
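Conceptually, the proxy's injection step retrieves memory for the latest user message and prepends it as a system message before forwarding the request upstream. A hypothetical sketch of that step (not the actual proxy code; `search_memory` is a stand-in for a real Rewind query):

```python
def inject_context(messages: list[dict], search_memory) -> list[dict]:
    """Prepend retrieved memory to an OpenAI-style message list.

    `search_memory` is a stand-in callable returning relevant snippets
    for a query string; the real proxy queries Rewind's layers instead.
    """
    # Use the most recent user message as the retrieval query.
    query = next(
        (m["content"] for m in reversed(messages) if m["role"] == "user"), ""
    )
    snippets = search_memory(query)
    if not snippets:
        return messages  # nothing relevant: pass the request through
    memory_block = "Relevant memory:\n" + "\n".join(f"- {s}" for s in snippets)
    return [{"role": "system", "content": memory_block}] + messages

# Example with a fake retriever standing in for Rewind:
fake = lambda q: ["User prefers tabs over spaces"] if "format" in q else []
out = inject_context([{"role": "user", "content": "format this file"}], fake)
print(out[0]["role"])  # system
```

Because the injection happens at the HTTP layer, any client that speaks the OpenAI API picks it up without code changes.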
MCP Tools
Rewind ships an MCP server exposing six memory tools. Works with Claude Code, Cursor, Windsurf, and any MCP-compatible client.
Setup
Add to your MCP client config (e.g. ~/.claude/settings.json):
```json
{
  "mcpServers": {
    "rewind": {
      "command": "rewind-mcp"
    }
  }
}
```

Available Tools
| Tool | Description |
|---|---|
| memory_search | Search across all memory layers with fused ranking |
| memory_store | Store content into the appropriate layer based on type |
| memory_extract | Extract structured memories from conversation text |
| memory_stats | Get layer health and statistics |
| memory_feedback | Submit retrieval feedback for learning |
| graph_traverse | Traverse the knowledge graph with spreading activation |
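The "spreading activation" behind graph_traverse can be pictured as energy propagating outward from seed entities, decaying at each hop until it falls below a cutoff. A minimal illustrative sketch over an adjacency map; the decay factor and threshold are hypothetical parameters, not Rewind's actual defaults.

```python
from collections import defaultdict

def spread_activation(graph: dict[str, list[str]], seeds: dict[str, float],
                      decay: float = 0.5, threshold: float = 0.1) -> dict[str, float]:
    """Propagate activation from seed nodes until it drops below threshold."""
    activation = defaultdict(float, seeds)
    frontier = dict(seeds)
    while frontier:
        next_frontier: dict[str, float] = {}
        for node, energy in frontier.items():
            passed = energy * decay
            if passed < threshold:
                continue  # too weak to spread further
            for neighbour in graph.get(node, []):
                # Only re-visit a node if we bring it more energy;
                # this also terminates cycles, since energy only decays.
                if passed > activation[neighbour]:
                    activation[neighbour] = passed
                    next_frontier[neighbour] = passed
        frontier = next_frontier
    return dict(activation)

graph = {"rewind": ["qdrant", "sqlite"], "qdrant": ["docker"]}
result = spread_activation(graph, {"rewind": 1.0})
print(result["docker"])  # 0.25
```

Entities two hops from the seed still surface, but with a quarter of the activation, which is what lets graph traversal pull in related-but-unmentioned context without flooding results.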
Self-Hosted / Docker
```shell
git clone https://github.com/saraidefence/rewind-memory.git
cd rewind-memory
docker compose -f docker/docker-compose.yml up -d
```

The API server starts on http://localhost:8080.
Environment Variables
```shell
STRIPE_SECRET_KEY=sk_live_...
STRIPE_WEBHOOK_SECRET=whsec_...
STRIPE_PRO_PRICE_ID=price_...
REWIND_BASE_URL=https://your-domain.com
REWIND_DATA_DIR=/data
```

Stripe Webhook

Register the following endpoint in your Stripe dashboard:

```
POST https://your-domain.com/stripe/webhook
```

Enable these events:

- checkout.session.completed
- customer.subscription.deleted
- invoice.payment_succeeded
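A webhook handler must verify the Stripe-Signature header before trusting the payload. In practice you would use the official stripe library's stripe.Webhook.construct_event; the underlying check, an HMAC-SHA256 over "{timestamp}.{payload}" keyed with your whsec_... secret, is sketched here for illustration only.

```python
import hashlib
import hmac

def verify_stripe_signature(payload: bytes, sig_header: str, secret: str) -> bool:
    """Check a Stripe-Signature header of the form "t=<ts>,v1=<hex>"."""
    parts = dict(item.split("=", 1) for item in sig_header.split(","))
    # Stripe signs the literal string "<timestamp>.<raw request body>".
    signed = f"{parts['t']}.".encode() + payload
    expected = hmac.new(secret.encode(), signed, hashlib.sha256).hexdigest()
    # Constant-time comparison to avoid timing attacks.
    return hmac.compare_digest(expected, parts["v1"])
```

In production, prefer stripe.Webhook.construct_event, which additionally enforces a timestamp tolerance to block replayed events.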
API Endpoints
Cloud services run on Modal. All endpoints listed below are Pro / Enterprise only.