Ukrainian Defence Technology

Autonomous systems for defence.
Intelligent memory for AI.

SARAI Defence builds autonomous defence systems and AI infrastructure. Our first product — Rewind — is the 7-layer memory system born from drone intercept research. Patent pending.

// system status
$ rewind doctor && rewind ingest-chats && rewind watch
retrieval: <100ms

The Origin

Born from a drone intercept system.
Now available as Rewind.

SARAI started as the memory backbone for an autonomous drone perception system — built to track, classify, and intercept hostile UAVs in under 400ms. The memory architecture that emerged — seven specialised layers fusing graph traversal, vector similarity, and document retrieval — scored 99.3% on the LoCoMo benchmark. We named it Rewind.

Fuse graph + vector search

<100ms

Combine structured knowledge graphs with semantic vector search in under 100ms.

Learn from corrections

Zero retrain

Retrieval feedback learning adapts to corrections without retraining the model.

Your data never leaves your machine

Local-first

Free and Pro tiers keep all data local. Only embedding vectors transit our API — not stored, not reversible. A key differentiator vs Mem0 and Supermemory.

Edge-native deployment

128GB unified

Runs on any hardware — from a Raspberry Pi to a DGX Spark. Tested on NVIDIA GB10, Mac, Linux, WSL. No cloud dependency.

Architecture

Seven layers. One retrieval call.

Each layer specialises in a different retrieval strategy. The orchestrator fuses results with configurable score weights, L1 override, and trajectory boosting.
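The fusion step can be pictured with a short sketch. Everything below — layer names, weights, the boost formula, and the override rule — is illustrative, not Rewind's actual implementation:

```python
# Minimal sketch of weighted score fusion with an L1 override and a
# trajectory boost. All names and numbers here are hypothetical.

def fuse(results, weights, trajectory_boost=None):
    """results: {layer: [(item, score), ...]} -> items ranked by fused score."""
    fused = {}
    for layer, hits in results.items():
        w = weights.get(layer, 1.0)
        for item, score in hits:
            fused[item] = max(fused.get(item, 0.0), w * score)
    # L1 (core files) override: pin core hits to the top regardless of score.
    for item, _ in results.get("L1", []):
        fused[item] = float("inf")
    # Trajectory boost: multiply recently/frequently accessed items.
    if trajectory_boost:
        for item, boost in trajectory_boost.items():
            if item in fused and fused[item] != float("inf"):
                fused[item] *= boost
    return sorted(fused, key=fused.get, reverse=True)

ranked = fuse(
    {"L3": [("doc-a", 0.9)], "L4": [("doc-b", 0.8)], "L1": [("core", 0.2)]},
    weights={"L3": 0.9, "L4": 0.7},
    trajectory_boost={"doc-b": 1.5},
)
# "core" ranks first via the L1 override; the boost lifts "doc-b" above "doc-a"
```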

// layer manifest
L0 · BM25/FTS5 (keyword search) · <1ms
L1 · Core Files · <1ms
L2 · HybridRAG Orchestrator · ~73ms
L3 · Knowledge Graph (Neo4j) · ~13ms
L4 · Vector Search (NV-Embed-v2, 4096-dim) · ~62ms
L5 · Communications (91K chunks) · <100ms
L6 · Document Store (1,179 chunks) · <50ms
7 layers · 25,695 graph edges · 91K chunks · <100ms retrieval

Defence Origin

SARAI Perception Loop

The memory architecture was battle-tested inside an autonomous drone intercept pipeline — tracking hostile UAVs, classifying threats, and guiding intercept trajectories in real time.

<400ms
End-to-end latency
Edge
No cloud dependency
Real-time
Autonomous operation

If your memory system can guide a drone to intercept a Shahed-136 at 200 km/h, it can probably handle your customer support tickets.

Use Cases

One memory system. Every application.

From conversational AI to autonomous defence systems — Rewind provides the memory layer your application needs.

AI Agents

Persistent, evolving memory for autonomous agents. Recall past actions, learn from mistakes, and maintain context across sessions.

Multi-session recall · 7-layer fusion

Chatbots

Remember user preferences, past conversations, and domain knowledge. No more "I don't have context on that."

Conversation memory · User personalisation

RAG Pipelines

Replace naive vector-only retrieval with fused multi-source recall. Graph + vector + keyword in a single call.

Score fusion · <100ms retrieval

Defence & Edge AI

Battle-tested on edge hardware including NVIDIA GB10 and RTX workstations. Runs anywhere Python runs.

Edge-native · 363ms inference

Technical

Not another wrapper.

Rewind is a purpose-built 7-layer memory architecture, not a thin layer over someone else's vector database. 99.3% LoCoMo benchmark. Patent pending.

01

Retrieval Feedback Learning

Adaptive

Scores adapt based on user corrections. The system learns which sources are trustworthy for which queries — without retraining.
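One way to picture this: keep a per-source trust weight and nudge it on every acceptance or correction. The class, learning rate, and update rule below are an illustrative sketch, not Rewind's actual mechanism:

```python
# Sketch of retrieval feedback learning: per-(query-type, source) trust
# weights nudged by user feedback. No model retraining -- just scalar updates.

class FeedbackWeights:
    def __init__(self, lr=0.1):
        self.lr = lr
        self.trust = {}  # (query_type, source) -> weight

    def weight(self, query_type, source):
        return self.trust.get((query_type, source), 1.0)

    def feedback(self, query_type, source, helpful):
        """helpful=True on acceptance, False on a user correction."""
        w = self.weight(query_type, source)
        target = 1.0 if helpful else 0.0
        # Move the weight a fraction of the way toward the target.
        self.trust[(query_type, source)] = w + self.lr * (target - w)

fw = FeedbackWeights()
fw.feedback("config", "vector", helpful=False)  # correction: demote this source
fw.feedback("config", "graph", helpful=True)    # acceptance: keep it trusted
```

After a correction, vector results for "config"-type queries are down-weighted at fusion time; the underlying embedding model is untouched.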

02

Trajectory Boost (Bio-inspired LTP)

Bio-inspired

Inspired by long-term potentiation in biological memory. Recent, frequently-accessed knowledge gets boosted while older context decays gracefully.
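The potentiation-plus-decay idea can be sketched as a sum of exponentially decaying contributions, one per access. The half-life and per-access strength below are illustrative constants, not Rewind's tuned values:

```python
import math
import time

# Bio-inspired boost sketch: each access strengthens a memory (potentiation),
# and each contribution decays exponentially afterwards.

def strength(access_times, now, half_life=7 * 86400, per_access=1.0):
    """Sum of exponentially decayed contributions (seconds-based timestamps)."""
    decay = math.log(2) / half_life
    return sum(per_access * math.exp(-decay * (now - t)) for t in access_times)

now = time.time()
recent = strength([now - 3600, now - 60], now)   # accessed twice, recently
stale = strength([now - 30 * 86400], now)        # accessed once, a month ago
# recent memories score near 2.0; the month-old one has decayed toward zero
```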

03

Score Fusion Engine

Configurable

Graph relevance (0.9) + Vector similarity (0.7+) with L1 core file override. Configurable weights per layer, per query type.
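A per-layer, per-query-type weight config might look like the sketch below. The keys, query types, and override values are hypothetical; only the default graph/vector weights echo the numbers above:

```python
# Illustrative fusion config: default per-layer weights, per-query-type
# overrides, and an L1 core-file override flag.

FUSION_CONFIG = {
    "default": {"graph": 0.9, "vector": 0.7, "keyword": 0.5},
    "overrides": {
        # Factual lookups lean harder on the knowledge graph.
        "factual": {"graph": 1.0, "vector": 0.6},
    },
    "l1_override": True,  # core-file hits always outrank fused scores
}

def weights_for(query_type):
    """Resolve effective weights: defaults merged with any override."""
    w = dict(FUSION_CONFIG["default"])
    w.update(FUSION_CONFIG["overrides"].get(query_type, {}))
    return w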

04

Edge-Native

Edge

Runs the full 7-layer stack on any local hardware — laptop, workstation, edge device. Tested on NVIDIA GB10, but works anywhere Python and SQLite run. No cloud roundtrip. No data leaves the device.

05

Defence-Grade Testing

Tested

Validated against adversarial retrieval attacks, temporal consistency tests, and latency budgets derived from real-time drone intercept requirements.

06

99.3% LoCoMo Benchmark

Benchmark

Patent-pending architecture combining BM25 keyword search, knowledge graphs, and neural vector retrieval — scoring 99.3% on the LoCoMo benchmark.

Pricing

Start free. Scale when ready.

The core memory stack is free and open source. Pro and MOS tiers unlock advanced capabilities for production workloads.

Rewind Core

Free

5-layer local memory stack with intelligent retrieval. Memory type taxonomy, recency weighting, drift detection, and query-intent matching. No account needed. Zero cost.

Full-text keyword search

BM25/FTS5 across all your files in <1ms

System memory

Auto-loads identity, preferences, and context on every session

Semantic vector search

sqlite-vec embeddings — Core uses local Ollama model, Pro uses cloud H100 GPU

Knowledge graph

Entity relationships, spreading activation, and Hebbian learning

Memory type taxonomy

Every memory classified as user, feedback, project, or reference — <1ms, zero API calls

Recency weighting

Type-aware time decay — project memories fade faster, user preferences persist

Query-intent matching

Asking about ports? Reference-type results get boosted automatically

Memory drift detection

Flags stale references to ports, paths, and services that may have changed

OpenClaw gateway autopatcher

Injects memory into every message before the LLM sees it — zero tool calls needed

HybridRAG score fusion (basic)

Fuses keyword, graph, and vector into one ranked answer

LLM relevance selection
Cross-encoder reranking
Memory extraction
Partial compaction
Query orchestrator
Real-time conversation watcher

rewind watch — live session indexing as conversations happen

Historical chat backfill

rewind ingest-chats — one-command backfill of all past conversations

Auto-diagnosis and repair

rewind doctor — self-healing checks and automatic fix of memory stack issues

Communications memory
Multi-channel awareness
L5 semantic search for conversations
NV-Embed-v2 4096-dim embedding integration
Document store
Retrieval feedback learning
Backend migration assist
Priority support
Cloud GPU embeddings
Custom GPU infrastructure
Bio-inspired lifecycle
Air-gapped deployment
Custom graph schemas
Dedicated engineer + SLA
Get Started Free

Rewind Pro

$9/mo (was $18)

Early Adopter — first 1,000 users

Full 7-layer stack with LLM relevance selection, cross-encoder reranking, memory extraction, and partial compaction. 100K cloud GPU embeddings/mo included.

Full-text keyword search

BM25/FTS5 across all your files in <1ms

System memory

Auto-loads identity, preferences, and context on every session

Semantic vector search

sqlite-vec embeddings — Core uses local Ollama model, Pro uses cloud H100 GPU

Knowledge graph

Entity relationships, spreading activation, and Hebbian learning

Memory type taxonomy

Every memory classified as user, feedback, project, or reference — <1ms, zero API calls

Recency weighting

Type-aware time decay — project memories fade faster, user preferences persist

Query-intent matching

Asking about ports? Reference-type results get boosted automatically

Memory drift detection

Flags stale references to ports, paths, and services that may have changed

OpenClaw gateway autopatcher

Injects memory into every message before the LLM sees it — zero tool calls needed

HybridRAG score fusion (full)

Fuses all 7 layers — keyword, graph, vector, comms, docs, and feedback signals

LLM relevance selection

Cheap side-query picks the most relevant memories from the full manifest

Cross-encoder reranking

GPU-accelerated reranking for precision retrieval

Memory extraction

Auto-extracts durable memories from conversations after each turn

Partial compaction

Keeps original task intent intact — only compacts the middle of conversations

Query orchestrator

Routes each query to the right layers automatically

Real-time conversation watcher

rewind watch — live session indexing as conversations happen

Historical chat backfill

rewind ingest-chats — one-command backfill of all past conversations

Auto-diagnosis and repair

rewind doctor — self-healing checks and automatic fix of memory stack issues

Communications memory

Index chats, emails, and contacts — recall any conversation

Multi-channel awareness

Telegram, WhatsApp, Slack, iMessage — unified conversation memory across all channels

L5 semantic search for conversations

Qdrant-powered vector search across all indexed conversations — instant, high-precision recall

NV-Embed-v2 4096-dim embedding integration

NVIDIA NV-Embed-v2 embeddings at 4096 dimensions for maximum semantic fidelity

Document store

PDFs, docs, and files with arena-based retrieval

Retrieval feedback learning

System learns from corrections — no retraining needed

Backend migration assist

Upgrade from SQLite to Neo4j/Qdrant locally with one command

Priority support

Direct email access to the team behind Rewind

Cloud GPU embeddings

25K/mo included. A10G-powered, no local GPU required. $0.01/1K overage

Custom GPU infrastructure
Bio-inspired lifecycle
Air-gapped deployment
Custom graph schemas
Dedicated engineer + SLA

Rewind MOS (Memory Operating System)

Custom

Production-grade persistent memory for autonomous agents — on-prem, air-gapped, fully managed.

Full-text keyword search

BM25/FTS5 across all your files in <1ms

System memory

Auto-loads identity, preferences, and context on every session

Semantic vector search

sqlite-vec embeddings — Core uses local Ollama model, Pro uses cloud H100 GPU

Knowledge graph

Entity relationships, spreading activation, and Hebbian learning

Memory type taxonomy

Every memory classified as user, feedback, project, or reference — <1ms, zero API calls

Recency weighting

Type-aware time decay — project memories fade faster, user preferences persist

Query-intent matching

Asking about ports? Reference-type results get boosted automatically

Memory drift detection

Flags stale references to ports, paths, and services that may have changed

OpenClaw gateway autopatcher

Injects memory into every message before the LLM sees it — zero tool calls needed

HybridRAG score fusion (full)

Fuses all 7 layers — keyword, graph, vector, comms, docs, and feedback signals

LLM relevance selection

Cheap side-query picks the most relevant memories from the full manifest

Cross-encoder reranking

GPU-accelerated reranking for precision retrieval

Memory extraction

Auto-extracts durable memories from conversations after each turn

Partial compaction

Keeps original task intent intact — only compacts the middle of conversations

Query orchestrator

Routes each query to the right layers automatically

Real-time conversation watcher

rewind watch — live session indexing as conversations happen

Historical chat backfill

rewind ingest-chats — one-command backfill of all past conversations

Auto-diagnosis and repair

rewind doctor — self-healing checks and automatic fix of memory stack issues

Communications memory

Index chats, emails, and contacts — recall any conversation

Multi-channel awareness

Telegram, WhatsApp, Slack, iMessage — unified conversation memory across all channels

L5 semantic search for conversations

Qdrant-powered vector search across all indexed conversations — instant, high-precision recall

NV-Embed-v2 4096-dim embedding integration

NVIDIA NV-Embed-v2 embeddings at 4096 dimensions for maximum semantic fidelity

Document store

PDFs, docs, and files with arena-based retrieval

Retrieval feedback learning

System learns from corrections — no retraining needed

Backend migration assist

Upgrade from SQLite to Neo4j/Qdrant locally with one command

Priority support

Direct email access to the team behind Rewind

Cloud GPU embeddings

25K/mo included. A10G-powered, no local GPU required. Custom volume rates available

Custom GPU infrastructure

Choose your hardware — H100, B200, or bring your own. Custom rates

Bio-inspired lifecycle

Memory decay, crystallisation, pruning, and episodic tagging — like biological memory

Air-gapped deployment

Zero internet. Fully offline. Bring your own embeddings

Custom graph schemas

Define your own entity types, relationships, and domain ontologies

Dedicated engineer + SLA

Named engineer, guaranteed uptime, on-call escalation

Contact Sales

The Vision

Why MOS?

Every LLM on the market is stateless. GPT-4, Claude, Gemini — they forget everything the moment the context window closes. RAG patches this with retrieval, but retrieval isn't memory. It doesn't learn. It doesn't evolve. It doesn't know what matters.

MOS is different. It's a Memory Operating System — seven specialised layers that work like biological memory. Not a database. Not a vector store. A living system that gets smarter the more you use it.

Strengthen

Connections between knowledge grow stronger with repeated co-access — just like synapses in a biological brain.

Decay

Irrelevant knowledge fades gracefully over time. No manual cleanup. The system self-organises.

Crystallise

Frequently accessed patterns solidify into permanent understanding — fast to retrieve, resistant to decay.

Episodic Tagging

Important moments are tagged with context — who, when, where, why. Recalled as experiences, not just data.
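The four behaviours above can be condensed into a toy lifecycle sketch. Every threshold, rate, and field name here is illustrative — a picture of the idea, not MOS itself:

```python
# Toy memory lifecycle: access strengthens, time decays, strong memories
# crystallise (become decay-resistant), and each memory carries episodic tags.

class Memory:
    def __init__(self, content, tags=None):
        self.content = content
        self.tags = tags or {}        # episodic context: who/when/where/why
        self.strength = 1.0
        self.crystallised = False

    def access(self):
        self.strength += 0.5          # strengthen on (co-)access
        if self.strength >= 3.0:
            self.crystallised = True  # solidify into permanent understanding

    def tick(self, decay=0.9):
        if not self.crystallised:     # crystallised memories resist decay
            self.strength *= decay

m = Memory("deploy runs on port 8080", tags={"who": "ops", "why": "incident"})
for _ in range(4):
    m.access()                        # repeated access -> crystallises
m.tick()                              # decay no longer applies
```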

This isn't retrieval. It's cognition infrastructure.

Our Mission

50% of revenue goes to Ukraine.

Not a pledge — a structure.

SARAI was born from the need to defend Ukrainian skies. Every subscription, every commercial licence, every enterprise contract — half of the revenue goes directly to Ukrainian defence and humanitarian efforts. This isn't a marketing campaign. It's a founding commitment of SARAI Defence Ltd.

Слава Україні (Glory to Ukraine)