Memory (SSOT + Vektor)

Arkhein's memory protocol is designed to keep data both persistent and fast to retrieve. We use a dual-layer approach: SQLite for persistence and Vektor for fast vector retrieval.

SQLite as the SSOT

The Single Source of Truth (SSOT) for all domain knowledge (content, metadata, and embeddings) is our SQLite database.

  • Persistence: All knowledge is written to SQLite on the nativephp connection before any other operation.
  • Recovery: If our vector index is deleted or corrupted, it can be completely rebuilt from the data stored in SQLite.
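The write-then-index order and the recovery path can be sketched as follows. This is a minimal illustration, not Arkhein's code: it assumes an in-memory SQLite database in place of the nativephp connection, a `knowledge` table with embeddings stored as JSON text, and a hypothetical `DisposableIndex` class standing in for Vektor. The helper names `remember` and `rebuild_index` are also hypothetical.

```python
import json
import sqlite3
from typing import Dict, List


class DisposableIndex:
    """Hypothetical stand-in for the Vektor index: an id -> vector map with no unique data."""

    def __init__(self) -> None:
        self.vectors: Dict[int, List[float]] = {}

    def insert(self, item_id: int, vector: List[float]) -> None:
        self.vectors[item_id] = vector

    def clear(self) -> None:
        self.vectors.clear()


def open_store() -> sqlite3.Connection:
    # In-memory database for the sketch; the real store is the nativephp SQLite connection.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE knowledge ("
        "id INTEGER PRIMARY KEY, content TEXT NOT NULL, "
        "metadata TEXT NOT NULL, embedding TEXT NOT NULL)"
    )
    return db


def remember(db: sqlite3.Connection, index: DisposableIndex,
             content: str, metadata: dict, embedding: List[float]) -> int:
    # Step 1: commit content, metadata and embedding to SQLite (the SSOT).
    with db:  # commits on success, rolls back on exception
        cur = db.execute(
            "INSERT INTO knowledge (content, metadata, embedding) VALUES (?, ?, ?)",
            (content, json.dumps(metadata), json.dumps(embedding)),
        )
    # Step 2: only after the commit succeeds, update the acceleration layer.
    index.insert(cur.lastrowid, embedding)
    return cur.lastrowid


def rebuild_index(db: sqlite3.Connection, index: DisposableIndex) -> int:
    # Recovery: everything the index needs is re-read from SQLite.
    index.clear()
    for item_id, raw in db.execute("SELECT id, embedding FROM knowledge ORDER BY id"):
        index.insert(item_id, json.loads(raw))
    return len(index.vectors)
```

Because `remember` commits before touching the index, a crash between the two steps leaves SQLite complete and the index merely stale, which `rebuild_index` can always repair. Discarding the index and calling `rebuild_index(db, DisposableIndex())` restores it from the stored embeddings alone.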

Vektor as Disposable Acceleration

SQLite is the source of truth, but it is not optimized for high-speed vector retrieval, so we use Vektor's binary index to accelerate it.

  • Disposability: The Vektor binary index is disposable. We do not store any unique data in it.
  • Self-healing: If MemoryService detects that the Vektor index is missing, empty, or has a dimension mismatch, it will automatically trigger a rebuild from SQLite.
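The three self-healing conditions can be expressed as a small health check. This is a sketch under assumptions: `IndexState`, `rebuild_reason`, and `ensure_healthy` are hypothetical names, the index is assumed to expose its dimension the way a header field would, and `rebuild_from_sqlite` is a caller-supplied callback representing the rebuild from the SSOT. It is not the actual `MemoryService` API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class IndexState:
    """Hypothetical view of a loaded Vektor index: its dimension plus stored vectors."""
    dim: int
    vectors: Dict[int, List[float]] = field(default_factory=dict)


def rebuild_reason(index: Optional[IndexState], expected_dim: int) -> Optional[str]:
    """Return why the index must be rebuilt, or None if it is healthy."""
    if index is None:
        return "missing"
    if not index.vectors:
        return "empty"
    if index.dim != expected_dim:
        return "dimension mismatch"
    return None


def ensure_healthy(
    index: Optional[IndexState],
    expected_dim: int,
    rebuild_from_sqlite: Callable[[int], IndexState],
) -> Tuple[IndexState, Optional[str]]:
    # Self-healing: any failed check triggers a full rebuild from the SSOT.
    reason = rebuild_reason(index, expected_dim)
    if reason is None:
        return index, None
    return rebuild_from_sqlite(expected_dim), reason
```

Since the index holds no unique data, the safe response to every failed check is the same: throw it away and rebuild. `expected_dim` would come from the configured embedding model, which is what catches a model switch.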

Sovereign Tree (Hierarchical RAG)

Arkhein uses a hierarchical semantic indexing model (Sovereign Tree) to keep retrieval precise and efficient.

  • Hierarchy: Canopy (Silo Level) > Vessel (Document Level) > Fragment (Chunk Level).
  • Canopy Discovery: A pre-recall step that identifies the relevant silos to search, reducing the search space to O(log n).
  • Silo Manifest: Ground-truth file lists for each silo, used during retrieval for 100% structural accuracy.
  • Adaptive Limits: Fragment limits are adjusted dynamically based on query intent and complexity.
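A two-stage recall over the Canopy > Vessel > Fragment hierarchy might look roughly like this. It is an illustrative sketch only: it assumes each silo has a single summary vector for Canopy Discovery, uses plain cosine similarity, and substitutes a hypothetical word-count heuristic (`adaptive_limit`) for Arkhein's actual intent and complexity analysis. `Fragment`, `recall`, and `top_silos` are likewise hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Fragment:
    silo: str              # Canopy level
    vessel: str            # Vessel level (source document)
    text: str              # Fragment level (chunk)
    vector: List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def adaptive_limit(query: str, base: int = 3, cap: int = 12) -> int:
    # Hypothetical heuristic: longer (more complex) queries get more fragments.
    return min(cap, base + len(query.split()) // 4)


def recall(
    query: str,
    query_vec: Sequence[float],
    canopy: Dict[str, List[float]],   # silo name -> summary vector
    fragments: List[Fragment],
    top_silos: int = 1,
) -> List[Fragment]:
    # 1. Canopy discovery: rank silos and keep only the best few.
    ranked = sorted(canopy, key=lambda s: cosine(query_vec, canopy[s]), reverse=True)
    chosen = set(ranked[:top_silos])
    # 2. Fragment search restricted to the chosen silos.
    candidates = [f for f in fragments if f.silo in chosen]
    candidates.sort(key=lambda f: cosine(query_vec, f.vector), reverse=True)
    # 3. Apply the query-dependent fragment limit.
    return candidates[: adaptive_limit(query)]
```

The point of the first stage is that fragments in unselected silos are never scored at all; each returned fragment still carries its `vessel`, so results can be attributed to their source documents.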

Performance Optimizations

To maintain a responsive desktop experience, Arkhein employs several performance protocols:

  • Deterministic Caching: LLM completions and embeddings are cached under MD5 hashes of their inputs.
  • Shadow Rebuilds: Zero-downtime indexing using shadow directories and atomic swapping.
  • Incremental Indexing: Changes below a size threshold are applied as live insertions instead of triggering a full rebuild.
  • Re-entrant Locking: Partition-isolated locking with exponential backoff for high-concurrency safety.
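Two of these protocols, deterministic caching and shadow rebuilds, are sketched below under stated assumptions. The cache key is an MD5 digest of a canonical JSON encoding of the request; the `kind`/`model`/`payload` fields are hypothetical. The shadow rebuild assumes POSIX filesystem semantics and uses a hypothetical `current` symlink that is atomically repointed with `os.replace`; Arkhein's actual directory layout and swap mechanism may differ.

```python
import hashlib
import json
import os
import shutil
import tempfile
from typing import Any, Callable, Dict


class DeterministicCache:
    """Caches LLM completions / embeddings under an MD5 key of their inputs."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}
        self.misses = 0

    @staticmethod
    def key(kind: str, model: str, payload: Any) -> str:
        # sort_keys makes the serialization, and therefore the hash, deterministic.
        raw = json.dumps({"kind": kind, "model": model, "payload": payload}, sort_keys=True)
        return hashlib.md5(raw.encode("utf-8")).hexdigest()

    def get_or_compute(self, kind: str, model: str, payload: Any,
                       compute: Callable[[Any], Any]) -> Any:
        k = self.key(kind, model, payload)
        if k not in self._store:
            self.misses += 1
            self._store[k] = compute(payload)
        return self._store[k]


def shadow_rebuild(root: str, build: Callable[[str], None]) -> str:
    """Build a new index in a shadow directory, then atomically repoint root/current at it.

    Assumes POSIX: os.replace over an existing symlink is an atomic rename, so a reader
    resolving root/current sees either the old index or the new one, never a partial build.
    """
    current = os.path.join(root, "current")
    shadow = os.path.abspath(tempfile.mkdtemp(prefix="shadow-", dir=root))
    try:
        build(shadow)  # readers keep using the old index while this runs
    except Exception:
        shutil.rmtree(shadow, ignore_errors=True)
        raise
    previous = os.path.realpath(current) if os.path.islink(current) else None
    tmp_link = os.path.join(root, "current.tmp")
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(shadow, tmp_link)
    os.replace(tmp_link, current)  # the atomic swap
    if previous and os.path.isdir(previous):
        shutil.rmtree(previous)  # the old index is disposable
    return shadow
```

Including the model name in the cache key means switching embedding models naturally misses the cache instead of returning vectors of the wrong dimension. MD5 serves here only as a fast content fingerprint, not as a security boundary.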

Invariants

  • Sync Rule: The knowledge table is updated first; Vektor is updated or rebuilt only afterward.
  • Dimensional Integrity: Vektor must always match the dimensions of the configured embedding model.
  • Serialization: Indexing and Vektor rebuilds are serialized via job-level and service-level locks.
  • Threshold-based Sync: Small updates must not trigger full rebuilds, so search latency stays unaffected.
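The serialization and threshold invariants, together with the re-entrant locking protocol, can be combined in one sketch. Everything here is an assumption for illustration: `PartitionLocks`, `acquire_with_backoff`, and `sync_partition` are hypothetical names, partitions are keyed by silo name, the threshold of 50 changed rows is an arbitrary placeholder, and the caller is assumed to have already committed the changes to the knowledge table, per the Sync Rule.

```python
import threading
import time
from typing import Callable, Sequence


class PartitionLocks:
    """One re-entrant lock per partition (e.g. per silo), so partitions never block each other."""

    def __init__(self) -> None:
        self._guard = threading.Lock()  # protects lock creation only
        self._locks = {}                # partition name -> threading.RLock

    def get(self, partition: str):
        with self._guard:
            lock = self._locks.get(partition)
            if lock is None:
                lock = self._locks[partition] = threading.RLock()
            return lock


def acquire_with_backoff(lock, attempts: int = 6, base_delay: float = 0.01) -> bool:
    """Try a non-blocking acquire, doubling the wait after each failed attempt."""
    delay = base_delay
    for attempt in range(attempts):
        if lock.acquire(blocking=False):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
            delay *= 2
    return False


def sync_partition(
    locks: PartitionLocks,
    partition: str,
    changed_ids: Sequence[int],
    live_insert: Callable[[Sequence[int]], None],
    full_rebuild: Callable[[], None],
    threshold: int = 50,  # hypothetical placeholder value
) -> str:
    # Precondition (Sync Rule): changed_ids are already committed to SQLite.
    lock = locks.get(partition)
    if not acquire_with_backoff(lock):
        return "busy"
    try:
        if len(changed_ids) <= threshold:
            live_insert(changed_ids)  # small change: no full rebuild
            return "incremental"
        full_rebuild()  # large change: rebuild this partition from SQLite
        return "rebuild"
    finally:
        lock.release()
```

Using `threading.RLock` lets a thread that already holds a partition's lock (for example, a rebuild that calls back into indexing) acquire it again without deadlocking, while other threads back off exponentially instead of spinning.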