Memory (SSOT + Vektor)

Arkhein's memory protocol is designed to keep knowledge durable and retrieval fast. It uses two layers: SQLite as the persistent store and a Vektor binary index for fast vector search.

SQLite as the SSOT

The Single Source of Truth (SSOT) for all domain knowledge (content, metadata, and embeddings) is our SQLite database.

Persistence: All knowledge is written to SQLite on the nativephp connection before any other operation.
Recovery: If our vector index is deleted or corrupted, it can be completely rebuilt from the data stored in SQLite.
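
The two rules above can be illustrated with a minimal sketch. It is not Arkhein's implementation: the column layout of the knowledge table, the JSON encoding of embeddings, and the function names save_knowledge and rebuild_index are all assumptions made for illustration, and a plain dict stands in for the Vektor index.

```python
import json
import sqlite3

# Hypothetical schema: the real knowledge table may use different columns.
SCHEMA = (
    "CREATE TABLE IF NOT EXISTS knowledge ("
    "id INTEGER PRIMARY KEY, content TEXT, metadata TEXT, embedding TEXT)"
)

def save_knowledge(conn, content, metadata, embedding):
    """Persist a record to SQLite before any index is touched."""
    cur = conn.execute(
        "INSERT INTO knowledge (content, metadata, embedding) VALUES (?, ?, ?)",
        (content, json.dumps(metadata), json.dumps(embedding)),
    )
    conn.commit()
    return cur.lastrowid

def rebuild_index(conn):
    """Recreate the index (here a dict of id -> vector) from SQLite alone."""
    index = {}
    for row_id, embedding in conn.execute("SELECT id, embedding FROM knowledge"):
        index[row_id] = json.loads(embedding)
    return index
```

Because every embedding lives in SQLite, throwing the dict away and calling rebuild_index again yields the same index.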

While SQLite is the source of truth, it's not optimized for high-speed vector retrieval. We use Vektor's binary index for acceleration.

Disposability: The Vektor binary index is disposable. We do not store any unique data in it.
Self-healing: If MemoryService detects that the Vektor index is missing, empty, or has a dimension mismatch, it will automatically trigger a rebuild from SQLite.
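
The self-healing check could look roughly like the sketch below. The function names and the dict-based index are hypothetical; only the three trigger conditions (missing, empty, dimension mismatch) come from the description above.

```python
def index_needs_rebuild(index_vectors, expected_dim):
    """Return the reason a rebuild is needed, or None if the index is healthy.

    index_vectors is a dict of id -> vector, or None when no index exists.
    """
    if index_vectors is None:
        return "missing"
    if len(index_vectors) == 0:
        return "empty"
    if any(len(vec) != expected_dim for vec in index_vectors.values()):
        return "dimension mismatch"
    return None

def ensure_index(index_vectors, expected_dim, rebuild):
    """Return a usable index, calling rebuild() only when a check fails."""
    if index_needs_rebuild(index_vectors, expected_dim) is not None:
        return rebuild()
    return index_vectors
```

In this sketch rebuild is any callable that reconstructs the index from SQLite, so the check itself never needs to know how the index is stored.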

Arkhein utilizes a hierarchical semantic indexing model (Sovereign Tree) to ensure precise, efficient retrieval.

Hierarchy: Canopy (Silo Level) > Vessel (Document Level) > Fragment (Chunk Level).
Canopy Discovery: A pre-recall step that identifies the relevant silos, so fragment search runs only inside those silos rather than across the whole corpus (the stated target is an O(log n) search space).
Silo Manifest: Ground-truth file lists for 100% structural accuracy during retrieval.
Adaptive Limits: Dynamically adjusting fragment limits based on query intent and complexity.
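
A two-stage recall in the spirit of the hierarchy above might be sketched as follows. It assumes each canopy is represented by a single summary vector and each fragment is a (silo, vessel, text, vector) tuple; both representations, and the function names, are illustrative assumptions rather than Arkhein's actual data model.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def discover_canopies(query_vec, canopies, top_k=1):
    """Stage 1: rank silos by similarity of their summary vector to the query."""
    ranked = sorted(
        canopies, key=lambda name: cosine(query_vec, canopies[name]), reverse=True
    )
    return ranked[:top_k]

def recall(query_vec, canopies, fragments, top_k_silos=1, limit=3):
    """Stage 2: score only the fragments that belong to the selected silos."""
    silos = set(discover_canopies(query_vec, canopies, top_k_silos))
    candidates = [frag for frag in fragments if frag[0] in silos]
    candidates.sort(key=lambda frag: cosine(query_vec, frag[3]), reverse=True)
    return candidates[:limit]
```

The limit parameter is where an adaptive fragment limit would plug in: a caller could pass a larger value for broad questions and a smaller one for narrow lookups.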

To maintain a responsive desktop experience, Arkhein employs several performance protocols:

Deterministic Caching: LLM completions and embeddings are cached under MD5 hashes of their inputs, so identical requests reuse the stored result.
Shadow Rebuilds: Zero-downtime indexing using shadow directories and atomic swapping.
Incremental Indexing: Threshold-based logic prioritizing live insertions for small changes.
Re-entrant Locking: Partition-isolated locking with exponential backoff for high-concurrency safety.
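
Two of these protocols are easy to sketch. The example below shows a deterministic MD5 cache key and a shadow rebuild that swaps the new index in atomically. It simplifies the description above: it uses a shadow file rather than a shadow directory, stores the index as JSON, and the names cache_key and shadow_rebuild are hypothetical.

```python
import hashlib
import json
import os
import tempfile

def cache_key(model, payload):
    """Derive a stable key: the same model and payload always hash the same."""
    raw = json.dumps({"model": model, "payload": payload}, sort_keys=True)
    return hashlib.md5(raw.encode("utf-8")).hexdigest()

def shadow_rebuild(live_path, vectors):
    """Write the new index beside the live one, then replace it in one step."""
    directory = os.path.dirname(os.path.abspath(live_path))
    fd, shadow_path = tempfile.mkstemp(dir=directory, suffix=".shadow")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(vectors, handle)
        handle.flush()
        os.fsync(handle.fileno())
    # os.replace is atomic when both paths are on the same filesystem,
    # so readers see either the old index or the new one, never a partial file.
    os.replace(shadow_path, live_path)
```

MD5 is used here only as a fast cache key, not for security. Creating the shadow file in the same directory as the live index keeps both on one filesystem, which is what makes the final swap atomic.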

Sync Rule: The knowledge table is updated first; Vektor is then updated or rebuilt.
Dimensional Integrity: Vektor must always match the dimensions of the configured embedding model.
Serialization: Indexing and Vektor rebuilds are serialized via job-level and service-level locks.
Threshold-based Sync: Avoid full rebuilds for small updates so that search stays responsive while the index is brought up to date.
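
Taken together, the sync rule, serialization, and threshold-based sync could be combined as in the sketch below. Dicts stand in for the knowledge table and the Vektor index, a single threading.Lock stands in for the job-level and service-level locks, and the threshold value of 50 is an arbitrary placeholder, not a documented Arkhein setting.

```python
import threading

REBUILD_THRESHOLD = 50  # hypothetical cut-off between live insert and rebuild

_index_lock = threading.Lock()  # serializes all index mutations

def sync(store, index, new_rows, threshold=REBUILD_THRESHOLD):
    """Apply new_rows (id -> vector) to the store first, then to the index."""
    with _index_lock:
        # 1. The source of truth is always written first.
        store.update(new_rows)
        # 2a. Small change: insert into the live index without rebuilding.
        if len(new_rows) <= threshold:
            index.update(new_rows)
            return "incremental"
        # 2b. Large change: discard the index and rebuild it from the store.
        index.clear()
        index.update(store)
        return "rebuild"
```

Either branch leaves the index equal to the store, which is the invariant the rules above are meant to protect.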