A disk-oriented vector database for approximate nearest-neighbor search that manages its HNSW graph index on disk using graph-oriented LSM-tree storage.
Key differences from existing vector databases.
Unlike many vector databases that keep large index state in memory, LSM-Vec is fully disk-oriented. Its memory footprint remains small and predictable even at large data scale.
LSM-Vec stores most of the HNSW index in Aster, a RocksDB fork with a graph data model. This graph-oriented LSM-tree structure enables search and update performance comparable to that of in-memory vector databases.
LSM-Vec is offered as a lightweight C++ library with Python bindings. Build with a few commands, then link the library or import the module to get started.
Everything you need for high-performance vector search on disk.
Disk-oriented design with an LSM-tree-based data structure
BasicVectorStorage (flat file) and PagedVectorStorage (4 KB pages managed with a FIFO cache)
Groups neighbor reads by page to reduce I/O during search
Close and reopen the database without re-indexing
L2 and cosine distance with AVX2/SSE2 acceleration
Full Python bindings via pybind11 with NumPy support
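To make the paged-storage and page-grouping ideas above concrete, here is a minimal Python sketch. It is not LSM-Vec's API or on-disk format: the names `FifoPageCache`, `read_vectors_grouped`, and `load_page` are hypothetical, and the layout (float32 vectors packed back to back in 4 KB pages) is an assumption made for illustration. It shows how grouping neighbor IDs by page lets each page be fetched once per batch, and how a FIFO cache evicts by insertion order rather than by recency.

```python
from collections import OrderedDict, defaultdict

import numpy as np

PAGE_SIZE = 4096  # 4 KB pages, matching the feature description


class FifoPageCache:
    """Hypothetical fixed-capacity cache that evicts the oldest inserted page."""

    def __init__(self, capacity, load_page):
        self.capacity = capacity
        self.load_page = load_page  # assumed callback: page_id -> bytes
        self.pages = OrderedDict()
        self.loads = 0  # counts simulated disk reads

    def get(self, page_id):
        if page_id not in self.pages:
            if len(self.pages) >= self.capacity:
                self.pages.popitem(last=False)  # FIFO: drop the oldest insertion
            self.pages[page_id] = self.load_page(page_id)
            self.loads += 1
        return self.pages[page_id]  # a hit does not reorder entries (unlike LRU)


def read_vectors_grouped(ids, dim, cache):
    """Fetch the vectors for `ids`, touching each page only once."""
    vec_bytes = dim * 4  # float32 components
    per_page = PAGE_SIZE // vec_bytes
    by_page = defaultdict(list)
    for vid in ids:
        by_page[vid // per_page].append(vid)
    out = {}
    for page_id, vids in by_page.items():
        page = cache.get(page_id)
        for vid in vids:
            offset = (vid % per_page) * vec_bytes
            out[vid] = np.frombuffer(page, dtype=np.float32, count=dim, offset=offset)
    return out


# Demo: 64 vectors of dimension 128 (512 bytes each, 8 per page) held in memory.
dim = 128
data = np.arange(64 * dim, dtype=np.float32).reshape(64, dim)
blob = data.tobytes()


def load_page(page_id):
    # Stand-in for a disk read of one page.
    return blob[page_id * PAGE_SIZE:(page_id + 1) * PAGE_SIZE]


neighbors = [17, 3, 18, 5, 16]  # these IDs alternate between pages 2 and 0

grouped_cache = FifoPageCache(capacity=1, load_page=load_page)
vectors = read_vectors_grouped(neighbors, dim, grouped_cache)

# Baseline: the same IDs read one at a time, in the original order.
naive_cache = FifoPageCache(capacity=1, load_page=load_page)
for vid in neighbors:
    read_vectors_grouped([vid], dim, naive_cache)
```

With a one-page cache, the grouped read loads each of the two pages once, while the one-at-a-time baseline reloads a page on every access because consecutive IDs alternate between pages. A real implementation would also need to handle vectors that do not divide evenly into a page and pages shared with metadata, which this sketch ignores.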