OpenSearch vs LanceDB for Vector Search: Query Cost and Infrastructure

Choosing a vector database usually comes down to a tradeoff between a managed search service and an embedded library. OpenSearch and LanceDB sit on opposite ends of that spectrum: one runs as a distributed cluster with a rich feature set (full-text search, security, aggregations, multi-tenancy), the other as a columnar file format you query directly from your application. Both are good at vector search. This post sets ingestion aside and focuses on the steady-state question that dominates the bill once data is loaded: what does it cost to run queries, and what infrastructure do you need to keep running?

The workload is the same on both sides: 287,360 images from the COCO 2017 dataset, embedded with Google’s SigLIP 2 (SoViT-400M, 384px) into 1152-dimensional, L2-normalized vectors. From there, costs are projected to 1M, 10M, and 100M documents.

The Setup

Metric/Component      Value
Dataset               COCO 2017 (all splits)
Images                287,360
Embedding model       google/siglip2-so400m-patch14-384
Vector dimensions     1152
Normalization         L2 (unit vectors)
Average image size    ~160 KB JPEG

Both systems use the same vectors and the same image set. The difference is where each piece lives, not what it is.

Storage Architecture

The two systems split bytes differently between expensive (RAM/EBS attached to a search node) and cheap (object storage) tiers.

OpenSearch: Vectors on the Cluster, Images in S3

OpenSearch cluster
├── HNSW index (Lucene segments, RAM + EBS)
└── document fields: image_id, caption, s3_uri ──┐
S3 bucket (separate)                             │
└── 287,360 JPEG files (~46 GB) ◄────────────────┘

OpenSearch documents store the vector, the metadata fields, and an s3_uri (or path) pointing at the image. The Lucene HNSW graph and the vectors themselves live on the search node (partly in RAM, partly on EBS) and the application fetches the image from S3 after the search returns the URI. This is a clean split: the search node handles ranking, S3 handles bulk storage. It does mean two systems to operate (search cluster + bucket policy), but image bytes never touch the cluster’s RAM, EBS, or replication pipeline.
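
Concretely, the query path is two steps: a k-NN search against the cluster, then a GET against S3 for the winning image. The sketch below assumes the opensearch-py and boto3 clients and an index named coco_siglip2; the endpoint, index name, and field layout are illustrative, not taken from the benchmark code.

import boto3
from opensearchpy import OpenSearch

client = OpenSearch(
    hosts=[{"host": "my-domain.us-east-1.es.amazonaws.com", "port": 443}],  # illustrative endpoint
    use_ssl=True,
)
s3 = boto3.client("s3")

query_vector = [0.0] * 1152  # stand-in for a SigLIP 2 image embedding

# Step 1: the cluster ranks by the HNSW index and returns metadata plus the S3 URI.
resp = client.search(
    index="coco_siglip2",
    body={
        "size": 10,
        "_source": ["image_id", "caption", "s3_uri"],
        "query": {"knn": {"embedding": {"vector": query_vector, "k": 10}}},
    },
)

# Step 2: the application fetches the image bytes directly from S3.
top = resp["hits"]["hits"][0]["_source"]
bucket, key = top["s3_uri"].removeprefix("s3://").split("/", 1)
image_bytes = s3.get_object(Bucket=bucket, Key=key)["Body"].read()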

LanceDB: Everything in Lance Format

S3 bucket
└── coco_clip_embeddings.lance/
    ├── vectors (1152-dim float32, optionally SQ8)
    ├── metadata (image_id, caption, etc.)
    └── image_bytes (raw JPEG, lazily read)

LanceDB stores vectors, metadata, and image bytes together as columns in Lance files on S3. Lance is columnar, so a nearest-neighbor search reads only the vector and metadata columns; the image_bytes column is fetched lazily, by row, when the application accesses it. The index is built and persisted alongside the data, in the same S3 prefix.
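
From the application side that looks roughly like the sketch below, assuming the lancedb Python SDK; the bucket URI is illustrative. The key detail is that image_bytes is excluded from the search projection and only read, row by row, when it is actually needed.

import lancedb

db = lancedb.connect("s3://my-bucket/lance")        # illustrative bucket
table = db.open_table("coco_clip_embeddings")

query_vector = [0.0] * 1152                         # stand-in for a SigLIP 2 image embedding

# The search reads only the vector index and the projected metadata columns.
hits = (
    table.search(query_vector)
    .metric("cosine")
    .select(["image_id", "caption"])                # image_bytes deliberately excluded
    .limit(10)
    .to_pandas()
)

# The image column is fetched lazily, for just the rows the application touches.
top_id = hits.loc[0, "image_id"]
ds = table.to_lance()                               # underlying Lance dataset
row = ds.scanner(columns=["image_bytes"], filter=f"image_id = '{top_id}'").to_table()
image_bytes = row["image_bytes"][0].as_py()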

What’s Equivalent

Image storage cost is essentially the same in both designs: ~160 KB JPEGs sitting in S3 Standard at $0.023/GB/month. What differs is what runs on the always-on tier: OpenSearch keeps vectors and the HNSW graph hot on a search node; LanceDB pulls index pages from S3 into a memory-mapped cache on demand. That distinction is what drives the cost curves below.

Query Results

Both systems return the same top result for a query using the first image embedding (a man on a moped):

OpenSearch:

Rank   Score      Image ID     Caption
1      1.0000     391895       A man with a red helmet on a small moped on a di...
2      0.9064     252839       cattle grazing on grass along the side of a road...
3      0.9033     253446
4      0.8949     490582       A man and a woman on a motorcycle in helmets.
5      0.8941     550859

LanceDB:

Rank   Distance   Image ID     Caption
1      0.0000     391895       A man with a red helmet on a small moped on a di...
2      0.4941     580784
3      0.4995     579451
4      0.5030     169633       there is a man riding a bike and waving
5      0.5132     191824

OpenSearch reports cosine similarity (higher is better), LanceDB reports cosine distance (lower is better). Both retrieved the exact match at rank 1. The remaining results differ because OpenSearch uses Lucene’s HNSW with default parameters while LanceDB uses IVF_HNSW_SQ with scalar quantization; different approximate-nearest-neighbor structures will diverge past the exact match. Recall@10 against an exact baseline is comparable on this dataset (both well above 0.95) once each index is tuned.

Latency in single-client testing was sub-50 ms p95 on both systems for a top-10 query at 287K vectors. At higher QPS and larger corpora, the limiting factor shifts: OpenSearch is bounded by node CPU and HNSW graph traversal in RAM, LanceDB by S3 latency for cold pages and by partition fan-out for IVF.

AWS Cost Comparison

The numbers below cover steady-state query infrastructure only, that is, the always-on cost of keeping the index queryable, not one-time ingestion or backfill. Pricing is us-east-1 on-demand at the time of writing.

A few shared assumptions keep the comparison fair: both sides use 8-bit scalar quantization (SQ8) on the vectors, both keep image bytes in S3 Standard, queries are top-10 at a sustained 10 QPS, and each system runs on a single node with no replicas.

Cost Model

Every line in the tables below comes out of the same handful of formulas. Let:

N       = number of documents, expressed in millions  (e.g. 100 for 100M)
d       = vector dimensions  (1152 here)
b       = bytes per vector element  (4 for float32, 1 for SQ8)
M       = HNSW graph degree  (Lucene default 16; LanceDB uses 32)
img_KB  = average image size in KB  (~160)
QPS     = sustained queries per second

A useful identity to keep in mind: 1 GB ≈ 1 billion bytes ≈ 1 million KB. That’s what lets the formulas below land in GB without scientific notation.

Vector data. Each vector takes d × b bytes; multiplied by N million docs gives gigabytes:

vector_GB = N × d × b / 1000

For 100M docs at d=1152, SQ8: 100 × 1152 × 1 / 1000 = 115.2 GB. Raw float32 would be 4× that = 460.8 GB.

HNSW graph memory. Each node holds M edges as 4-byte ints in the bottom layer, with a small fraction of nodes appearing on upper layers. A practical upper bound:

hnsw_GB ≈ N × M × 4 × 1.05 / 1000      # the 1.05 covers upper layers

For 100M docs, M=16: 100 × 16 × 4 × 1.05 / 1000 ≈ 6.7 GB.

OpenSearch node RAM sizing. AWS’s published guidance for the k-NN plugin is that the in-memory portion (vectors + graph) should occupy roughly half of node RAM, leaving the rest for the JVM heap, segment cache, and OS:

required_RAM_GB = 2 × (vector_GB + hnsw_GB)

For 100M, SQ8, M=16: 2 × (115.2 + 6.7) ≈ 244 GB. The smallest Amazon OpenSearch Service instance that comfortably holds that with headroom for merges and snapshots is r6g.12xlarge.search at 384 GB.

EBS for OpenSearch index segments. Lucene segment files on disk are roughly the same size as the in-memory index, plus headroom for merges (a 2× rule of thumb is standard) and a small per-document metadata footprint (call it 500 bytes for image_id, s3_uri, caption). With 500 bytes/doc, metadata weighs N × 0.5 GB for N million docs:

index_disk_GB = 2 × (vector_GB + hnsw_GB + N × 0.5)

For 100M: 2 × (115.2 + 6.7 + 50) ≈ 344 GB. At gp3’s $0.08/GB-month that’s about $28/month.

S3 image storage. Same on both systems. With image size in KB and N in millions, image storage in GB is just the product:

image_storage_GB = N × img_KB
S3_image_cost    = image_storage_GB × $0.023/GB-month

For 100M docs at 160 KB/image: 100 × 160 = 16,000 GB, so 16,000 × $0.023 ≈ $368/month.

S3 storage for LanceDB vectors + metadata. LanceDB persists the quantized vectors and metadata columns alongside the images in the same Lance dataset, so this just adds the non-image bytes to S3:

lancedb_index_GB = vector_GB + N × 0.5
S3_index_cost    = lancedb_index_GB × $0.023/GB-month

For 100M, SQ8: 115.2 + 50 ≈ 165 GB → $3.80/month.

Compute, monthly. AWS bills hourly; one month ≈ 730 hours:

compute_per_month = hourly_price × 730

So r6g.12xlarge.search at $4.024/hr is 4.024 × 730 ≈ $2,937/month; c6g.4xlarge at $0.544/hr is 0.544 × 730 ≈ $397/month.

S3 GET costs for LanceDB queries. S3 Standard charges $0.0004 per 1,000 GET requests in us-east-1, which is the same as $0.40 per million; both forms appear in AWS documentation. Assume each query reads, on average, one coalesced range (Lance batches partition reads). With 86,400 seconds/day × 30 days/month ≈ 2.6 million seconds/month:

gets_per_month_M = QPS × 2.6        # in millions of GETs
S3_get_cost      = gets_per_month_M × $0.40        # i.e. gets × $0.0004 / 1,000

At 10 QPS: 10 × 2.6 ≈ 26 million requests, so 26 × $0.40 ≈ $10.40/month. Linear in QPS, independent of corpus size.

Worked Example: 100M Documents

Plugging the formulas in for the 100M tier with d=1152, b=1 (SQ8), M=16, img=160 KB, QPS=10:

vector_GB         = 100 × 1152 × 1 / 1000        = 115.2 GB
hnsw_GB           = 100 × 16 × 4 × 1.05 / 1000   =   6.7 GB
required_RAM_GB   = 2 × (115.2 + 6.7)            = 243.8 GB  → r6g.12xlarge.search (384 GB)
metadata_GB       = 100 × 0.5                    =    50 GB
index_disk_GB     = 2 × (115.2 + 6.7 + 50)       =   344 GB

OpenSearch compute = $4.024/hr × 730 hr          = $2,937/mo
OpenSearch EBS     = 344 GB × $0.08/GB-mo        =    $28/mo
S3 images          = 100 × 160 = 16,000 GB × $0.023  =  $368/mo
OpenSearch total                                 ≈ $3,333/mo

LanceDB compute    = $0.544/hr × 730 hr          =   $397/mo
LanceDB index S3   = (115.2 + 50) GB × $0.023    =     $4/mo
S3 images          = 16,000 GB × $0.023          =   $368/mo
S3 GETs @ 10 QPS   = 26 million × $0.40/M        =    $10/mo
LanceDB total                                    ≈   $779/mo

The same formulas drive every row in the tables below; only N changes.
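
To reproduce the tables, or rerun them with your own parameters, the whole model fits in one small function. This is a sketch of the formulas above, with the 100M-tier instance prices as defaults; it is not a live price lookup, and the instance choice (and therefore the hourly rate) changes at each tier.

def monthly_costs(n_millions, qps=10, d=1152, b=1, m=16, img_kb=160,
                  opensearch_hourly=4.024, lancedb_hourly=0.544,
                  ebs_per_gb=0.08, s3_per_gb=0.023, s3_get_per_million=0.40):
    """Steady-state query cost in $/month for both systems, per the formulas above."""
    vector_gb = n_millions * d * b / 1000
    hnsw_gb = n_millions * m * 4 * 1.05 / 1000
    metadata_gb = n_millions * 0.5                  # ~500 bytes/doc
    image_gb = n_millions * img_kb                  # 1M docs at 1 KB each = 1 GB
    required_ram_gb = 2 * (vector_gb + hnsw_gb)     # drives the OpenSearch instance choice
    index_disk_gb = 2 * (vector_gb + hnsw_gb + metadata_gb)
    gets_per_month_m = qps * 2.6                    # millions of S3 GETs

    opensearch = (opensearch_hourly * 730
                  + index_disk_gb * ebs_per_gb
                  + image_gb * s3_per_gb)
    lancedb = (lancedb_hourly * 730
               + (vector_gb + metadata_gb) * s3_per_gb
               + image_gb * s3_per_gb
               + gets_per_month_m * s3_get_per_million)
    return required_ram_gb, round(opensearch), round(lancedb)

print(monthly_costs(100))   # -> (243.8..., 3333, 779), matching the 100M worked example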

287K Documents (~46 GB images)

Component               OpenSearch                              LanceDB
Instance                r6g.large.search (16 GB, 2 vCPU)        c6g.medium (2 GB, 1 vCPU)
Compute (730 hr)        $122/mo                                 $25/mo
EBS / index storage     ~$2/mo (gp3, 20 GB)                     $0 (in S3 below)
S3 (vectors+metadata)   n/a                                     <$1/mo
S3 (images, ~46 GB)     ~$1/mo                                  ~$1/mo
S3 GETs @ 10 QPS        n/a                                     ~$10/mo
Total                   ~$125/mo                                ~$37/mo

At this scale OpenSearch fits comfortably on the smallest managed node. LanceDB runs on a tiny compute-optimized instance because the working set is well under 1 GB.

1M Documents (~160 GB images)

Component               OpenSearch                              LanceDB
Instance                r6g.xlarge.search (32 GB, 4 vCPU)       c6g.large (4 GB, 2 vCPU)
Compute (730 hr)        $245/mo                                 $50/mo
EBS / index storage     ~$5/mo (gp3, 60 GB)                     $0
S3 (vectors+metadata)   n/a                                     ~$1/mo
S3 (images, ~160 GB)    ~$4/mo                                  ~$4/mo
S3 GETs @ 10 QPS        n/a                                     ~$10/mo
Total                   ~$254/mo                                ~$65/mo

10M Documents (~1.6 TB images)

Component               OpenSearch                              LanceDB
Instance                r6g.4xlarge.search (128 GB, 16 vCPU)    c6g.xlarge (8 GB, 4 vCPU)
Compute (730 hr)        $980/mo                                 $100/mo
EBS / index storage     ~$3/mo (gp3, 35 GB)                     $0
S3 (vectors+metadata)   n/a                                     ~$1/mo
S3 (images, ~1.6 TB)    ~$37/mo                                 ~$37/mo
S3 GETs @ 10 QPS        n/a                                     ~$10/mo
Total                   ~$1,020/mo                              ~$148/mo

At 10M with the formulas above: vector_GB = 11.5, hnsw_GB ≈ 0.7, so required_RAM_GB ≈ 24; the r6g.4xlarge.search (128 GB) has comfortable headroom. LanceDB’s working set during a typical IVF probe is a small fraction of the index, so a c6g.xlarge with a memory-mapped cache is enough.

100M Documents (~16 TB images)

Component               OpenSearch                              LanceDB
Instance                r6g.12xlarge.search (384 GB, 48 vCPU)   c6g.4xlarge (32 GB, 16 vCPU)
Compute (730 hr)        $2,937/mo                               $397/mo
EBS / index storage     ~$28/mo (gp3, 344 GB)                   $0
S3 (vectors+metadata)   n/a                                     ~$4/mo (SQ8)
S3 (images, ~16 TB)     ~$368/mo                                ~$368/mo
S3 GETs @ 10 QPS        n/a                                     ~$10/mo
Total                   ~$3,333/mo                              ~$779/mo

These are the numbers from the worked example above. Image storage is identical on both sides because the strategy is identical (the bytes live in S3 either way); once storage is equalized, the remaining gap is entirely the compute and EBS needed to keep the index hot.

What’s Driving the Curves

Three forces are at play, and each cuts in a different direction:

  1. OpenSearch compute scales with the in-memory index size. Even with quantization, the HNSW graph plus the quantized vectors must fit in a node’s RAM to hit single-digit-millisecond latencies. Crossing a node-size boundary roughly doubles the compute line.
  2. LanceDB compute scales with QPS, not corpus size. Index pages come from S3 and are cached in RAM as queries touch them. A larger corpus means more cold-page misses, but the steady-state hot set is governed by query mix. The tradeoff is per-query S3 GET costs that grow linearly with traffic.
  3. Image storage is a wash. Both designs put the bulk bytes in S3. The image line is identical at every scale; the gap above is entirely about where the index lives.

OpenSearch can narrow the compute gap further with binary quantization (32× memory reduction) or by moving cold partitions to disk-based ANN, at the cost of recall and tail latency. LanceDB can absorb higher QPS by adding read replicas (each is just another small EC2 instance reading the same S3 prefix) or by enabling an SSD-backed cache to cut S3 GETs. Both have levers; the tables above use the most common configuration on each side.

Index Configuration

OpenSearch (Lucene HNSW, SQ8)

"settings": {
    "index": {
        "knn": True,
        "knn.algo_param.ef_search": 100,
    }
},
"mappings": {
    "properties": {
        "embedding": {
            "type": "knn_vector",
            "dimension": 1152,
            "method": {
                "name": "hnsw",
                "space_type": "cosinesimil",
                "engine": "lucene",
                "parameters": {"encoder": {"name": "sq"}},
            },
        },
        "s3_uri":   {"type": "keyword"},
        "image_id": {"type": "keyword"},
        "caption":  {"type": "text"},
    }
}
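
Creating the index is then a single call with opensearch-py; the client and index name are carried over from the earlier sketch.

client.indices.create(index="coco_siglip2", body=index_body)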

LanceDB (IVF_HNSW_SQ)

import math

num_rows = table.count_rows()   # 287,360 here

num_partitions  = 1 if num_rows < 1_000_000 else int(math.sqrt(num_rows))
m               = 32 if num_rows > 100_000 else 20
ef_construction = 400 if num_rows > 500_000 else 300

table.create_index(
    metric="cosine",
    vector_column_name="vector",
    index_type="IVF_HNSW_SQ",
    num_partitions=num_partitions,
    m=m,
    ef_construction=ef_construction,
)

Both indexes use 8-bit scalar quantization on the vectors and HNSW for the graph traversal. LanceDB layers an IVF partitioning step on top, which is what lets it touch a small fraction of the index per query at large corpora.
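
At query time the IVF layer shows up as the nprobes knob (how many partitions to scan), plus an optional refine_factor for exact re-ranking. A sketch, reusing table and query_vector from the earlier LanceDB example; the specific values are illustrative, not tuned.

results = (
    table.search(query_vector)
    .metric("cosine")
    .nprobes(20)          # partitions scanned per query; more = better recall, more S3 reads
    .refine_factor(10)    # re-rank 10x the requested k with exact distances
    .limit(10)
    .to_pandas()
)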

Operational Complexity

Concern                    OpenSearch                                                LanceDB
Runtime                    Managed cluster (or JVM in your container)                Embedded library / sidecar process
Dependencies               OpenSearch domain, IAM, VPC                               pip install lancedb, S3 bucket
Other features available   Full-text, BM25, aggregations, security, multi-tenancy    Vector + columnar scans only
Scaling out                Add data nodes, rebalance shards                          Add read replicas reading the same S3 prefix
Image serving              Application reads from S3 by URI                          Returned in query results, lazily fetched
Backup                     Snapshot API to S3                                        Lance files already in S3
Can scale to zero          No, domain runs 24/7                                      Yes, queryable from cold S3

OpenSearch’s higher base cost buys real capabilities (full-text relevance, RBAC, aggregations, multi-tenancy) that LanceDB simply doesn’t have. If you need those, the comparison stops being apples-to-apples.

When to Use Which

Choose OpenSearch when:

  - You need full-text relevance (BM25), aggregations, security/RBAC, or multi-tenancy alongside vector search.
  - Sustained QPS is high enough that a hot, in-memory HNSW index is what keeps tail latency down.
  - You already run an OpenSearch domain, so adding a k-NN index is a marginal cost rather than a new system.

Choose LanceDB when:

  - Steady-state cost dominates and traffic is modest or bursty, since compute scales with QPS rather than corpus size.
  - You want vectors, metadata, and image bytes together in S3, queryable from an embedded library, with the option to scale to zero.
  - Scaling reads means adding stateless replicas that point at the same S3 prefix, not rebalancing shards.

Summary

  1. Image storage is a wash. Both designs put image bytes in S3 at $0.023/GB/month; that line is identical at every scale.
  2. Where the index lives drives the cost. OpenSearch keeps vectors + HNSW graph hot on a search node; LanceDB serves them from memory-mapped Lance files on S3.
  3. OpenSearch compute scales with index RAM. With SQ8 quantization, the curve is gentler than raw float32, but crossing node-size boundaries still roughly doubles the bill.
  4. LanceDB compute scales with QPS. Steady-state cost is dominated by a small compute instance plus S3 GETs that grow with traffic, not corpus size.
  5. Feature breadth is part of the price. OpenSearch’s higher base cost buys full-text, security, and aggregations; if you need those, the gap shrinks.
  6. At 100M docs, the worked example shows ~$3,333/mo for OpenSearch vs ~$779/mo for LanceDB on equivalent SQ8 indexes, about 4.3x, with image storage identical on both sides ($368/mo).

The numbers above are a worked example, not a universal claim. Different recall targets, latency SLOs, redundancy requirements, or feature needs (full-text, RBAC) will move both lines. The point is to compare like with like (same quantization, same image storage strategy) and surface the real driver: where the index lives.

All code, benchmarks, and the cost estimator are available at opensearch-lancedb-migration.

The dataset is available on Hugging Face here: jrmiller/coco-2017-siglip2-embeddings