Building an E2E Encrypted Chat Application with LanceDB and libsodium

Building a chat application where the server never sees plaintext messages requires careful coordination between cryptographic primitives, real-time delivery, and persistent storage. This post examines Seal, an end-to-end encrypted chat application that pairs LanceDB for zero-infrastructure storage with libsodium for audited, high-performance cryptography. The goal is an application that is as easy as possible to deploy and inexpensive to run, since its data can live on object storage.

Seal Screenshot

Introduction

End-to-end encrypted messaging systems face a fundamental tension: the server must store and relay messages it cannot read. This creates challenges that don’t exist in traditional chat applications.

Without careful design, you end up with decryption failures, message duplication, or architectural complexity that defeats the purpose of a lightweight embedded stack.

Why LanceDB?

The application uses LanceDB as its sole data store, replacing both a traditional relational database and any separate blob storage:

Zero Infrastructure: LanceDB runs embedded — no database server to provision, configure, or maintain. A single lancedb.connect("data/chat.lance") call is the entire setup.

Schema Flexibility: PyArrow schemas provide strong typing with straightforward migration paths when adding columns to existing tables.

Unified Storage: Messages, user records, channel metadata, and encrypted image attachments all live in LanceDB tables, eliminating the need for a separate object store or file system.

Columnar Efficiency: The underlying Lance format stores data column-oriented, making metadata-filtered queries (all messages in a channel, all messages after a timestamp) fast even as tables grow.

Architecture: Encryption Model

The system implements a trust model where the server is treated as an untrusted relay — it stores and forwards ciphertext but never has access to plaintext or private keys:

Browser (libsodium) → Encrypted Payload → FastAPI Server → LanceDB Storage
                                     WebSocket Relay → Other Browsers (libsodium)

Each user generates an X25519 key pair on registration. The public key is stored server-side; the private key never leaves the browser, persisted only in IndexedDB.

Cryptographic Design: Core Implementation

The encryption layer uses libsodium’s vendored WASM build, providing audited implementations of the cryptographic primitives:

const CRYPTO = {
    /** Generate a new X25519 key pair */
    async generateKeyPair() {
        await this.ensureReady();
        const kp = sodium.crypto_box_keypair();
        return { publicKey: kp.publicKey, privateKey: kp.privateKey };
    },
    // ... remaining methods (ensureReady, encrypt, and others) omitted from this excerpt
};

Key Implementation Details

Ephemeral Key Pairs for Forward Secrecy: Every message generates a fresh X25519 key pair. The ephemeral private key computes a shared secret with the recipient’s public key, encrypts the message, and is then discarded:

async encrypt(plaintext, recipientPublicKey) {
    await this.ensureReady();
    const ephemeral = sodium.crypto_box_keypair();
    const nonce = sodium.randombytes_buf(sodium.crypto_box_NONCEBYTES);
    const message = sodium.from_string(plaintext);
    const ciphertext = sodium.crypto_box_easy(
        message, nonce, recipientPublicKey, ephemeral.privateKey
    );
    return {
        ciphertext: sodium.to_base64(ciphertext, sodium.base64_variants.ORIGINAL),
        iv: sodium.to_base64(nonce, sodium.base64_variants.ORIGINAL),
        ephemeralPublicKey: sodium.to_base64(ephemeral.publicKey, sodium.base64_variants.ORIGINAL),
    };
},

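This means the sender keeps nothing that could later decrypt the recipient’s copy: each message’s ephemeral private key is gone after encryption. The protection is one-sided, though. The recipient’s long-term private key, combined with the ephemeral public key stored alongside each message, still decrypts everything addressed to that user, so a compromise of that key does expose their past messages.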

Self-Encrypted Copies for Sent Messages: A subtle problem in E2E chat: the sender encrypts for the recipient’s public key, but then cannot decrypt their own sent messages. The solution is to encrypt a second copy for the sender’s own public key, stored with a channel_id = "self" marker:

async function sendDm(text) {
    const peer = activeConvo.id;
    const recipientPub = peerPublicKeys[peer];

    const encForRecipient = await CRYPTO.encrypt(text, recipientPub);
    const myPub = CRYPTO.importPublicKey(publicKeyB64);
    const encForSelf = await CRYPTO.encrypt(text, myPub);

    ws.send(JSON.stringify({
        type: "dm",
        recipient: peer,
        ciphertext: encForRecipient.ciphertext,
        iv: encForRecipient.iv,
        sender_public_key_jwk: encForRecipient.ephemeralPublicKey,
        self_ciphertext: encForSelf.ciphertext,
        self_iv: encForSelf.iv,
        self_sender_public_key_jwk: encForSelf.ephemeralPublicKey,
    }));
}

The server stores both copies — the recipient’s copy with channel_id = "" and the sender’s copy with channel_id = "self". When loading DM history, both are fetched and sorted by timestamp:

# Sender's self-encrypted copies (channel_id = 'self')
sent_self = messages.search().where(
    f"sender = '{username}' AND recipient = '{peer}' AND channel_id = 'self'{time_filter}",
    prefilter=True
).to_list()
# Messages received from peer
received = messages.search().where(
    f"sender = '{peer}' AND recipient = '{username}' AND channel_id = ''{time_filter}",
    prefilter=True
).to_list()

all_msgs = sorted(sent_self + received, key=lambda m: m["timestamp"])

Hybrid Encryption for Channel Images: Sending an image to a channel with N members requires encrypting it N times if done naively. Instead, the system uses a hybrid approach — encrypt the image once with a random symmetric key (XSalsa20-Poly1305 via crypto_secretbox), then encrypt only that small symmetric key for each member using crypto_box:

async function sendChannelImage() {
    const channelId = activeConvo.id;
    const res = await fetch(`/api/channels/${channelId}/members/public_keys?token=${token}`);
    const memberKeys = await res.json();

    // Pack image into JSON envelope
    const envelope = JSON.stringify({
        type: "image",
        mime: pendingImage.mime,
        data: sodium.to_base64(pendingImage.data, sodium.base64_variants.ORIGINAL),
    });

    // Encrypt envelope once with symmetric key
    const envelopeBytes = sodium.from_string(envelope);
    const sym = await CRYPTO.encryptSymmetric(envelopeBytes);

    // Encrypt the symmetric key for each member
    const keyB64 = sodium.to_base64(sym.key, sodium.base64_variants.ORIGINAL);
    const envelopes = [];
    for (const mk of memberKeys) {
        const pubKey = CRYPTO.importPublicKey(mk.public_key_jwk);
        const encrypted = await CRYPTO.encrypt(keyB64, pubKey);
        envelopes.push({
            target_user: mk.username,
            ciphertext: encrypted.ciphertext,
            iv: encrypted.iv,
            sender_public_key_jwk: encrypted.ephemeralPublicKey,
        });
    }

    ws.send(JSON.stringify({
        type: "channel",
        channel_id: channelId,
        envelopes,
        message_type: "image",
        attachment: { encrypted_data: sym.ciphertext, iv: sym.iv },
    }));
}

This keeps channel image delivery O(N) in symmetric key encryptions (tiny) rather than O(N) in full image encryptions (expensive).
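A back-of-the-envelope comparison makes the saving concrete. The sketch below only does arithmetic on ciphertext sizes. The helper names are hypothetical rather than part of Seal, and the payload size and member count in the usage lines are illustrative assumptions.

```python
import math

MAC_BYTES = 16       # Poly1305 tag added by crypto_box and crypto_secretbox
SYM_KEY_BYTES = 32   # crypto_secretbox key size


def b64_len(n: int) -> int:
    """Length of padded base64 text for n raw bytes."""
    return 4 * math.ceil(n / 3)


def naive_bytes(payload_bytes: int, members: int) -> int:
    """Ciphertext bytes produced when the whole payload is boxed once per member."""
    return members * (payload_bytes + MAC_BYTES)


def hybrid_bytes(payload_bytes: int, members: int) -> int:
    """Ciphertext bytes produced by one symmetric pass plus one key envelope per member."""
    # sendChannelImage boxes the base64 text of the key (44 characters), not the raw key.
    key_envelope = b64_len(SYM_KEY_BYTES) + MAC_BYTES
    return (payload_bytes + MAC_BYTES) + members * key_envelope


# Illustrative numbers: a 2 MiB payload sent to a 50-member channel.
payload = 2 * 1024 * 1024
print(naive_bytes(payload, 50), hybrid_bytes(payload, 50))
```

Both approaches grow linearly with the number of members; the difference is that the hybrid scheme's per-member term is a 60-byte envelope instead of the full payload.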

Password-Protected Key Export with Argon2id: To enable device transfer and backup, private keys can be exported encrypted under a user-chosen password. The key derivation uses Argon2id — a memory-hard function that resists both GPU and ASIC attacks:

async encryptPrivateKeyWithPassword(privateKey, password) {
    await this.ensureReady();
    const salt = sodium.randombytes_buf(sodium.crypto_pwhash_SALTBYTES);
    const key = sodium.crypto_pwhash(
        sodium.crypto_secretbox_KEYBYTES,
        password,
        salt,
        sodium.crypto_pwhash_OPSLIMIT_INTERACTIVE,
        sodium.crypto_pwhash_MEMLIMIT_INTERACTIVE,
        sodium.crypto_pwhash_ALG_ARGON2ID13
    );
    const nonce = sodium.randombytes_buf(sodium.crypto_secretbox_NONCEBYTES);
    const encrypted = sodium.crypto_secretbox_easy(privateKey, nonce, key);
    return {
        salt: sodium.to_base64(salt, sodium.base64_variants.ORIGINAL),
        nonce: sodium.to_base64(nonce, sodium.base64_variants.ORIGINAL),
        data: sodium.to_base64(encrypted, sodium.base64_variants.ORIGINAL),
    };
},

The exported file includes the public key in plaintext (it’s public) and the private key encrypted with Argon2id + XSalsa20-Poly1305. Importing on a new device prompts for the password, derives the same key, and decrypts.

LanceDB Storage Layer

The database layer defines five tables with PyArrow schemas, all managed through a single embedded LanceDB connection:

MESSAGES_SCHEMA = pa.schema([
    pa.field("id", pa.utf8()),
    pa.field("sender", pa.utf8()),
    pa.field("recipient", pa.utf8()),
    pa.field("channel_id", pa.utf8()),
    pa.field("ciphertext", pa.utf8()),
    pa.field("iv", pa.utf8()),
    pa.field("sender_public_key_jwk", pa.utf8()),
    pa.field("timestamp", pa.float64()),
    pa.field("message_type", pa.utf8()),
    pa.field("attachment_id", pa.utf8()),
])

ATTACHMENTS_SCHEMA = pa.schema([
    pa.field("id", pa.utf8()),
    pa.field("sender", pa.utf8()),
    pa.field("channel_id", pa.utf8()),
    pa.field("encrypted_data", pa.large_utf8()),
    pa.field("iv", pa.utf8()),
    pa.field("timestamp", pa.float64()),
])

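Note the use of pa.large_utf8() for encrypted_data: Arrow’s standard utf8 type addresses string data with 32-bit offsets, which caps a single array at roughly 2 GiB, and a batch of base64-encoded encrypted images can exceed that. The large variant uses 64-bit offsets, which removes the limit in practice.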

Schema Migration Without Downtime

When adding image support to an existing deployment, the database needed new message_type and attachment_id columns on the messages table. LanceDB’s embedded nature means migrations run at startup rather than requiring a separate migration tool:

def init_db():
    db = get_db()
    existing = db.table_names()

    if "messages" not in existing:
        db.create_table("messages", schema=MESSAGES_SCHEMA)
    else:
        # Migrate: add message_type and attachment_id columns if missing
        tbl = db.open_table("messages")
        schema = tbl.schema
        field_names = {f.name for f in schema}
        if "message_type" not in field_names or "attachment_id" not in field_names:
            arrow_table = tbl.to_arrow()
            if "message_type" not in field_names:
                col = pa.array(["text"] * len(arrow_table), type=pa.utf8())
                arrow_table = arrow_table.append_column("message_type", col)
            if "attachment_id" not in field_names:
                col = pa.array([""] * len(arrow_table), type=pa.utf8())
                arrow_table = arrow_table.append_column("attachment_id", col)
            db.drop_table("messages")
            db.create_table("messages", data=arrow_table)

    if "attachments" not in existing:
        db.create_table("attachments", schema=ATTACHMENTS_SCHEMA)

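The migration reads the existing table into an Arrow table, appends the new columns with sensible defaults, drops the old table, and recreates it from the new data. Existing messages get message_type = "text" and attachment_id = "", so older rows stay readable under the new schema. The whole table passes through memory and the drop-then-create step is not atomic, so this approach suits small tables and a single server process.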

Storing Encrypted Attachments Directly in LanceDB

Rather than introducing a separate file system or object store for image attachments, encrypted blobs are stored directly in a LanceDB table. When a channel image is sent, the server stores the symmetrically encrypted blob once in the attachments table and references it by ID from each member’s message record:

# Store attachment blob if present (image messages)
attachment_id = ""
if payload.attachment and msg_type == "image":
    attachment_id = str(uuid.uuid4())
    att_table = db.open_table("attachments")
    att_table.add([{
        "id": attachment_id,
        "sender": username,
        "channel_id": channel_id,
        "encrypted_data": payload.attachment.encrypted_data,
        "iv": payload.attachment.iv,
        "timestamp": ts,
    }])

Retrieval verifies channel membership before returning the encrypted blob:

@app.get("/api/attachments/{attachment_id}")
def get_attachment(attachment_id: str, token: str = Query(...)):
    username = require_auth(token)
    validate_id(attachment_id)
    db = get_db()

    att_table = db.open_table("attachments")
    rows = att_table.search().where(f"id = '{attachment_id}'", prefilter=True).to_list()
    if not rows:
        raise HTTPException(404, "Attachment not found")

    att = rows[0]

    # Verify the requester has access (is a member of the channel)
    if att["channel_id"]:
        members_table = db.open_table("channel_members")
        membership = members_table.search().where(
            f"channel_id = '{att['channel_id']}' AND username = '{username}'", prefilter=True
        ).to_list()
        if not membership:
            raise HTTPException(403, "Not a member of this channel")

    return {"id": att["id"], "encrypted_data": att["encrypted_data"], "iv": att["iv"]}

This keeps the architecture to a single storage system — no S3 buckets, no file system paths, no additional infrastructure.

Real-Time Delivery: WebSocket Relay

The server maintains a mapping of online users to their WebSocket connections and relays encrypted messages in real time:

# Active WebSocket connections: username -> list[WebSocket]
connections: dict[str, list[WebSocket]] = defaultdict(list)

Channel messages demonstrate the fan-out pattern — each envelope is encrypted for a specific member, stored in LanceDB targeting that recipient, and relayed via WebSocket if they’re online:

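async def handle_channel_message(data: dict, username: str, ws: WebSocket):
    payload = ChannelMessagePayload(**data)
    db = get_db()
    # Assumed field name, mirroring the "message_type" key the client sends
    msg_type = payload.message_type
    msg_group_id = str(uuid.uuid4())
    ts = time.time()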

    # Store attachment blob if present
    attachment_id = ""
    if payload.attachment and msg_type == "image":
        attachment_id = str(uuid.uuid4())
        att_table = db.open_table("attachments")
        att_table.add([{...}])

    messages = db.open_table("messages")

    for env in payload.envelopes:
        msg_id = str(uuid.uuid4())
        messages.add([{
            "id": msg_id,
            "sender": username,
            "recipient": env.target_user,
            "channel_id": payload.channel_id,
            "ciphertext": env.ciphertext,
            "iv": env.iv,
            "sender_public_key_jwk": env.sender_public_key_jwk,
            "timestamp": ts,
            "message_type": msg_type,
            "attachment_id": attachment_id,
        }])

        relay = { "type": "channel", "id": msg_id, "group_id": msg_group_id, ... }

        for peer_ws in connections.get(env.target_user, []):
            await peer_ws.send_json(relay)

The group_id ties all per-member copies of the same logical message together, enabling the client to deduplicate when it receives both a WebSocket relay and a REST poll response.
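The client-side deduplication could look roughly like the following. Seal’s client is JavaScript; the logic is shown in Python here to match the server examples. The MessageLog class is hypothetical, and it assumes that a REST response may lack group_id, so the message id is checked as well.

```python
class MessageLog:
    """Keeps one copy of each message regardless of which transport delivered it."""

    def __init__(self) -> None:
        self._seen_ids: set[str] = set()
        self._seen_groups: set[str] = set()
        self.messages: list[dict] = []

    def accept(self, msg: dict) -> bool:
        """Store msg and return True, or return False if it was already seen."""
        msg_id = msg["id"]
        group_id = msg.get("group_id")  # may be absent on some transports
        if msg_id in self._seen_ids or (group_id and group_id in self._seen_groups):
            return False
        self._seen_ids.add(msg_id)
        if group_id:
            self._seen_groups.add(group_id)
        self.messages.append(msg)
        return True
```

A message that arrives first over the WebSocket and again in a later history fetch is stored once, because the second call to accept returns False.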

Input Validation and Security Hardening

Since LanceDB uses string-based where clauses for filtering, all user-supplied identifiers are validated against strict regex patterns before being interpolated:

# No ^/$ anchors needed: fullmatch() must consume the whole string, so a
# trailing newline (which "$" alone would tolerate) is rejected.
SAFE_NAME_RE = re.compile(rf"[a-zA-Z0-9_\-]{{1,{USERNAME_MAX_LENGTH}}}")
SAFE_ID_RE = re.compile(rf"[a-zA-Z0-9_\-]{{1,{ID_MAX_LENGTH}}}")

def validate_username(name: str) -> str:
    if not SAFE_NAME_RE.fullmatch(name):
        raise HTTPException(
            400,
            f"Username must be 1-{USERNAME_MAX_LENGTH} alphanumeric characters, hyphens, or underscores",
        )
    return name

This prevents injection in filter expressions like f"sender = '{username}' AND channel_id = 'self'".
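A small standalone check shows the allowlist at work. This is a sketch rather than Seal’s actual module: is_safe_name and dm_filter are hypothetical helpers, the limit of 64 is taken from the sample configuration, and fullmatch is used so that a trailing newline is rejected too.

```python
import re

USERNAME_MAX_LENGTH = 64  # assumed, matching validation.username_max_length in the sample config

SAFE_NAME_RE = re.compile(rf"[a-zA-Z0-9_\-]{{1,{USERNAME_MAX_LENGTH}}}")


def is_safe_name(name: str) -> bool:
    """True only if the whole string consists of allowed characters."""
    return SAFE_NAME_RE.fullmatch(name) is not None


def dm_filter(username: str, peer: str) -> str:
    """Build a filter string, refusing any identifier that fails validation."""
    for value in (username, peer):
        if not is_safe_name(value):
            raise ValueError(f"unsafe identifier: {value!r}")
    return f"sender = '{username}' AND recipient = '{peer}' AND channel_id = 'self'"


print(dm_filter("alice", "bob"))
print(is_safe_name("alice' OR '1'='1"))  # quotes and spaces are outside the allowlist
```

Because quotes, spaces, and newlines are all outside the allowed character set, a validated identifier cannot close the string literal in the filter expression.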

Bot Simulation with PyNaCl Interoperability

A bot script simulates 100 users across 10 channels, generating real encrypted messages using PyNaCl (Python bindings for libsodium). This validates the full encryption pipeline — messages encrypted by Python’s nacl.public.Box are decryptable by the browser’s libsodium WASM:

def encrypt_for(plaintext: str, recipient_pub_bytes: bytes) -> dict:
    """Encrypt plaintext for a recipient using an ephemeral key pair.
    Matches the JS CRYPTO.encrypt() format."""
    ephemeral_private = nacl.public.PrivateKey.generate()
    ephemeral_public = ephemeral_private.public_key
    recipient_pub = nacl.public.PublicKey(recipient_pub_bytes)
    box = nacl.public.Box(ephemeral_private, recipient_pub)
    nonce = nacl.utils.random(nacl.public.Box.NONCE_SIZE)
    encrypted = box.encrypt(plaintext.encode(), nonce)

    return {
        "ciphertext": b64enc(encrypted.ciphertext),
        "iv": b64enc(nonce),
        "sender_public_key_jwk": b64enc(ephemeral_public.encode()),
    }

The bot system uses split RNGs — deterministic seeding for key generation (so bots can re-register idempotently) and random seeding for message scheduling (so each run produces fresh conversations):

key_rng = random.Random(SEED)   # Deterministic for reproducible keys
msg_rng = random.Random()       # Random for fresh messages each run
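One way to use the two generators is sketched below, with hypothetical helper names and an arbitrary seed. The 32-byte seeds could be handed to a key constructor such as PyNaCl’s nacl.public.PrivateKey, which accepts 32 raw bytes. Python’s random module is not cryptographically secure, so this is only reasonable for throwaway bot identities.

```python
import random

SEED = 1234  # hypothetical fixed seed; any constant works


def bot_key_seeds(count: int, seed: int = SEED) -> list[bytes]:
    """Return `count` 32-byte seeds that are identical on every run."""
    key_rng = random.Random(seed)
    return [bytes(key_rng.getrandbits(8) for _ in range(32)) for _ in range(count)]


def message_delays(count: int, max_delay: float = 5.0) -> list[float]:
    """Return `count` send delays in seconds that differ from run to run."""
    msg_rng = random.Random()  # seeded from OS entropy
    return [msg_rng.uniform(0.0, max_delay) for _ in range(count)]
```

Re-running the script reproduces the same key seeds, so a bot that registered earlier presents the same public key, while the delays (and therefore the conversation timing) change each time.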

Testing Strategy

The test suite uses 85 tests across 8 modules, each running against a fresh LanceDB instance:

@pytest.fixture(autouse=True)
def _clean_db():
    """Create a fresh database for each test."""
    db_path = os.environ["DATABASE_PATH"]
    if os.path.exists(db_path):
        shutil.rmtree(db_path, ignore_errors=True)
    init_db()
    _rate_limit_store.clear()
    yield
    if os.path.exists(db_path):
        shutil.rmtree(db_path, ignore_errors=True)

Test Coverage by Area

Authentication (12 tests): Registration, login, token validation, duplicate usernames, rate limiting.

Channels (20 tests): CRUD operations, join/invite flows, browse, member key retrieval, duplicate channel name enforcement.

Messages (15 tests): DM history, channel messages, REST sending, timestamp filtering, image message storage, attachment retrieval, non-member access denial.

WebSocket (6 tests): Connection lifecycle, DM relay, channel relay with multi-recipient fan-out.

Validation (10 tests): Username/ID pattern enforcement, SQL injection prevention, parameter bounds.

Crypto Compatibility (6 tests): PyNaCl/libsodium interoperability — encrypt in Python, verify format matches browser expectations.

Configuration (8 tests): Config loading, JWT module, database initialization.

Makefile Convenience

# Run all tests
make test

# Start dev server with hot reload
make server

# Run 100-bot simulation
make bots

# Reset database
make reset-db

Usage Examples

Starting the Application

# Install dependencies and start the server
make dev
make server

The server starts at http://localhost:8000 with LanceDB auto-initialized on first request.

Seal Login

Populating with Bot Data

# Register 100 bots, create 10 channels, send ~250 encrypted messages
make bots

All bot messages use real X25519 encryption and are decryptable by any client that opens the corresponding channel.

Custom Configuration

# config.yaml
app:
  title: "Seal"
  host: "0.0.0.0"
  port: 8000

database:
  path: "data/chat.lance"

auth:
  jwt_algorithm: "HS256"
  token_expire_minutes: 1440

validation:
  username_max_length: 64
  id_max_length: 128

attachments:
  max_image_size_mb: 5

Environment variables in .env override any YAML value:

JWT_SECRET=your-production-secret-here
DATABASE_PATH=/var/data/chat.lance
APP_PORT=443
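The override step could be implemented along these lines. This is a sketch under stated assumptions: the mapping from variable names to YAML keys is hypothetical (the auth.jwt_secret key in particular does not appear in the sample file), and the YAML is assumed to be already parsed into a nested dict.

```python
import os

# Hypothetical mapping: environment variable -> (section, key, type)
ENV_OVERRIDES = {
    "DATABASE_PATH": ("database", "path", str),
    "APP_PORT": ("app", "port", int),
    "JWT_SECRET": ("auth", "jwt_secret", str),
}


def apply_env_overrides(config: dict, env=None) -> dict:
    """Return a copy of config with any mapped environment variables applied."""
    env = os.environ if env is None else env
    merged = {section: dict(values) for section, values in config.items()}
    for var, (section, key, cast) in ENV_OVERRIDES.items():
        if var in env:
            merged.setdefault(section, {})[key] = cast(env[var])
    return merged


yaml_config = {"app": {"port": 8000}, "database": {"path": "data/chat.lance"}}
print(apply_env_overrides(yaml_config, {"APP_PORT": "443"}))
```

Passing the environment as a parameter keeps the function easy to test; in the running server it would default to os.environ.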

Key Export and Device Transfer

Users can export their encryption keys (password-protected with Argon2id) from one browser and import them on another device. The export file contains:

{
  "version": 2,
  "username": "alice",
  "publicKey": "base64-encoded-public-key",
  "encryptedPrivateKey": {
    "salt": "...",
    "nonce": "...",
    "data": "..."
  },
  "exportedAt": "2025-12-20T10:30:00.000Z",
  "warning": "Private key is password-encrypted (Argon2id). Keep this file safe."
}
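Before prompting for a password, an importer can sanity-check the file’s shape. The sketch below is an assumption about how such a check might look, not Seal’s import code. The expected sizes follow from libsodium’s constants: a 16-byte Argon2 salt, a 24-byte secretbox nonce, and a 32-byte private key plus a 16-byte authentication tag. Treating version 2 as the only accepted version is also an assumption.

```python
import base64
import json

SALT_BYTES = 16              # libsodium crypto_pwhash_SALTBYTES
NONCE_BYTES = 24             # libsodium crypto_secretbox_NONCEBYTES
PUBLIC_KEY_BYTES = 32        # X25519 public key
SEALED_KEY_BYTES = 32 + 16   # 32-byte private key plus 16-byte Poly1305 tag


def _decoded_len(value):
    """Length of value once base64-decoded, or None if it is not valid base64."""
    try:
        return len(base64.b64decode(value, validate=True))
    except (ValueError, TypeError):
        return None


def check_export(text: str) -> list[str]:
    """Return the problems found in an export file; an empty list means it looks well-formed."""
    doc = json.loads(text)
    problems = []
    if doc.get("version") != 2:
        problems.append("unexpected version")
    if not doc.get("username"):
        problems.append("missing username")
    if _decoded_len(doc.get("publicKey")) != PUBLIC_KEY_BYTES:
        problems.append("publicKey is not 32 bytes of base64")
    sealed = doc.get("encryptedPrivateKey")
    if not isinstance(sealed, dict):
        sealed = {}
    expected = {"salt": SALT_BYTES, "nonce": NONCE_BYTES, "data": SEALED_KEY_BYTES}
    for field, size in expected.items():
        if _decoded_len(sealed.get(field)) != size:
            problems.append(f"encryptedPrivateKey.{field} is not {size} bytes of base64")
    return problems
```

A structural check like this catches truncated or hand-edited files early; a wrong password is still only detected when decryption fails authentication.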

Production Considerations

LanceDB Storage Sizing

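Each message record is primarily UTF-8 strings (base64 ciphertext, nonces, keys), so its size is a fixed overhead of identifiers, nonce, and ephemeral public key plus the base64 length of the ciphertext. Those per-record figures are what drive the estimate for a deployment with 10,000 users and 1 million messages.

LanceDB’s columnar format compresses well since many columns (like channel_id) have low cardinality and highly repetitive values.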
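A rough raw-size estimate can be derived from the messages schema. The sketch below is arithmetic only, with hypothetical helper names; the average plaintext length, username length, and number of stored copies per message are assumptions, and the result ignores Arrow offsets, compression, and attachments.

```python
import math

UUID_LEN = 36          # str(uuid.uuid4())
NONCE_BYTES = 24       # crypto_box nonce
PUBLIC_KEY_BYTES = 32  # ephemeral X25519 public key
MAC_BYTES = 16         # Poly1305 tag
TIMESTAMP_BYTES = 8    # float64


def b64_len(n: int) -> int:
    """Length of padded base64 text for n raw bytes."""
    return 4 * math.ceil(n / 3)


def record_bytes(plaintext_bytes: int, username_len: int = 10,
                 channel_id_len: int = 36, attachment_id_len: int = 0) -> int:
    """Approximate raw size of one row in the messages table."""
    return (
        UUID_LEN                                  # id
        + 2 * username_len                        # sender, recipient
        + channel_id_len                          # channel_id
        + b64_len(plaintext_bytes + MAC_BYTES)    # ciphertext
        + b64_len(NONCE_BYTES)                    # iv
        + b64_len(PUBLIC_KEY_BYTES)               # sender_public_key_jwk
        + TIMESTAMP_BYTES                         # timestamp
        + len("text")                             # message_type
        + attachment_id_len                       # attachment_id
    )


def total_bytes(messages: int, copies: int, plaintext_bytes: int = 100) -> int:
    """Raw bytes for `messages` logical messages, each stored `copies` times."""
    return messages * copies * record_bytes(plaintext_bytes)


# Assumed workload: 1 million DMs of about 100 bytes, each stored twice (recipient copy + self copy).
print(total_bytes(1_000_000, copies=2))
```

Channel messages multiply the row count by the number of members, so the copies parameter matters more than the plaintext length for busy channels.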

WebSocket Connection Management

The connection registry uses defaultdict(list) to support multiple simultaneous connections per user (multiple tabs or devices):

connections: dict[str, list[WebSocket]] = defaultdict(list)

On disconnect, only the specific WebSocket is removed:

except WebSocketDisconnect:
    connections[username].remove(ws)
    if not connections[username]:
        del connections[username]
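The disconnect handler calls list.remove, which raises ValueError if the socket is no longer in the list. A slightly more defensive pair of helpers might look like this; register and unregister are hypothetical names, and plain objects stand in for WebSocket instances in the usage lines.

```python
from collections import defaultdict

# username -> list of open connections (several tabs or devices per user)
connections: dict[str, list] = defaultdict(list)


def register(username: str, ws) -> None:
    """Track a newly accepted connection for this user."""
    connections[username].append(ws)


def unregister(username: str, ws) -> None:
    """Forget one connection; safe to call twice for the same socket."""
    sockets = connections.get(username)  # .get() does not create an empty entry
    if not sockets:
        return
    if ws in sockets:
        sockets.remove(ws)
    if not sockets:
        del connections[username]


tab_one, tab_two = object(), object()
register("alice", tab_one)
register("alice", tab_two)
unregister("alice", tab_one)
print(len(connections["alice"]))
```

Deleting the key once the list is empty keeps the registry's keys equal to the set of users who are actually online.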

Rate Limiting

Authentication endpoints are rate-limited at 20 requests per 60-second window per client IP. The in-memory store is lightweight but resets on server restart — production deployments should consider Redis-backed rate limiting for persistence across restarts.
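An in-memory limiter with those parameters can be sketched as follows. The text only states the limit and window, so the sliding-window algorithm and the class itself are assumptions; the clock is injectable to make the behaviour testable.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allows at most `limit` requests per client within any `window`-second span."""

    def __init__(self, limit: int = 20, window: float = 60.0, clock=time.monotonic):
        self._limit = limit
        self._window = window
        self._clock = clock
        self._hits = defaultdict(deque)  # client IP -> timestamps of accepted requests

    def allow(self, client_ip: str) -> bool:
        """Record the request and return True, or return False if over the limit."""
        now = self._clock()
        hits = self._hits[client_ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self._window:
            hits.popleft()
        if len(hits) >= self._limit:
            return False
        hits.append(now)
        return True

    def clear(self) -> None:
        """Forget all history, for example between tests."""
        self._hits.clear()
```

All state lives in the process, which is why a restart resets the counters; swapping the dictionary for a shared store is the change a multi-process deployment would need.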

Channel Name Uniqueness

Channel names are enforced unique at the server level, preventing duplicates that could confuse users or bot scripts:

existing = channels.search().where(f"name = '{req.name}'", prefilter=True).to_list()
if existing:
    raise HTTPException(400, f"Channel name '{req.name}' is already taken")

Next Steps

SSL with Let’s Encrypt: Since Seal handles private messages, production deployments need TLS. Let’s Encrypt issues free certificates, and either Certbot or a reverse proxy like Caddy can automate provisioning and renewal with very little configuration, upgrading both the REST API and WebSocket connections (to WSS) without touching application code.

Secure Video Conferencing: Extending Seal with encrypted video calls is a natural next step. WebRTC provides peer-to-peer media transport with DTLS-SRTP encryption built in, and libsodium can handle the signaling-layer encryption to keep call metadata private from the server. The existing per-channel key infrastructure could bootstrap the session key exchange needed for group calls.

Containerizing with Docker/Podman: Packaging Seal as a container image would simplify deployment — a single Dockerfile bundling FastAPI, LanceDB, and the static frontend with a mounted volume for the Lance data directory. This pairs well with the zero-infrastructure philosophy: one container, one volume, no external dependencies.

Key Takeaways

Building an E2E encrypted chat application with LanceDB and libsodium requires:

  1. Ephemeral Key Design: Generate a fresh X25519 key pair per message so the sender keeps no long-lived secret for it; messages remain decryptable with the recipient’s long-term key, so this is sender-side protection rather than full forward secrecy
  2. Self-Encryption for Sent Messages: Encrypt a second copy for the sender’s own public key so they can read their own message history
  3. Hybrid Encryption for Channels: Encrypt large payloads (images) once with a symmetric key, then encrypt only the small key per-member to keep fan-out efficient
  4. Unified Storage with LanceDB: Store messages, metadata, and encrypted attachments in a single embedded database — no external infrastructure required
  5. Schema Migration at Startup: Use PyArrow’s table manipulation to add columns to existing LanceDB tables without downtime or external migration tools
  6. Input Validation for Embedded Queries: Validate all user input against strict regex patterns before interpolating into LanceDB filter expressions
  7. Cross-Language Crypto Interop: Ensure Python (PyNaCl) and JavaScript (libsodium WASM) produce compatible ciphertext by using the same primitives and encoding conventions
  8. Password-Protected Key Backup: Use Argon2id key derivation for key export files, enabling secure device transfer without trusting the server

The combination of LanceDB’s zero-infrastructure embedded storage, libsodium’s audited cryptographic primitives, and careful protocol design enables a fully functional encrypted messaging system where the server genuinely cannot read user messages — all without requiring a single external service beyond the FastAPI process itself.

Seal will be released soon at github.com/justinrmiller/seal — stay tuned!