Vector backends
Vector search runs inside the registry process. The vector backend is selected per deployment via PODIUM_VECTOR_BACKEND. Defaults are pgvector in standard mode and sqlite-vec in standalone mode. The default binary also ships adapters for Pinecone, Weaviate Cloud, and Qdrant Cloud. Custom backends register through the RegistrySearchProvider SPI; see Extending.
podium sync, the SDKs, and the MCP server never connect to the vector backend directly. They reach the registry over HTTP, and the registry handles the vector store. A filesystem-source registry has no vector search at all because there is no registry process running to query.
What you need
- A registry process running in either standalone (
podium serve --standalone) or standard mode. - An account on the managed service and an empty index or collection prepared per that service’s documentation.
- The API key and endpoint for the index or collection.
A filesystem-only setup (see Solo / filesystem) cannot use a managed vector backend. podium sync only materializes; it never queries a vector store. To get hybrid search against a managed backend without standing up the full standard stack, run podium serve --standalone against the same directory and configure the vector backend on that process.
Self-embedding and storage-only modes
Each managed backend supports two modes, selected by whether an inference-model variable is set:
- Self-embedding. The backend computes the embedding from the text projection the registry submits. Pinecone Integrated Inference, Weaviate Cloud vectorizers, and Qdrant Cloud Inference all support this. No external embedding provider is required.
- Storage-only. The backend stores vectors that the registry computes through a configured
EmbeddingProvider. The provider is selected viaPODIUM_EMBEDDING_PROVIDER(openai,voyage,cohere, orollama) and is required in this mode.
Setting PODIUM_EMBEDDING_PROVIDER to the empty string disables embedding generation entirely; search degrades to BM25 over manifest text.
Pinecone
Server-side environment variables:
| Variable | Description | Default |
|---|---|---|
PODIUM_VECTOR_BACKEND |
Set to pinecone. |
— |
PODIUM_PINECONE_API_KEY |
Pinecone API key. | required |
PODIUM_PINECONE_INDEX |
Index name. | required |
PODIUM_PINECONE_HOST |
Index host URL (Pinecone serverless). | auto-resolved from the index name |
PODIUM_PINECONE_NAMESPACE |
Namespace prefix used per tenant. | default |
PODIUM_PINECONE_INFERENCE_MODEL |
Hosted model name to enable Integrated Inference. | unset (storage-only mode) |
Standalone server with Pinecone
export PODIUM_VECTOR_BACKEND=pinecone
export PODIUM_PINECONE_API_KEY=pcn-...
export PODIUM_PINECONE_INDEX=podium-dev
export PODIUM_PINECONE_INFERENCE_MODEL=multilingual-e5-large # optional
podium serve --standalone --layer-path ~/podium-artifacts/
This setup runs as a single binary with embedded SQLite metadata and filesystem object storage. Pinecone holds the artifact embeddings. Postgres is not required, and no separate embedding service is needed when self-embedding is on.
For storage-only mode, omit PODIUM_PINECONE_INFERENCE_MODEL and configure an embedding provider:
export PODIUM_EMBEDDING_PROVIDER=openai
export OPENAI_API_KEY=sk-...
Standard deployment with Pinecone
In /etc/podium/registry.yaml:
registry:
endpoint: https://podium.acme.com
bind: 0.0.0.0:8080
store:
type: postgres
dsn: ${PODIUM_POSTGRES_DSN}
object_store:
type: s3
bucket: acme-podium
region: us-east-1
vector_backend:
type: pinecone
api_key: ${PODIUM_PINECONE_API_KEY}
index: acme-prod
namespace: ${PODIUM_TENANT_ID}
inference_model: multilingual-e5-large # enables self-embedding
# Omitted because the vector backend above self-embeds.
# embedding_provider:
# type: openai
# api_key: ${OPENAI_API_KEY}
# model: text-embedding-3-large
identity_provider:
type: oauth-device-code
audience: https://podium.acme.com
authorization_endpoint: https://acme.okta.com/oauth2/default
Environment variables and CLI flags override file values. Use ${ENV_VAR} interpolation for secrets.
Weaviate Cloud
Server-side environment variables:
| Variable | Description | Default |
|---|---|---|
PODIUM_VECTOR_BACKEND |
Set to weaviate-cloud. |
— |
PODIUM_WEAVIATE_URL |
Cluster REST URL. | required |
PODIUM_WEAVIATE_API_KEY |
API key. | required |
PODIUM_WEAVIATE_COLLECTION |
Collection name. | required |
PODIUM_WEAVIATE_GRPC_URL |
gRPC endpoint. Reserved and not currently read; the Weaviate backend uses the REST data plane. | derived from the REST URL |
PODIUM_WEAVIATE_VECTORIZER |
Vectorizer module (text2vec-openai, text2vec-weaviate, and similar) to enable self-embedding. |
unset (storage-only mode) |
Standalone server with Weaviate Cloud
export PODIUM_VECTOR_BACKEND=weaviate-cloud
export PODIUM_WEAVIATE_URL=https://acme.weaviate.network
export PODIUM_WEAVIATE_API_KEY=wv-...
export PODIUM_WEAVIATE_COLLECTION=PodiumArtifacts
export PODIUM_WEAVIATE_VECTORIZER=text2vec-weaviate # optional
podium serve --standalone --layer-path ~/podium-artifacts/
Standard deployment with Weaviate Cloud
vector_backend:
type: weaviate-cloud
url: ${PODIUM_WEAVIATE_URL}
api_key: ${PODIUM_WEAVIATE_API_KEY}
collection: PodiumArtifacts
vectorizer: text2vec-weaviate # enables self-embedding
For storage-only mode, omit vectorizer: and add an embedding_provider: block as in the Pinecone example.
Qdrant Cloud
Server-side environment variables:
| Variable | Description | Default |
|---|---|---|
PODIUM_VECTOR_BACKEND |
Set to qdrant-cloud. |
— |
PODIUM_QDRANT_URL |
Cluster REST URL. | required |
PODIUM_QDRANT_API_KEY |
API key. | required |
PODIUM_QDRANT_COLLECTION |
Collection name. | required |
PODIUM_QDRANT_GRPC_PORT |
gRPC port. Reserved and not currently read; the Qdrant backend uses the REST data plane. | 6334 |
PODIUM_QDRANT_INFERENCE_MODEL |
Hosted Cloud Inference model name to enable self-embedding. | unset (storage-only mode) |
Standalone server with Qdrant Cloud
export PODIUM_VECTOR_BACKEND=qdrant-cloud
export PODIUM_QDRANT_URL=https://acme.eu-central.aws.cloud.qdrant.io:6333
export PODIUM_QDRANT_API_KEY=qdr-...
export PODIUM_QDRANT_COLLECTION=podium_artifacts
export PODIUM_QDRANT_INFERENCE_MODEL=bge-small-en # optional
podium serve --standalone --layer-path ~/podium-artifacts/
Standard deployment with Qdrant Cloud
vector_backend:
type: qdrant-cloud
url: ${PODIUM_QDRANT_URL}
api_key: ${PODIUM_QDRANT_API_KEY}
collection: podium_artifacts
inference_model: bge-small-en # enables self-embedding
Switching backends on a running deployment
The vector store records (model_id, dimensions) per artifact. When the configured backend or embedding model changes, run podium admin reembed to repopulate:
podium admin reembed --all
# or scope to a window
podium admin reembed --since 2026-01-01
During re-embedding, the store may transiently hold mixed dimensions. Query time restricts results to vectors matching the currently-configured model. The registry emits embedding.reembed_in_progress events for progress monitoring. Stale-dimension rows are purged when re-embedding completes.
For non-collocated backends (every managed service falls in this category), ingest writes are coordinated through a transactional outbox: the manifest commit and a vector_pending row land in the same metadata-store transaction, and a background worker drives the write to the vector backend. This keeps the metadata commit and the vector write consistent under failure.
Local-overlay search on the MCP server
The MCP server can use a managed vector backend independently of the registry, to give the workspace-local overlay (.podium/overlay/) better semantic recall. The same PODIUM_VECTOR_BACKEND and per-backend variables apply when the MCP server is configured with LocalSearchProvider against an external backend.
This is independent of the registry-side backend. It only affects how local-overlay manifests are indexed; registry-side search_artifacts results are merged in via reciprocal rank fusion regardless. The MCP server still requires a server-source registry; this side path does not enable search against a filesystem-only catalog.
Operational notes
- The managed service’s costs, identity, and quotas are the operator’s responsibility. Podium does not proxy credentials and does not enforce per-tenant cost ceilings on the backend.
- If the configured vector backend is unreachable and no embedding provider is configured for storage-only fallback, search degrades to BM25 over manifest text. The response includes a structured indicator so callers can detect the degraded path.
- Migration in either direction is supported.
podium admin reembedrepopulates the newly-configured backend from the canonical text projections. The previous backend can stay in place during cut-over or be torn down after re-embedding completes. - Search QPS, latency, and recall depend on the backend’s index configuration (shard count, replicas, dimensionality). The Operator guide covers capacity planning across the registry as a whole.