Self-Host Meeting Transcription
Run the whole meeting bot + transcription pipeline inside your VPC. Apache 2.0, Docker + Kubernetes, full data residency, no audio leaves your infrastructure. Works with Google Meet, Microsoft Teams, and Zoom from a single API.
Feed meetings into your AI. Live.
AI needs data, and the best data is real-time. Vexa streams transcripts, audio, and context from any meeting straight to your models and agents — as it happens.
Built for self-hosting
Vexa is the only Apache-2.0 meeting bot + transcription API that fully self-hosts. The bot pod, audio capture, Whisper transcription service, API gateway, and transcript store all run as containers in your environment — Docker Compose for dev, Helm chart for production Kubernetes. No SaaS dependency, no telemetry phone-home, no audio path that crosses your firewall. Regulated industries (legal, healthcare, financial services, government) run Vexa fully air-gapped on-premise; everyone else runs it in their own AWS / GCP / Azure / on-prem cluster with the same Helm chart.
- Apache 2.0 — no license-key checks, no callbacks home, no per-seat metering
- Docker Compose for dev (5-min quickstart) + Helm chart for production Kubernetes
- Audio never leaves your VPC — Whisper runs on your GPU nodes, transcripts persist in your Postgres
# Clone the repo
git clone https://github.com/Vexa-ai/vexa.git
cd vexa
# Configure (one env file)
cp .env.example .env
# Edit .env: WHISPER_MODEL_SIZE, JWT_SECRET, ADMIN_API_TOKEN, ...
# Bring up the whole stack
docker compose up -d
# Mint your first API key (admin endpoint)
curl -X POST http://localhost:8056/admin/users \
-H "X-Admin-Token: $ADMIN_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{"email": "you@yourco.com"}'
# Drop a bot into any meeting
curl -X POST http://localhost:8056/bots \
-H "X-API-Key: $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"platform": "google_meet",
"native_meeting_id": "abc-defg-hij",
"transcribe_enabled": true,
"transcription_tier": "realtime"
}'
Set $API_BASE to https://api.cloud.vexa.ai (hosted) or http://localhost:8056 (self-host). Get $API_KEY from /get-started.
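For scripted use, the bot request above can be wrapped in two small helpers. This is a minimal sketch and not part of the Vexa repo: the function names bot_payload and create_bot and the API_BASE default are illustrative assumptions. Only the /bots path, the X-API-Key header, and the JSON field names are taken from the commands above.

```shell
#!/bin/sh
# Hypothetical helpers. Only the /bots path, the X-API-Key header, and the
# JSON field names come from the quickstart commands above.

API_BASE="${API_BASE:-http://localhost:8056}"

# Print the JSON body for a bot request.
# $1 = platform (e.g. google_meet), $2 = native meeting ID
# Inputs are inserted as-is; they are not JSON-escaped.
bot_payload() {
  printf '{"platform": "%s", "native_meeting_id": "%s", "transcribe_enabled": true, "transcription_tier": "realtime"}\n' "$1" "$2"
}

# Send the request. Requires API_KEY to be set in the environment.
create_bot() {
  curl -sS -X POST "$API_BASE/bots" \
    -H "X-API-Key: $API_KEY" \
    -H "Content-Type: application/json" \
    -d "$(bot_payload "$1" "$2")"
}
```

With API_KEY exported, create_bot google_meet abc-defg-hij sends the same request as the curl command above. Keeping the payload in its own function makes it easy to inspect before anything goes over the network.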
Self-host FAQ
What operators ask before they deploy Vexa in their own infrastructure.
Is the self-host path really feature-complete vs hosted Vexa?
Yes — there's no functionality split between hosted and self-hosted. Same bot, same Whisper service, same REST API, same MCP server. Hosted Vexa is just a managed deployment of the same Apache-2.0 code. Customers regularly start on hosted, then migrate to self-host once they cross volume / compliance thresholds.
What infrastructure do I actually need?
For dev/single-tenant: one VM with Docker (≥4 CPU, 8GB RAM, optional GPU for faster Whisper). For production: a Kubernetes cluster with at least one GPU node pool (L4, A10, A100, or RTX 4090 work well; the Helm chart documents tested configurations). You also need Postgres for state, Redis for queues, and optionally a MinIO / S3-compatible store for recordings.
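The dev minimums quoted above can be turned into a quick preflight check. The sketch below is illustrative only: the function name meets_dev_minimum is an assumption, and the thresholds are simply the ≥4 CPU and 8GB figures from the answer above.

```shell
#!/bin/sh
# Check a host against the dev minimums quoted above (>= 4 CPUs, >= 8 GB RAM).
# $1 = CPU count, $2 = RAM in whole GB. Prints "ok" or the reason it fails.
meets_dev_minimum() {
  if [ "$1" -lt 4 ]; then
    echo "too few CPUs: $1 < 4"
  elif [ "$2" -lt 8 ]; then
    echo "too little RAM: ${2} GB < 8 GB"
  else
    echo "ok"
  fi
}
```

On a typical Linux host you might call it as meets_dev_minimum "$(nproc)" "$(free -g | awk '/^Mem:/ {print $2}')". Note that free -g rounds down, so a nominal 8GB VM can report 7; treat the result as a rough guide rather than a hard gate.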
Can I run Vexa fully air-gapped, no internet access at all?
Yes, apart from the connection to the meeting platform itself. Serve all container images from your private registry, mirror the Whisper model weights to internal storage, and deny all other outbound traffic. The one exception is the bot pod, which needs network access to the meeting platform (meet.google.com, teams.microsoft.com, zoom.us) in order to join calls. No Vexa-controlled telemetry endpoint exists, so nothing else needs to leave your network.
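When auditing an egress policy for this setup, it can help to check hostnames against the allowed list. The following is a sketch under stated assumptions: the list holds only the three domains named above (real deployments may need additional media or CDN hosts for each platform), and the names ALLOWED_EGRESS and egress_allowed are hypothetical. Actual enforcement belongs in your firewall or network policy, not in a script.

```shell
#!/bin/sh
# Meeting-platform domains named in the answer above. Anything else is denied.
ALLOWED_EGRESS="meet.google.com teams.microsoft.com zoom.us"

# Succeeds (exit 0) if $1 is an allowed domain or a subdomain of one.
egress_allowed() {
  for domain in $ALLOWED_EGRESS; do
    case "$1" in
      "$domain" | *".$domain") return 0 ;;
    esac
  done
  return 1
}
```

For example, egress_allowed us02web.zoom.us succeeds because it is a subdomain of zoom.us, while egress_allowed example.com fails. The leading dot in the subdomain pattern keeps lookalikes such as notzoom.us from matching.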
How does the Helm chart handle scale?
Bot pods are stateless and horizontally scaled per concurrent meeting (one pod per active bot). The Whisper service auto-scales on GPU node pools — typical sizing is 1 GPU per 5-10 concurrent realtime transcriptions, depending on model tier. Postgres + Redis are sized per tenant. See the scaling doc for production sizing tables.
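The "1 GPU per 5-10 concurrent realtime transcriptions" rule of thumb translates into a simple ceiling division. This is a rough sizing sketch, not the production sizing tables from the scaling doc; the function names are illustrative.

```shell
#!/bin/sh
# GPUs needed for N concurrent realtime transcriptions, rounding up.
# $1 = concurrent transcriptions, $2 = transcriptions one GPU can serve
gpus_needed() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# Lower and upper bounds from the "1 GPU per 5-10" rule of thumb above.
# $1 = concurrent transcriptions; prints "min max"
gpu_range() {
  echo "$(gpus_needed "$1" 10) $(gpus_needed "$1" 5)"
}
```

For 23 concurrent realtime transcriptions, gpu_range 23 prints "3 5": three GPUs if each serves 10 streams, five if each serves only 5. Where you land in that range depends on the model tier.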
What's the difference vs Vexa Lite?
Vexa Lite is the same code packaged as a single all-in-one Docker container for the simplest deployments (1 user, low volume, single host). Vexa "full" is the Compose / Helm multi-service deployment for production. Both are Apache 2.0; pick based on volume + ops surface.
Common patterns
Who runs their own Vexa
Three buyer shapes we see repeatedly across self-hosted Vexa deployments.
Regulated industries (legal, healthcare, financial)
Scenario
You're subject to HIPAA / GDPR / FINRA. Sending meeting audio to a third-party SaaS is a non-starter. You need transcription with a full audit trail under your control.
What you get
Self-host Vexa in your VPC behind your firewall. Audio captured in bot pods, transcribed on your GPU nodes, persisted in your Postgres. Every transcript event is auditable. Pair with your existing SIEM for the audit trail.
EU data residency (Schrems II, GDPR)
Scenario
Your buyers require EU-only data processing. Hosted SaaS — even with EU regions — fails their procurement review due to sub-processor sprawl.
What you get
Self-host on your EU cloud (AWS Frankfurt, GCP Belgium, Hetzner, OVH). Zero sub-processors. Sign a DPA with yourself. Pass procurement on the first attempt.
High-volume / cost-optimized at scale
Scenario
You run thousands of transcription hours per month, and hosted per-bot-hour pricing has crossed the break-even point with running your own GPU pool.
What you get
Self-host on a reserved GPU cluster (your own A10s, A100s, or 4090s). Marginal cost drops from $0.30-0.45/hr to compute COGS (~$0.05-0.15/hr at scale). Hosted Vexa remains an option for spillover.
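To see what those per-hour ranges mean for a monthly bill, multiply by your volume. The sketch below uses only the rates quoted above; the 5000 hours/month in the usage note is a hypothetical volume, and the calculation ignores fixed costs such as idle GPU capacity and operations time.

```shell
#!/bin/sh
# Monthly cost in dollars for a number of bot-hours at a per-hour rate.
# $1 = hours per month, $2 = dollars per hour
monthly_cost() {
  awk -v h="$1" -v r="$2" 'BEGIN { printf "%.2f\n", h * r }'
}

# Monthly difference between a hosted rate and a self-hosted compute rate.
# $1 = hours per month, $2 = hosted $/hr, $3 = self-hosted $/hr
monthly_savings() {
  awk -v h="$1" -v a="$2" -v b="$3" 'BEGIN { printf "%.2f\n", h * (a - b) }'
}
```

At 5000 hours per month, monthly_savings 5000 0.30 0.15 gives 750.00 for the narrowest gap between the two ranges, and monthly_savings 5000 0.45 0.05 gives 2000.00 for the widest. Whether that covers the cost of running the cluster is the actual break-even question.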
Quickstart
Self-host Vexa in 5 minutes
docker compose up brings the full stack online: bot dispatcher, runtime API, Whisper, gateway, Postgres, Redis.
1. Clone the repo
   Run git clone https://github.com/Vexa-ai/vexa.git to get the source. Apache 2.0: fork, modify, redistribute, no asks.
2. Configure one env file
   Copy .env.example to .env. Set WHISPER_MODEL_SIZE (base/small/medium/large/turbo), JWT_SECRET, ADMIN_API_TOKEN, and optionally the Postgres / Redis URLs.
3. docker compose up
   The whole stack (bot dispatcher, runtime API, Whisper service, gateway, Postgres, Redis) comes up in 2-3 minutes.
4. Mint your first API key
   Hit the admin endpoint with your ADMIN_API_TOKEN to create a user + key. Multi-tenant patterns are documented in self-hosted-management.
5. Production: Helm chart
   For production Kubernetes, helm install vexa from the chart in deploy/helm, which covers GPU node pools, ingress, and the Postgres operator.
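Step 2 asks for two secrets, JWT_SECRET and ADMIN_API_TOKEN. One way to generate them is sketched below. The helper names gen_secret and env_secrets are hypothetical, and the 32-byte hex format is an assumption rather than a documented Vexa requirement; check .env.example for any format it specifies.

```shell
#!/bin/sh
# Print a random 64-character hex string (32 bytes read from /dev/urandom).
gen_secret() {
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
  echo
}

# Print KEY=value lines for the two secrets named in step 2.
env_secrets() {
  echo "JWT_SECRET=$(gen_secret)"
  echo "ADMIN_API_TOKEN=$(gen_secret)"
}
```

Run env_secrets and paste the two lines into .env in place of the placeholder values. Each call produces fresh values, so generate them once and keep them with your other deployment secrets.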
How it works under the hood
Self-Host architecture
The self-host stack: bot dispatcher service (allocates bot pods), runtime API (control plane), Whisper transcription service (runs on GPU nodes, configurable model size), gateway (REST + WebSocket + MCP entry point), Postgres (state), Redis (queues + cache). Everything is one Helm chart. No telemetry, no callbacks, no SaaS dependency — Apache 2.0 means you can audit every line, fork if needed, run it forever.
Pipeline
Documentation
Go deeper on self-hosting
Deployment, scaling, security, admin API, and Whisper sizing — everything you need to run Vexa in production.
Self-host deployment guide
Docker Compose for dev, Helm for production, sizing tables, GPU pool patterns.
Vexa Lite (single-container)
All-in-one Docker container — fastest path to a running instance, low-volume use cases.
Scaling & architecture
Concurrent-bot limits, GPU sizing, Kubernetes node-pool layout.
Security & data handling
Threat model, encryption-at-rest, audio retention controls, audit-trail patterns.
Admin API
Self-hosted user / key management, tenant scoping, quota enforcement.
Transcription quality
Whisper model tiers, language support, hallucination filtering, when to pick which.
Full reference at docs.vexa.ai.
Related reading
Keep going on Self-Host
Tutorials, deep-dives, and adjacent surfaces from across the Vexa substrate.
Self-host meeting transcription in 5 minutes
Docker Compose quickstart — the canonical first-deploy guide.
Privacy-first meeting transcription — why self-hosted matters
GDPR, HIPAA, FINRA — the compliance case for keeping audio in your VPC.
Open-source transcription API — complete guide
Architecture overview, services breakdown, what runs where.
Pay only for what you use
Self-host for free, or let us run it for you. Simple usage-based pricing—no per-seat tax.
- Self-host (free): Apache 2.0 · Self-host forever. Best for teams who need full control.
- Individual ($12/mo): 1 bot · Flat monthly · Everything included. Best for personal use, 1 meeting at a time. No credit card required.
- Pay-as-you-go ($0.30/hr): Bot infrastructure · +$0.20/hr transcription. Best for teams & API builders · $5 free credit. No credit card required.
- On-premises, OpenShift, Kubernetes. Dedicated support + SLA.
- For self-hosted Vexa bot users. Transcription only, $0.002/min.
Frequently asked questions
What's included in the $0.30/hr Bot Service?
The base rate covers bot infrastructure: audio capture, webhooks, and 12 months of audio storage across Google Meet, Microsoft Teams, and Zoom. Add real-time transcription for +$0.20/hr ($0.50/hr total).
Which plan is right for me?
Choose Individual ($12/mo) if you need one bot for personal use—it includes real-time transcription, storage, and the Dashboard. Choose Pay-as-you-go ($0.30/hr) if you need multiple simultaneous bots or want to pay only for what you use—ideal for teams and API integrations. Both plans include Google Meet and Microsoft Teams support. Self-host for free from GitHub if you need full data control.
How does the free credit work?
Every new account gets $5 in free bot credit—no credit card required. That covers ~16 hours of bot time at $0.30/hr. All features are available: audio capture, transcription, real-time data, and full API access.
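The hours figure follows directly from dividing the credit by the hourly rate. A small sketch of that arithmetic, using only the prices stated on this page (the function name is illustrative):

```shell
#!/bin/sh
# Hours of bot time a credit buys at a given hourly rate, to one decimal.
# $1 = credit in dollars, $2 = dollars per hour
credit_hours() {
  awk -v c="$1" -v r="$2" 'BEGIN { printf "%.1f\n", c / r }'
}
```

credit_hours 5 0.30 prints 16.7, which the answer above rounds down to ~16 hours. With real-time transcription enabled at $0.50/hr total, credit_hours 5 0.50 prints 10.0.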
Can I self-host Vexa?
Yes. Vexa is Apache 2.0 licensed and fully self-hostable. Deploy with Docker Compose, Kubernetes, or OpenShift on your own infrastructure with complete data sovereignty.
How does Vexa compare to Recall.ai?
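Vexa is open source, self-hostable, and up to 40% cheaper on the bot rate: $0.30/hr vs Recall.ai's ~$0.50/hr. The transcription add-on is $0.20/hr vs ~$0.15/hr, so with transcription enabled the totals are $0.50/hr vs ~$0.65/hr (about 23% cheaper). See our detailed comparison.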
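The percentage can be checked from the quoted rates. This is a sketch of the arithmetic only; the Recall.ai prices are the approximate figures quoted in the answer above and have not been independently verified here.

```shell
#!/bin/sh
# Percentage by which price $1 undercuts price $2, to one decimal place.
pct_cheaper() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (b - a) / b * 100 }'
}
```

pct_cheaper 0.30 0.50 prints 40.0 for the bot rate alone. Including the transcription add-on on both sides ($0.50/hr vs ~$0.65/hr), pct_cheaper 0.50 0.65 prints 23.1.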
What's the Individual plan for?
The Individual plan ($12/mo) is for single users who need one concurrent bot with unlimited meetings. It includes real-time transcription, recording, storage, REST API, WebSockets, and the UI Dashboard—everything you need for personal meeting intelligence. It supports Google Meet and Microsoft Teams, with Zoom coming soon.
Ready when you are
Run your own Vexa.
Up in 5 minutes.
Apache 2.0: no license-key checks, no callbacks home, no per-seat metering. Clone the repo, run docker compose up, mint a key, drop your first bot.
Apache 2.0 · Self-host or cloud · No credit card required