Private AI Installations

I configure your private AI stack. At home, on company premises, or in your cloud.

Local inference, RAG, agents, vector databases, observability. Software selection, hardware sizing, installation, hardening, maintenance. The model runs where you decide, on data that stays yours.

Where the model runs is your decision, not the provider's.

[Figure: Private AI four-layer stack within a private perimeter, with four deployment options]

Stack

Software, curated and integrated for your context.

Not a single package. A targeted combination of mature, mostly open-source components, chosen according to your data, constraints, and expected load.

Local inference

Runtimes optimized for running LLMs on your own hardware.

  • Ollama
  • vLLM
  • llama.cpp
  • LocalAI
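
To give an idea of what a local runtime exposes, here is a minimal sketch of a call to Ollama's HTTP API using only the Python standard library. It assumes Ollama is listening on its default local port (11434) and that a model has already been pulled; the model name `llama3` and the helper names `build_request` and `generate` are placeholders for illustration, not part of any specific installation.

```python
import json
import urllib.request

# Assumption: Ollama is running on this machine on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "llama3") -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "llama3", timeout: float = 120.0) -> str:
    """Send the prompt to the local server and return the generated text."""
    request = build_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

With the server up, `generate("Summarize this policy in one sentence.")` returns the model's text. The request never leaves the machine, which is the point of running inference locally.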

Conversational interfaces

UIs for interacting with local models, for end users and teams.

  • Open WebUI
  • AnythingLLM
  • LM Studio
  • Text Generation WebUI

RAG and knowledge

Retrieval-Augmented Generation to query documentation, knowledge bases, archives.

  • PrivateGPT
  • AnythingLLM
  • Khoj
  • Continue.dev
  • Danswer / Onyx

Agents and automation

AI agents that operate on controlled environments, flows, and data.

  • Dify
  • Flowise
  • Langflow
  • n8n

Vector database

Semantic indices for RAG, search, similarity matching.

  • Qdrant
  • Chroma
  • Weaviate
  • Milvus
  • pgvector
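
What a semantic index does can be sketched in a few lines: rank stored vectors by cosine similarity to a query vector. The example below is a brute-force illustration with toy three-dimensional vectors and made-up document ids. Real embeddings come from an embedding model, and production vector databases typically rely on approximate nearest-neighbor indices rather than scanning every vector.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], index: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Return the k document ids most similar to the query, best first."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy "embeddings" with hypothetical document ids, for illustration only.
index = {
    "invoice-policy": [0.9, 0.1, 0.0],
    "vacation-policy": [0.1, 0.9, 0.0],
    "server-runbook": [0.0, 0.1, 0.9],
}
results = top_k([0.8, 0.2, 0.0], index, k=2)
```

Here `results` lists `invoice-policy` first and `vacation-policy` second, because their vectors point in the direction closest to the query.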

Observability

Traceability of prompts, responses, latency, cost, and quality drift.

  • Langfuse
  • Phoenix (Arize)
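
The kind of record an observability layer keeps can be sketched as a wrapper that stores prompt, response, and latency for every call. This is a hypothetical in-memory tracer written for illustration; it is not the API of Langfuse or Phoenix, and `fake_model` stands in for a real model call.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Trace:
    prompt: str
    response: str
    latency_ms: float


@dataclass
class Tracer:
    """Hypothetical in-memory tracer; a real setup would ship records to a backend."""
    records: list[Trace] = field(default_factory=list)

    def wrap(self, model_fn: Callable[[str], str]) -> Callable[[str], str]:
        """Return a version of model_fn that records every call."""
        def traced(prompt: str) -> str:
            start = time.perf_counter()
            response = model_fn(prompt)
            elapsed_ms = (time.perf_counter() - start) * 1000
            self.records.append(Trace(prompt, response, elapsed_ms))
            return response
        return traced


def fake_model(prompt: str) -> str:
    """Stand-in for a real model call."""
    return prompt.upper()


tracer = Tracer()
ask = tracer.wrap(fake_model)
answer = ask("hello")
```

Once every call leaves a record like this, latency trends and quality drift can be measured over time instead of guessed at.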

Infrastructure

Containerization, orchestration, private networking, and lifecycle management.

  • Docker / Docker Compose
  • K3s / Kubernetes
  • Tailscale / Headscale
  • Portainer
  • Coolify

Deployment

At home, in the office, on company servers, or in your private cloud.

The "where" is not secondary. It's a data governance choice that precedes every other architectural decision.

On-premise

Workstations, office servers, corporate datacenters. Your hardware, full data control, no data ever leaves the perimeter.

European private cloud

Hetzner, OVH, Scaleway, and EU-sovereign providers. Data in Europe, clear contracts, predictable cost, and a hosting base that supports GDPR and NIS2 compliance.

Hybrid

Heavy compute on-premise, ancillary services in the cloud. The best of both worlds: controlled capex, opportunistic scaling.

Edge

Intel NUC, mini-PCs, ARM servers. Inference at the edge: on individual devices, in branch offices, in constrained or offline contexts.

Three starting configurations

Where to begin: three packages, each with a specific use case.

These are not rigid offerings: they are coherent starting points, calibrated to the three most common scenarios. From there, the rest is sized against real data, workload, and constraints.

01 · Starter

Private AI Starter

For small teams that want an internal ChatGPT, without sending conversations to external providers.

  • Open WebUI + Ollama on a single server or workstation
  • User and role management, authenticated access
  • Local model sized to expected workload
  • Backup of configurations and user data
  • Minimum hardening: firewall, TLS access, separated secrets
  • Operational documentation, knowledge transfer

02 · Department

Private RAG Department

For departments that want to query their own documents with answers traceable back to sources.

  • Starter stack + AnythingLLM or Open WebUI with RAG
  • Vector database (Qdrant or equivalent)
  • Document ingestion, chunking calibrated to the domain
  • Source citations: no answers without a reference
  • Workspace-level permissions, data separation
  • Observability: query tracing, latency, output quality (Langfuse)
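
Two items in the list above, chunking and mandatory citations, can be sketched together. The example is the simplest baseline, assuming fixed-size character windows with overlap; chunking calibrated to a domain would more likely split on headings, clauses, or paragraphs. The names `Chunk`, `chunk_document`, and `answer_with_citations` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str  # document the text came from, kept for citations
    start: int   # character offset of the chunk within the document
    text: str


def chunk_document(source: str, text: str, size: int = 200, overlap: int = 50) -> list[Chunk]:
    """Split text into fixed-size character windows that overlap by `overlap`."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    if not text:
        return []
    step = size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    starts = range(0, max(len(text) - overlap, 1), step)
    return [Chunk(source, i, text[i:i + size]) for i in starts]


def answer_with_citations(answer: str, retrieved: list[Chunk]) -> str:
    """Refuse to answer when retrieval returned nothing to cite."""
    if not retrieved:
        return "No supporting source found."
    sources = sorted({chunk.source for chunk in retrieved})
    return f"{answer} [sources: {', '.join(sources)}]"
```

Keeping the source and offset on every chunk is what makes an answer traceable: the reference travels with the text from ingestion to the final response.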

03 · Production

Private AI Production Stack

For systems that enter real operational processes: SLAs, recovery, structured maintenance.

  • Everything in Department, redesigned for production
  • Container orchestration (Docker Compose or Kubernetes)
  • Periodic backups, tested disaster recovery
  • Dedicated networking, network isolation, audit logs
  • Coordinated maintenance: runtime, models, CVE patches
  • Quarterly quality and throughput reporting
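
"Tested disaster recovery" means a backup only counts once a restore has been verified. A minimal sketch of that idea, assuming a single file and SHA-256 digests, follows. A real installation would back up database snapshots or volumes rather than one file, and the function names here are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 digest of a file, reading it in 1 MiB blocks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()


def backup_and_verify(source: Path, backup_dir: Path) -> Path:
    """Copy `source` into `backup_dir` and confirm the copy is byte-identical."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)
    if sha256_of(source) != sha256_of(target):
        raise RuntimeError(f"backup of {source} failed verification")
    return target


def restore_drill(backup: Path, expected_sha256: str) -> bool:
    """Restore into a scratch directory and check the restored file's digest."""
    with tempfile.TemporaryDirectory() as scratch:
        restored = Path(scratch) / backup.name
        shutil.copy2(backup, restored)
        return sha256_of(restored) == expected_sha256
```

Running the restore drill on a schedule, not only the backup, is what turns "we have backups" into "we can recover".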

Exact sizing (hardware, model, perimeter) is set after the initial scoping, not before.

Output

What arrives on your premises is a working system, not a kit to assemble.

What's included

  • Hardware audit: GPU compatibility, thermal envelope, estimated throughput
  • Model sizing against use case and budget
  • Complete installation of the selected stack
  • Security hardening and network isolation
  • Backup, restore, and disaster recovery strategy
  • Monitoring and observability configured
  • Operational documentation
  • Knowledge transfer to the internal team
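
A first pass at model sizing is plain arithmetic: memory for the weights plus memory for the KV cache. The sketch below uses the common back-of-envelope formulas. The architecture numbers in the example (32 layers, 8 KV heads, head dimension 128) are illustrative assumptions, and runtime overhead, activations, and fragmentation are not counted.

```python
def weights_gb(params_billion: float, bits_per_weight: int) -> float:
    """Memory for model weights in GB (10^9 bytes)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9


def kv_cache_gb(
    layers: int,
    kv_heads: int,
    head_dim: int,
    context_tokens: int,
    bytes_per_value: int = 2,  # 2 bytes assumes a 16-bit cache
    sequences: int = 1,        # concurrent sequences held in memory
) -> float:
    """Memory for the key/value cache in GB (10^9 bytes)."""
    # Factor 2: one key tensor and one value tensor per layer.
    total = 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_value * sequences
    return total / 1e9


# Example: an 8B-parameter model quantized to 4 bits, 8192-token context.
weights = weights_gb(8, 4)
cache = kv_cache_gb(layers=32, kv_heads=8, head_dim=128, context_tokens=8192)
total = weights + cache
```

In this example the weights take 4.0 GB and the cache about 1.07 GB per sequence, so roughly 5.1 GB before overhead. Doubling the context or the number of concurrent users grows the cache, not the weights, which is why sizing depends on real load and not only on the model.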

Optional maintenance

  • Coordinated runtime and model updates
  • Periodic thermal and throughput health checks
  • Security patches and CVE management
  • Tuning for new use cases
  • Quarterly quality reporting

Why not install it yourself

Installing Ollama is the easy part. The rest is engineering.

You open the browser, download the binary, and it runs. At that point you think you're done. In reality, you're just getting started.

What isn't visible at first

  • GPU thermal and mechanical behavior under sustained load
  • Conflicts between CUDA driver, runtime version, and kernel
  • Model selection against context window and real load
  • Semantic chunking and retrieval strategy for RAG
  • Network hardening, secret management, audit logs
  • Backup of vector indices and training data
  • Updates and silent regressions
  • Observability of output quality, not just system metrics

What experience brings

  • Preventive hardware validation, before spending
  • Stack chosen against real constraints, not hype
  • Documented and reproducible configuration
  • Security designed in, not bolted on
  • Operability verified under load, not in a demo
  • Predictable maintenance, not emergencies

The model is one variable. The environment that hosts it is the rest.

Read more

Want to understand each tool before deciding?

Every component of the stack has a dedicated page: how it works, what it does for the business, when it fits, how much it costs to install. Written for the decision-maker, not the technician.

Explore the tools catalog →

Want a working Private AI system, not an experiment?

The initial assessment clarifies use case, data, constraints, available or required hardware, and delivery path.