Managing Services

Services are AI inference engines that run as Docker containers on your node. Citadel manages their lifecycle through Docker Compose, with compose files embedded directly in the binary -- no external files to download or maintain.

Supported Services

Service     Description                           GPU Required
vLLM        High-throughput LLM inference server  Yes (NVIDIA)
Ollama      Easy-to-use model runner              No (GPU optional)
llama.cpp   Lightweight CPU/GPU inference         Yes (NVIDIA)
LM Studio   Desktop-friendly model server         No (GPU optional)
Extraction  Generic extraction service            No

Starting Services

Start all services defined in your citadel.yaml manifest:

citadel run

Start a specific service:

citadel run ollama

Services are started using docker compose under the hood. Each service runs in its own project namespace (e.g., citadel-vllm, citadel-ollama) to keep containers organized and isolated.
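Because each service gets its own compose project, you can inspect them with Docker's own tooling. An illustrative sketch, assuming project names follow the citadel-<service> scheme described above:

```shell
# List the compose projects currently known to Docker,
# including the per-service projects Citadel created
docker compose ls

# Show the containers belonging to one service's project (Ollama here)
docker compose -p citadel-ollama ps
```

Both commands require a running Docker daemon; they only read state and are safe to run alongside Citadel.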

Stopping Services

Stop all services:

citadel stop

Stop a specific service:

citadel stop ollama

Restarting Services

Restart all services (stops and then starts them):

citadel run --restart

Viewing Logs

Stream logs from a running service:

citadel logs vllm -f

The -f flag follows the log output in real time, similar to docker compose logs -f.

Testing Services

Run diagnostic tests against a service to verify it is healthy and responding:

citadel test --service vllm

This sends test requests to the service endpoint and reports whether inference is working correctly.
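You can also probe an endpoint manually. A sketch assuming vLLM is exposing its OpenAI-compatible API on the default port 8000 (adjust host, port, and model name to your deployment):

```shell
# Check that the server is up and report which models are loaded
curl http://localhost:8000/v1/models

# Send a minimal completion request ("your-model" is a placeholder)
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "your-model", "prompt": "Hello", "max_tokens": 8}'
```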

GPU Requirements

vLLM and llama.cpp require the NVIDIA Container Toolkit and a properly configured Docker runtime. The NVIDIA runtime must be set as the default in /etc/docker/daemon.json:

{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}

If you used sudo citadel init --provision, this configuration was applied automatically. For manual setups, install the NVIDIA Container Toolkit and restart Docker after updating the daemon configuration.
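For reference, a manual setup might look like the following. This assumes a Debian/Ubuntu system with NVIDIA's apt repository already configured; consult NVIDIA's install guide for other distributions:

```shell
# Install the NVIDIA Container Toolkit
sudo apt-get install -y nvidia-container-toolkit

# Write the runtime entry into /etc/docker/daemon.json and
# make it the default runtime
sudo nvidia-ctk runtime configure --runtime=docker --set-as-default

# Restart Docker so the new default runtime takes effect
sudo systemctl restart docker
```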

Ollama, LM Studio, and Extraction run without GPU access, though Ollama will use a GPU if one is available.

How It Works

Docker Compose files for each supported service are embedded in the Citadel binary using Go's embed package. When you run citadel run, the CLI extracts the appropriate compose file and invokes docker compose to manage the container lifecycle. This means you never need to manage compose files yourself -- everything is self-contained in the binary.