This tutorial explains how to configure a bare-metal Kubernetes (K8s) cluster for GPU orchestration. By integrating the NVIDIA Container Toolkit and the Kubernetes Device Plugin, you can automatically schedule, allocate, and manage GPU resources across your containerized workloads.
Prerequisites
Before beginning, ensure your environment meets the following requirements:
- Operating System: Ubuntu 22.04 LTS (Jammy Jellyfish).
- Hardware: A bare-metal server with at least one physical NVIDIA GPU attached.
- Access: Root or sudo privileges.
- Kubernetes: A running K8s cluster (v1.25+) initialized via kubeadm, k3s, or similar, with the kubectl CLI tool configured.
- Container Runtime: containerd installed and running.
Quick Summary
If you need a quick overview of the deployment pipeline:
- Update the Host: Install the proprietary NVIDIA GPU drivers directly on the bare-metal node.
- Install Toolkit: Deploy the NVIDIA Container Toolkit to bridge the GPU with container runtimes.
- Configure Runtime: Modify containerd configurations to recognize the nvidia runtime class.
- Deploy Plugin: Apply the NVIDIA Device Plugin DaemonSet to your K8s cluster.
- Verify: Deploy a test Pod requesting nvidia.com/gpu resources to confirm successful orchestration.
Step 1: Install NVIDIA Drivers on the Host Node
Kubernetes cannot interact with the GPU hardware without the host machine first having the correct drivers installed.
Update your package lists and install necessary build tools:
sudo apt-get update
sudo apt-get install -y build-essential linux-headers-$(uname -r)
Install the recommended NVIDIA driver for your hardware:
sudo apt-get install -y nvidia-driver-535
Reboot the server. Once back online, verify the installation by checking the GPU status:
nvidia-smi
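If you prefer a machine-readable check, nvidia-smi also supports query flags. The sketch below parses a hypothetical sample of that output; the GPU model and version shown are placeholders, not values tied to this tutorial's hardware:

```shell
# Hypothetical sample line in the format produced by:
#   nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
sample='NVIDIA A100-SXM4-40GB, 535.154.05'

# Extract the driver version field to confirm the expected driver branch is active.
echo "$sample" | awk -F', ' '{print $2}'
```

On the real host, pipe the live nvidia-smi output through the same awk filter instead of the sample variable.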
Step 2: Install the NVIDIA Container Toolkit
The NVIDIA Container Toolkit allows containerd to pass GPU access directly to containers.
Set up the package repository and GPG key:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
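To see what the sed filter in the pipeline above actually does, here is a minimal sketch run against one representative deb line (the line shown is illustrative of the upstream .list file's contents):

```shell
# One representative "deb" line as published in the upstream .list file (illustrative).
line='deb https://nvidia.github.io/libnvidia-container/stable/deb/$(ARCH) /'

# The same substitution used in the pipeline: inject the signed-by keyring option
# so apt only trusts packages signed with the NVIDIA key imported above.
echo "$line" | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g'
```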
Update the repository and install the toolkit:
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
Step 3: Configure containerd for GPU Support
You must explicitly tell containerd to use the NVIDIA runtime so Kubernetes can properly launch GPU-enabled Pods.
Pro Tip: Configuring container runtimes and compiling drivers on inconsistent hardware can lead to frustrating kernel panics. Starting with a standardized environment—like a pre-configured GPUYard Bare Metal Dedicated Server—ensures you have the unthrottled PCIe lanes and clean OS images necessary to skip hardware debugging and move straight to orchestrating your AI workloads.
Configure the NVIDIA runtime in containerd:
sudo nvidia-ctk runtime configure --runtime=containerd
Open the configuration file to ensure SystemdCgroup = true is set under the runtime options; this is required when the kubelet uses the systemd cgroup driver, which is the default on modern kubeadm-based clusters:
sudo nano /etc/containerd/config.toml
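After nvidia-ctk runs, the relevant portion of /etc/containerd/config.toml should look roughly like the fragment below. Treat this as a reference sketch rather than text to paste verbatim; exact keys and paths can differ between containerd and toolkit versions:

```toml
[plugins."io.containerd.grpc.v1.cri".containerd]
  default_runtime_name = "runc"

  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
    runtime_type = "io.containerd.runc.v2"

    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
      BinaryName = "/usr/bin/nvidia-container-runtime"
      SystemdCgroup = true
```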
Restart containerd to apply the changes:
sudo systemctl restart containerd
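Kubernetes selects a named containerd runtime through a RuntimeClass object. If the manifest you deploy in the next step does not create one for you, a minimal definition looks like this (the handler must match the nvidia runtime name configured above):

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia
handler: nvidia   # must match the containerd runtime name from this step
```

Pods can then opt in with runtimeClassName: nvidia in their spec; if you instead make nvidia the default runtime in containerd, no RuntimeClass is required.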
Step 4: Deploy the NVIDIA Device Plugin for Kubernetes
The NVIDIA Device Plugin runs as a DaemonSet across your cluster. It constantly monitors the node's GPU capacity and exposes it to the kubelet, allowing the Kubernetes scheduler to track available GPUs.
Apply the official NVIDIA Device Plugin YAML from a control-plane node (or any machine with kubectl access):
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.4/nvidia-device-plugin.yml
Verify that the DaemonSet pods are running:
kubectl get pods -n kube-system -l name=nvidia-device-plugin-ds
Check if your node is correctly advertising GPU capacity:
kubectl describe node | grep -i nvidia.com/gpu
You should see an output indicating the exact number of GPUs available for allocation.
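As a hypothetical illustration of what to look for, the snippet below runs the same grep against a sample of the relevant describe-node lines; on a healthy single-GPU node, both the Capacity and Allocatable sections list nvidia.com/gpu:

```shell
# Hypothetical excerpt of `kubectl describe node` output on a single-GPU node.
sample='Capacity:
  nvidia.com/gpu:  1
Allocatable:
  nvidia.com/gpu:  1'

# Count matching lines: expect one hit under Capacity and one under Allocatable.
echo "$sample" | grep -ci 'nvidia.com/gpu'
```

If Allocatable shows 0 while Capacity shows 1, the device plugin registered the hardware but the kubelet is not offering it for scheduling; check the plugin pod logs.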
Step 5: Test GPU Allocation with a Pod
Finally, deploy a test workload to ensure the Kubernetes scheduler successfully grants GPU access to a container.
Create a file named gpu-pod.yaml:
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test-pod
spec:
  restartPolicy: OnFailure
  containers:
  - name: cuda-container
    image: nvidia/cuda:12.2.0-base-ubuntu22.04
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1
Apply the configuration:
kubectl apply -f gpu-pod.yaml
Check the Pod's logs to confirm it executed nvidia-smi successfully from inside the K8s cluster:
kubectl logs gpu-test-pod
FAQ & Troubleshooting
- Q: My Pod is stuck in the Pending state. Why?
A: This typically means the Kubernetes scheduler cannot find a node with available nvidia.com/gpu resources. Run kubectl describe pod <pod-name> and look for Insufficient nvidia.com/gpu events. Ensure the NVIDIA Device Plugin is running and your node is registering the hardware.
- Q: The NVIDIA Device Plugin Pod is crashing with a CrashLoopBackOff error.
A: This happens if the plugin cannot communicate with the container runtime. Verify that containerd was successfully restarted after running the nvidia-ctk runtime configure command in Step 3.
- Q: Can I share a single GPU among multiple Pods?
A: By default, Kubernetes allocates one physical GPU exclusively to one container. To slice a single GPU across multiple Pods, you must configure NVIDIA Multi-Instance GPU (MIG) on supported hardware (such as the A100 or H100) or enable time-slicing in the device plugin configuration.
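As a sketch of the time-slicing option, the device plugin accepts a sharing config delivered via a ConfigMap like the one below. The object name and replica count are illustrative, and the DaemonSet must be pointed at this ConfigMap (for example via the plugin's config flag or Helm values), so consult the plugin's documentation before relying on it:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: nvidia-device-plugin-config   # illustrative name
  namespace: kube-system
data:
  config.yaml: |
    version: v1
    sharing:
      timeSlicing:
        resources:
        - name: nvidia.com/gpu
          replicas: 4   # each physical GPU is advertised as 4 schedulable slots
```

Note that time-slicing provides no memory or fault isolation between Pods sharing a GPU; MIG is the option that partitions the hardware itself.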
Conclusion
You have successfully configured a bare-metal Kubernetes environment to recognize, manage, and allocate NVIDIA GPUs. By laying down the host drivers, linking containerd via the NVIDIA Container Toolkit, and orchestrating it all with the K8s Device Plugin, your cluster is now ready to handle intensive AI inference and ML training workloads with zero virtualization overhead.
For enterprise-grade reliability and uncompromised raw computing power, consider deploying your next Kubernetes cluster on GPUYard. Explore our high-performance Bare Metal Dedicated Servers to build a resilient, scalable, and highly available infrastructure tailored specifically for AI orchestration.