Getting started with MPS partitioning
Warning
Multi-Process Service (MPS) is supported only by NVIDIA GPUs based on Volta and newer architectures.
Prerequisites
- The Nebuly k8s-device-plugin must be installed on your cluster.
Enable automatic partitioning
You can enable automatic MPS partitioning on a node by adding to it the following label:
The label delegates to nos the management of the MPS resources of all the GPUs of that node. You just have to submit your Pods to the cluster, and the requested MPS resources are automatically provisioned.
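As a minimal sketch of the labeling step, the snippet below wraps `kubectl label` in a small helper. The label key and value (`nos.nebuly.com/gpu-partitioning=mps`) are not shown on this page; they are an assumption based on the nos project's conventions, so confirm them against the version you installed. The helper name `enable_mps_partitioning` is hypothetical.

```shell
# Assumption: label key/value as used by nos for MPS partitioning.
# Verify against the documentation of your installed nos version.
PARTITIONING_LABEL="nos.nebuly.com/gpu-partitioning=mps"

# Hypothetical helper: label a node, then print its labels for inspection.
# $1: name of the node to enable MPS partitioning on.
enable_mps_partitioning() {
  node="$1"
  kubectl label nodes "$node" "$PARTITIONING_LABEL" --overwrite
  kubectl get node "$node" --show-labels
}
```

For example, `enable_mps_partitioning my-gpu-node` would label the node named `my-gpu-node` (a placeholder) and print its labels so you can confirm the change.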
Create pods requesting MPS resources
You can make your pods request slices of a GPU by specifying MPS resources in their containers' requests. MPS devices are exposed by our k8s-device-plugin using the following naming convention: nvidia.com/gpu-<size>gb, where <size> is the amount of memory of the GPU slice in GB. The compute resources of a GPU are instead shared equally among all of its MPS resources.
You can specify any size you want, but keep in mind that the GPU Partitioner creates an MPS resource on a given GPU only if its size is less than or equal to the total amount of memory of that GPU (which is indicated by the node label nvidia.com/gpu.memory applied by the NVIDIA GPU Operator).
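The naming convention and the size constraint above can be sketched with two small shell helpers. Both function names are hypothetical. The check assumes that the nvidia.com/gpu.memory label is expressed in MiB and that 1 GB is counted as 1024 MiB; the GPU Partitioner may convert units differently, so treat the result as an approximation.

```shell
# Build the resource name for a slice of the given size in GB.
mps_resource_name() {
  printf 'nvidia.com/gpu-%sgb\n' "$1"
}

# Succeed when a slice of $1 GB fits in a GPU with $2 MiB of total memory.
# Assumption: 1 GB is treated as 1024 MiB; nos may use another conversion.
slice_fits() {
  size_gb="$1"
  total_mib="$2"
  [ $((size_gb * 1024)) -le "$total_mib" ]
}
```

The total memory of a node's GPUs can be read from the label with `kubectl get node <node-name> -o jsonpath='{.metadata.labels.nvidia\.com/gpu\.memory}'` and passed as the second argument, for instance `slice_fits 10 15109 && mps_resource_name 10`.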
For instance, you can create a pod requesting a slice with 10 GB of GPU memory as follows:
$ kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: mps-partitioning-example
spec:
  hostIPC: true # (2)
  securityContext:
    runAsUser: 1000 # (3)
  containers:
    - name: sleepy
      image: "busybox:latest"
      command: ["sleep", "120"]
      resources:
        limits:
          nvidia.com/gpu-10gb: 1 # (1)
EOF
1. Fraction of GPU with 10 GB of memory
2. hostIPC must be set to true
3. Containers must run as the same user as the MPS server
Pods requesting MPS resources must meet two requirements:
- hostIPC must be set to true in order to allow containers to access the IPC namespace of the host
- Containers must run as the same user as the user running the MPS server on the host, which is 1000 by default
The two requirements above are due to how MPS works. Since it requires the clients and the server to share the same memory space, pods must be allowed to access the host IPC namespace so that they can communicate with the MPS server running on the host. Moreover, the MPS server accepts connections only from clients running as the same user as the server, which is 1000 by default (you can change it by setting the mps.userID value when installing the k8s-device-plugin chart), so the containers of your pods must run as that user if they request MPS resources.
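The two requirements can be checked before or after submitting a pod. The sketch below is illustrative only: both function names are hypothetical, and `check_pod` reads only the pod-level securityContext, so a runAsUser set on individual containers would not be detected.

```shell
# Succeed when the settings satisfy the two MPS requirements.
# $1: value of spec.hostIPC, $2: UID the containers run as,
# $3: UID of the MPS server (defaults to 1000).
meets_mps_requirements() {
  host_ipc="$1"
  run_as_user="$2"
  mps_uid="${3:-1000}"
  [ "$host_ipc" = "true" ] && [ "$run_as_user" = "$mps_uid" ]
}

# Hypothetical helper: read both fields from a live pod with kubectl.
# $1: pod name, $2: optional MPS server UID (defaults to 1000).
check_pod() {
  pod="$1"
  ipc=$(kubectl get pod "$pod" -o jsonpath='{.spec.hostIPC}')
  uid=$(kubectl get pod "$pod" -o jsonpath='{.spec.securityContext.runAsUser}')
  meets_mps_requirements "$ipc" "$uid" "${2:-1000}"
}
```

For the example pod above, `check_pod mps-partitioning-example` should succeed. If you changed mps.userID when installing the chart, pass that value as the second argument.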
Note
Each container should request at most one MPS device. If a container needs more resources, it should request a single larger device rather than multiple smaller devices.
Warning
If you run nvidia-smi inside a container, the output still shows the whole memory of the GPU.
Nevertheless, processes inside the container are able to allocate only the amount of memory requested by the container.
You can check the available GPU memory through the environment variable CUDA_MPS_PINNED_DEVICE_MEM_LIMIT.
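As a rough sketch of how to read that variable, the helper below extracts the limit for one device. It assumes the value follows the `<device>=<limit>` comma-separated format described in the CUDA MPS documentation (for example `0=10G,1=512MB`); the exact value set by the device plugin may differ. The function name `mps_mem_limit` is hypothetical.

```shell
# Print the memory limit configured for one device ordinal.
# $1: device ordinal, $2: optional value to parse (defaults to the
# CUDA_MPS_PINNED_DEVICE_MEM_LIMIT environment variable).
# Returns a non-zero status when the device has no entry.
mps_mem_limit() {
  device="$1"
  limits="${2:-${CUDA_MPS_PINNED_DEVICE_MEM_LIMIT:-}}"
  old_ifs="$IFS"
  IFS=','
  for entry in $limits; do
    if [ "${entry%%=*}" = "$device" ]; then
      IFS="$old_ifs"
      printf '%s\n' "${entry#*=}"
      return 0
    fi
  done
  IFS="$old_ifs"
  return 1
}
```

To see the raw value for the example pod, you can run `kubectl exec mps-partitioning-example -- printenv CUDA_MPS_PINNED_DEVICE_MEM_LIMIT`.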