Skip to content

OpenShift - Kludges

Danger

The following solutions are NOT intended for production

Nvidia GPU operator stuck in notReady

OpenShift 4.18.22+

Error

Nvidia GPU ClusterPolicy stuck in notReady

This is a workaround for the following issue:

https://github.com/NVIDIA/gpu-operator/issues/1598

Bug

OCP 4.18.22: nvidia-operator-validator pod in Init:CreateContainerError - error executing hook /usr/local/nvidia/toolkit/nvidia-container-runtime-hook (exit code: 1)

Solution

YAML Source

Command Example

oc apply -f https://raw.githubusercontent.com/redhat-na-ssa/demo-ai-gitops-catalog/b042ba4c827a90b638625a4d017fe067745f64d2/dump/gpu-kludge-mcfg.yaml