Nvidia GPU Demos
AWS GPU Notes
Availability Zones / Instance Types
The AWS instance type p4d.24xlarge has 96 vCPUs and is currently available only in availability zone us-east-2b. If your cluster does not have a machine set in us-east-2b, you will probably not be able to request this GPU type.
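To see which availability zones and instance types the cluster's existing machine sets use, the Machine API can be queried. The following is a minimal sketch, assuming an AWS cluster and an oc session allowed to read the openshift-machine-api namespace. The helper names list_machineset_zones and has_zone are made up for illustration and are not part of scripts/functions.sh.

```shell
# Hypothetical helper: print each machine set with its AWS zone and instance type.
# Assumption: the providerSpec paths below are the ones used by AWS machine sets.
list_machineset_zones() {
  oc get machineset -n openshift-machine-api \
    -o custom-columns='NAME:.metadata.name,ZONE:.spec.template.spec.providerSpec.value.placement.availabilityZone,TYPE:.spec.template.spec.providerSpec.value.instanceType'
}

# Hypothetical helper: succeed if any line read from stdin contains the given zone.
has_zone() {
  grep -q -F -- "$1"
}

# Example:
#   list_machineset_zones | has_zone us-east-2b && echo "machine set found in us-east-2b"
```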
Nvidia Multi Instance GPU (MIG) configuration on OpenShift
Red Hat Demo Platform (RHDP) Options
Note
The node sizes below are the recommended minimums to select when provisioning.
- AWS with OpenShift Open Environment
  - 1 x Control Plane - m6a.2xlarge
  - 0 x Workers - m6a.2xlarge
Warning
The MIG demo is currently a work in progress (WIP) on RHDP and will likely NOT work.
Error message
error launching instance: You have requested more vCPU capacity than your
current vCPU limit of 64 allows for the instance bucket that the specified
instance type belongs to.
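The numbers in this error are consistent with the instance size noted earlier: p4d.24xlarge needs 96 vCPU, which is more than the reported limit of 64. A small sketch of that comparison follows, assuming the failed launch was a p4d.24xlarge. check_vcpu_limit is a hypothetical helper, not an AWS or oc command.

```shell
# Hypothetical helper: compare the vCPU count an instance needs with an account limit.
check_vcpu_limit() {
  local needed="$1" limit="$2"
  if [ "$needed" -gt "$limit" ]; then
    echo "over limit by $((needed - limit)) vCPU"
    return 1
  fi
  echo "within limit"
}

# p4d.24xlarge has 96 vCPU; the error above reports a limit of 64.
check_vcpu_limit 96 64 || true   # prints: over limit by 32 vCPU
```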
Prerequisites
- Nvidia GPU hardware
  - A100
  - H100
  - A30
Quickstart
Set up MIG single mode.
- Type: p4d.24xlarge = 8 x GPUs
- Profile: all-1g.5gb (1 GPU compute slice and 5GB of memory per MIG instance)
- Resource: nvidia.com/gpu: 1
. scripts/functions.sh
ocp_nvidia_mig_config_setup single all-1g.5gb
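After the setup finishes, the GPU nodes should advertise the MIG slices as nvidia.com/gpu resources. The following is a minimal sketch for checking this. It assumes the GPU nodes carry the node-role.kubernetes.io/gpu label and that each A100 40GB GPU in a p4d.24xlarge splits into seven 1g.5gb instances. Both helper names are hypothetical.

```shell
# Hypothetical helper: expected nvidia.com/gpu count = GPUs per node x MIG instances per GPU.
expected_mig_devices() {
  echo $(( $1 * $2 ))
}

# Assumption: 8 GPUs per p4d.24xlarge, 7 x 1g.5gb instances per A100 40GB.
expected_mig_devices 8 7   # prints 56

# Hypothetical helper: show the allocatable nvidia.com/gpu count for each GPU node.
# The dot in "nvidia.com" is escaped so it is not read as a path separator.
show_gpu_allocatable() {
  oc get node -l node-role.kubernetes.io/gpu \
    -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
}
```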
Nvidia MIG profiles
Set up a MIG profile
. scripts/functions.sh
# set up MIG single mode (pick one profile)
# resource example: nvidia.com/gpu: 1
ocp_nvidia_mig_config_setup single all-1g.5gb
ocp_nvidia_mig_config_setup single all-2g.10gb
# set up MIG mixed mode
# resource example: nvidia.com/mig-2g.10gb: 1
ocp_nvidia_mig_config_setup mixed all-balanced
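To see which profile a node was given and whether partitioning has finished, the MIG labels can be listed as extra columns. This is a minimal sketch, assuming the setup applies the nvidia.com/mig.config node label and that the Nvidia GPU Operator's MIG manager reports progress in the nvidia.com/mig.config.state label. show_mig_labels is a hypothetical helper name.

```shell
# Hypothetical helper: list GPU nodes with their requested MIG profile and its state.
# -L adds one output column per label; an empty column means the label is not set.
show_mig_labels() {
  oc get node -l node-role.kubernetes.io/gpu \
    -L nvidia.com/mig.config \
    -L nvidia.com/mig.config.state
}

# Example:
#   show_mig_labels
```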
Manually Pick MIG profile
# set ONE of the following; if all lines run as-is, the last assignment wins
# mode = single or mixed
MIG_CONFIG=all-1g.5gb
MIG_CONFIG=all-2g.10gb
# mode = mixed only
MIG_CONFIG=all-balanced
Manually apply MIG partitioning profile(s) - in mixed mode
# add the profile label to every node labeled with the gpu role
oc label node --overwrite \
  -l "node-role.kubernetes.io/gpu" \
  "nvidia.com/mig.config=${MIG_CONFIG}"
# remove the profile label
oc label node --overwrite \
  -l "node-role.kubernetes.io/gpu" \
  "nvidia.com/mig.config-"