Installation de machine avec RT Kernel et accélération graphique

Déploiement avec FAI-Project

https://fai-project.org/FAIme/ 

Your job BYDEGIKS is currently being processed. https://fai-project.org/myimages/BYDEGIKS

Your web config:

type="install"
partition="ONE"
desktop="XORG"
suite="ubuntu"
keyboard="fr"
addpkgs="software-properties-common terminator nano htop wget curl gpg apt-transport-https python3-rosdep2 ubuntu-realtime docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin"
email="robotics@hentz.eu"
rootpw=''
username="etudiant"
userpw=''
gittype=""
gituser=""
repo="deb [trusted=yes] http://download.docker.com/linux/ubuntu noble stable"
postinst="install_pc_aica.sh"
rclocal="1"
datapart=""
cmdline=""
options="STANDARD REBOOT RECOMMENDS"

curl "https://fai-project.org/cgi/faime.cgi?type=install;username=etudiant;partition=ONE;repo=http%3A%2F%2Fdownload.docker.com%2Flinux%2Fubuntu%20noble%20stable;keyboard=fr;suite=ubuntu;desktop=XORG;cl6=STANDARD;addpkgs=software-properties-common%20terminator%20nano%20htop%20wget%20curl%20gpg%20apt-transport-https%20python3-rosdep2%20ubuntu-realtime%20docker-ce%20docker-ce-cli%20containerd.io%20docker-buildx-plugin%20docker-compose-plugin;cl9=RECOMMENDS;email=robotics%40hentz.eu;postinst=install_pc_aica.sh;rclocal=1;cl8=REBOOT;sbm=2"

sudo apt install ubuntu-realtime
sudo nano /etc/default/grub
sudo update-grub
uname -v | cut -d" " -f1-4
sudo apt install cpufrequtils
sudo systemctl disable ondemand
sudo systemctl enable cpufrequtils
sudo sh -c 'echo "GOVERNOR=performance" > /etc/default/cpufrequtils'
sudo systemctl daemon-reload && sudo systemctl restart cpufrequtils
sudo systemctl status docker
cpufreq-info

wget <https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb>
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt update
sudo apt install nvidia-driver-pinning-570
sudo IGNORE_PREEMPT_RT_PRESENCE=1 apt install cuda-drivers

sudo apt-get update && sudo apt-get install -y --no-install-recommends    ca-certificates    curl    gnupg2
sudo apt-get remove docker docker-engine docker.io containerd runc
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL <https://download.docker.com/linux/ubuntu/gpg> -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
# Add the repository to Apt sources:
sudo tee /etc/apt/sources.list.d/docker.sources <<EOF
Types: deb
URIs: <https://download.docker.com/linux/ubuntu>
Suites: $(. /etc/os-release && echo "${UBUNTU_CODENAME:-$VERSION_CODENAME}")
Components: stable
Signed-By: /etc/apt/keyrings/docker.asc
EOF
sudo apt update
sudo apt install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo docker run hello-world
sudo usermod -aG docker $USER

curl -fsSL <https://nvidia.github.io/libnvidia-container/gpgkey> | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg   && curl -s -L <https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list> |     sed 's#deb <https://#deb> [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] <https://#g>' |     sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo reboot
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi

git clone <https://github.com/gautz/unistra-aica-practical> aica
cd ~/aica/package-template
docker build -f aica-package.toml -t object-detection-utils .

RT Kernel

RT Kernel ou Low Latency ?

https://unix.stackexchange.com/questions/553980/why-would-anyone-choose-not-to-use-the-lowlatency-kernel 

Soit on build à la main (https://innovation.iha.unistra.fr/books/robotique-open-source/page/commander-un-robot-ur-avec-le-driver-ros2#bkmrk-installation-d%27ubunt ), soit on prend un Kernel qui est dispo dans les dépôts

sudo apt list *realtime* linux*rt

ubuntu-realtime
linux-realtime
linux-image-6.8.1-1015-realtime
GRUB_SAVEDEFAULT=true
GRUB_DEFAULT="saved" # Le dernier Kernel choisi devient le Kernel par défaut
#GRUB_DEFAULT="Advanced options for Ubuntu>Ubuntu, with Linux 6.8.1-1015-realtime"
#GRUB_TIMEOUT_STYLE=hidden
GRUB_TIMEOUT=3 # 3 secondes pour choisir le Kernel à démarrer

https://unix.stackexchange.com/questions/198003/set-the-default-kernel-in-grub 

https://www.gnu.org/software/grub/manual/grub/html_node/Simple-configuration.html 

Accélération GPU

Que ce soit pour LeRobot, pour AICA, ou en général pour utiliser des modèles d'IA, l'installation de Linux avec les bons drivers graphiques n'est pas toujours aisé.

On veut accéder dans Docker aux capacités Compute d'un GPU, essentiellement fournie par NVidia Cuda. On différentiera l'installation d'un serveur qui n'a pas besoin de l'accélération graphique liée à l'affichage, ni de capacités temps-réel, contrairement à un PC de commande de robot.

NVidia Driver et RT Kernel

https://interfacinglinux.com/2024/01/16/nvidia-cuda-on-debian-real-time-kernel/ 

https://gist.github.com/pantor/9786c41c03a97bca7a52aa0a72fa9387 

Prérequis

sudo apt install nvidia-driver installe la dernière version du driver dispo, mais cuda-drivers n'est pas forcément dispo pour cette version. Donc on va chercher la bonne version.

Pour supprimer tous les drivers nvidia précédemment installés. A utiliser avec précautions : sudo apt autoremove --purge *nvidia*

Installation du nvidia-driver et de cuda

https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#configuring-docker 

Optionnel

sudo apt -V install libnvidia-compute nvidia-dkms

sudo apt -V install libnvidia-gl nvidia-dkms

https://docs.nvidia.com/datacenter/tesla/driver-installation-guide/ubuntu.html 

https://docs.nvidia.com/cuda/cuda-installation-guide-linux/#network-repo-installation-for-ubuntu 

Vérification de la compatibilité GPU - Application IA

Procédure :
On a une application d'IA donnée

Par exemple YOLO :

ONNX Runtime (ORT) qui sert à l'inférence des modèles Yolo sur GPU nécessitent des Compute Capability >= 6.0

Différentes versions de l'environnement d'exécution

https://docs.nvidia.com/deeplearning/cudnn/backend/latest/reference/support-matrix.html 

cuDNN>=v9.12.0 supporte :

cuDNN<=v9.11.1 

cuDNN<=v9.10.2 : 

Compatibilité de version d'Ubuntu

Sur Ubuntu 24.04 :

On a une carte graphique donnée :

Problèmes avec les GPU NVidia anciens : https://github.com/ultralytics/ultralytics/issues/5305 

GPU NVidia anciens / Legacy

On lance Yolo et on a l'erreur suivante : 

10:17:40.111 [yolo_executor] error: ONNX Runtime error during model query: Non-zero status code returned while running QuickGelu node. Name:'node_silu/QuickGeluFusion/' Status Message: CUDA error cudaErrorNoKernelImageForDevice:no kernel image is available for execution on the device

ONNX Runtime (ORT) qui sert à l'inférence des modèles Yolo sur GPU nécessitent des Compute Capability > 5.2

Solution (voir analyse ci-dessous) :

https://github.com/microsoft/onnxruntime/issues/17861#issuecomment-1758098712 

On a une carte graphique donnée | par exemple Quadro M620 (GM107GLM) :

Option 1 : Driver Version: 580   CUDA Version: 13.0

Option 2 : Driver Version: 570   CUDA Version: 12.8

Option 3 : Driver Version: 535.288.01   CUDA Version: 12.2

Option 4 : Driver Version: 470   CUDA Version: 11.x

https://pytorch.org/get-started/previous-versions/

https://docs.nvidia.com/deeplearning/cudnn/archives/cudnn-892/support-matrix/index.html 

cuDNN=v8.9.2 : 

https://docs.nvidia.com/deeplearning/cudnn/archives/cudnn-897/support-matrix/index.html 

cuDNN=v8.9.7 : 

Voir aussi https://docs.nvidia.com/deeplearning/cudnn/onnxruntime-gpuarchives/index.html 

Avec :

#syntax=ghcr.io/aica-technology/app-builder:v2
# launch with bash: 
[core]
"image" = "v5.1.0"

[packages]
# add components
#"@aica/components/rl-policy-components" = "v3.0.0"
"@aica/components/advanced-perception" = "v1.0.0" # contains YoloExecutor
"@aica/components/core-vision" = "v1.1.2" # contains CameraStreamer
"@aica/foss/toolkits/cuda" = "v1.0.0-cuda24.12" # prerequisite for NVidia GPU acceleration
"@aica/foss/toolkits/ml" = "v1.0.0-gpu24.12" # prerequisite for ML model inference

# other extensions
"object-detection-utils" # self-built component to feed goal frame extracted from yolo to robot velocity controller

# add hardware collections
"@aica/collections/intel-realsense-collection" = "v2.0.1" # D4XX and L515 cameras models, stream RGBD
"@aica/collections/ur-collection" = "v4.3.0"

On a :

ros2@iha-portrob-1:~/examples$ python3 -m torch.utils.collect_env 
<frozen runpy>:128: RuntimeWarning: 'torch.utils.collect_env' found in sys.modules after import of package 'torch.utils', but prior to execution of 'torch.utils.collect_env'; this may result in unpredictable behaviour
Collecting environment information...
PyTorch version: 2.6.0+cu126
Is debug build: False
CUDA used to build PyTorch: 12.6
ROCM used to build PyTorch: N/A

OS: Ubuntu 24.04.2 LTS (x86_64)
GCC version: (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
Clang version: Could not collect
CMake version: version 3.28.3
Libc version: glibc-2.39

Python version: 3.12.3 (main, Feb  4 2025, 14:48:35) [GCC 13.3.0] (64-bit runtime)
Python platform: Linux-6.17.0-14-generic-x86_64-with-glibc2.39
Is CUDA available: True
CUDA runtime version: 12.6.85
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: Quadro M620
Nvidia driver version: 570.211.01
cuDNN version: Probably one of the following:
/usr/lib/libcudnn.so.9.6.0
/usr/lib/libcudnn_adv.so.9.6.0
/usr/lib/libcudnn_cnn.so.9.6.0
/usr/lib/libcudnn_engines_precompiled.so.9.6.0
/usr/lib/libcudnn_engines_runtime_compiled.so.9.6.0
/usr/lib/libcudnn_graph.so.9.6.0
/usr/lib/libcudnn_heuristic.so.9.6.0
/usr/lib/libcudnn_ops.so.9.6.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:                            x86_64
CPU op-mode(s):                          32-bit, 64-bit
Address sizes:                           39 bits physical, 48 bits virtual
Byte Order:                              Little Endian
CPU(s):                                  4
On-line CPU(s) list:                     0-3
Vendor ID:                               GenuineIntel
Model name:                              Intel(R) Core(TM) i5-7440HQ CPU @ 2.80GHz
CPU family:                              6
Model:                                   158
Thread(s) per core:                      1
Core(s) per socket:                      4
Socket(s):                               1
Stepping:                                9
CPU(s) scaling MHz:                      84%
CPU max MHz:                             3800.0000
CPU min MHz:                             800.0000
BogoMIPS:                                5599.85
Flags:                                   fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb pti ssbd ibrs ibpb stibp tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp vnmi md_clear flush_l1d arch_capabilities
Virtualization:                          VT-x
L1d cache:                               128 KiB (4 instances)
L1i cache:                               128 KiB (4 instances)
L2 cache:                                1 MiB (4 instances)
L3 cache:                                6 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-3
Vulnerability Gather data sampling:      Vulnerable
Vulnerability Ghostwrite:                Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             KVM: Mitigation: Split huge pages
Vulnerability L1tf:                      Mitigation; PTE Inversion; VMX conditional cache flushes, SMT disabled
Vulnerability Mds:                       Mitigation; Clear CPU buffers; SMT disabled
Vulnerability Meltdown:                  Mitigation; PTI
Vulnerability Mmio stale data:           Mitigation; Clear CPU buffers; SMT disabled
Vulnerability Old microcode:             Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Mitigation; IBRS
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; IBRS; IBPB conditional; STIBP disabled; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds:                     Mitigation; Microcode
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Mitigation; TSX disabled
Vulnerability Vmscape:                   Mitigation; IBPB before exit to userspace

Versions of relevant libraries:
[pip3] ament-flake8==0.17.2
[pip3] flake8==7.0.0
[pip3] flake8-builtins==2.1.0
[pip3] flake8-comprehensions==3.14.0
[pip3] flake8-docstrings==1.6.0
[pip3] flake8-import-order==0.18.2
[pip3] flake8-quotes==3.4.0
[pip3] numpy==1.26.4
[pip3] nvidia-cublas-cu12==12.6.4.1
[pip3] nvidia-cuda-cupti-cu12==12.6.80
[pip3] nvidia-cuda-nvrtc-cu12==12.6.77
[pip3] nvidia-cuda-runtime-cu12==12.6.77
[pip3] nvidia-cudnn-cu12==9.5.1.17
[pip3] nvidia-cufft-cu12==11.3.0.4
[pip3] nvidia-curand-cu12==10.3.7.77
[pip3] nvidia-cusolver-cu12==11.7.1.2
[pip3] nvidia-cusparse-cu12==12.5.4.2
[pip3] nvidia-cusparselt-cu12==0.6.3
[pip3] nvidia-nccl-cu12==2.21.5
[pip3] nvidia-nvjitlink-cu12==12.6.85
[pip3] nvidia-nvtx-cu12==12.6.77
[pip3] onnxruntime-gpu==1.22.2
[pip3] pytorch3d==0.7.8
[pip3] torch==2.6.0+cu126
[pip3] torchaudio==2.6.0+cu126
[pip3] torchvision==0.21.0+cu126
[pip3] triton==3.2.0
[conda] Could not collect

Avec :

"@aica/foss/toolkits/cuda" = "v1.0.0-cuda24.12" # prerequisite for NVidia GPU acceleration
"@aica/foss/toolkits/ml" = "v1.0.0-gpu24.12" # prerequisite for ML model inference

On a :

ros2@iha-portrob-1:~/examples$ python3 -m torch.utils.collect_env
<frozen runpy>:128: RuntimeWarning: 'torch.utils.collect_env' found in sys.modules after import of package 'torch.utils', but prior to execution of 'torch.utils.collect_env'; this may result in unpredictable behaviour
Collecting environment information...
PyTorch version: 2.6.0+cu126
Is debug build: False
CUDA used to build PyTorch: 12.6
ROCM used to build PyTorch: N/A

OS: Ubuntu 24.04.2 LTS (x86_64)
GCC version: (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
Clang version: Could not collect
CMake version: version 3.28.3
Libc version: glibc-2.39

Python version: 3.12.3 (main, Feb  4 2025, 14:48:35) [GCC 13.3.0] (64-bit runtime)
Python platform: Linux-6.17.0-14-generic-x86_64-with-glibc2.39
Is CUDA available: True
CUDA runtime version: 12.6.85
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: Quadro M620
Nvidia driver version: 570.211.01
cuDNN version: Probably one of the following:
/usr/lib/libcudnn.so.9.6.0
/usr/lib/libcudnn_adv.so.9.6.0
/usr/lib/libcudnn_cnn.so.9.6.0
/usr/lib/libcudnn_engines_precompiled.so.9.6.0
/usr/lib/libcudnn_engines_runtime_compiled.so.9.6.0
/usr/lib/libcudnn_graph.so.9.6.0
/usr/lib/libcudnn_heuristic.so.9.6.0
/usr/lib/libcudnn_ops.so.9.6.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:                            x86_64
CPU op-mode(s):                          32-bit, 64-bit
Address sizes:                           39 bits physical, 48 bits virtual
Byte Order:                              Little Endian
CPU(s):                                  4
On-line CPU(s) list:                     0-3
Vendor ID:                               GenuineIntel
Model name:                              Intel(R) Core(TM) i5-7440HQ CPU @ 2.80GHz
CPU family:                              6
Model:                                   158
Thread(s) per core:                      1
Core(s) per socket:                      4
Socket(s):                               1
Stepping:                                9
CPU(s) scaling MHz:                      81%
CPU max MHz:                             3800.0000
CPU min MHz:                             800.0000
BogoMIPS:                                5599.85
Flags:                                   fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb pti ssbd ibrs ibpb stibp tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp vnmi md_clear flush_l1d arch_capabilities
Virtualization:                          VT-x
L1d cache:                               128 KiB (4 instances)
L1i cache:                               128 KiB (4 instances)
L2 cache:                                1 MiB (4 instances)
L3 cache:                                6 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-3
Vulnerability Gather data sampling:      Vulnerable
Vulnerability Ghostwrite:                Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             KVM: Mitigation: Split huge pages
Vulnerability L1tf:                      Mitigation; PTE Inversion; VMX conditional cache flushes, SMT disabled
Vulnerability Mds:                       Mitigation; Clear CPU buffers; SMT disabled
Vulnerability Meltdown:                  Mitigation; PTI
Vulnerability Mmio stale data:           Mitigation; Clear CPU buffers; SMT disabled
Vulnerability Old microcode:             Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Mitigation; IBRS
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; IBRS; IBPB conditional; STIBP disabled; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds:                     Mitigation; Microcode
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Mitigation; TSX disabled
Vulnerability Vmscape:                   Mitigation; IBPB before exit to userspace

Versions of relevant libraries:
[pip3] ament-flake8==0.17.2
[pip3] flake8==7.0.0
[pip3] flake8-builtins==2.1.0
[pip3] flake8-comprehensions==3.14.0
[pip3] flake8-docstrings==1.6.0
[pip3] flake8-import-order==0.18.2
[pip3] flake8-quotes==3.4.0
[pip3] numpy==1.26.4
[pip3] nvidia-cublas-cu12==12.6.4.1
[pip3] nvidia-cuda-cupti-cu12==12.6.80
[pip3] nvidia-cuda-nvrtc-cu12==12.6.77
[pip3] nvidia-cuda-runtime-cu12==12.6.77
[pip3] nvidia-cudnn-cu12==9.5.1.17
[pip3] nvidia-cufft-cu12==11.3.0.4
[pip3] nvidia-curand-cu12==10.3.7.77
[pip3] nvidia-cusolver-cu12==11.7.1.2
[pip3] nvidia-cusparse-cu12==12.5.4.2
[pip3] nvidia-cusparselt-cu12==0.6.3
[pip3] nvidia-nccl-cu12==2.21.5
[pip3] nvidia-nvjitlink-cu12==12.6.85
[pip3] nvidia-nvtx-cu12==12.6.77
[pip3] onnxruntime-gpu==1.22.2
[pip3] pytorch3d==0.7.8
[pip3] torch==2.6.0+cu126
[pip3] torchaudio==2.6.0+cu126
[pip3] torchvision==0.21.0+cu126
[pip3] triton==3.2.0
[conda] Could not collect

Sources

https://stackoverflow.com/questions/30820513/what-is-the-correct-version-of-cuda-for-my-nvidia-driver/30820690#30820690 

https://interfacinglinux.com/2024/01/16/nvidia-cuda-on-debian-real-time-kernel/ 

https://forum.zorin.com/t/nvidia-drivers-cannot-be-installed-because-of-real-time-kernel/46419/22 


Revision #31
Created 20 February 2026 16:36:48 by admin_idf
Updated 13 April 2026 14:18:47 by admin_idf