llama.cpp: Installation

farias

The commands:

My Ubuntu version:

# inxi -F
System:
  Host: XXXX Kernel: 6.8.0-117-generic arch: x86_64 bits: 64
  Console: pty pts/1 Distro: Ubuntu 24.04.4 LTS (Noble Numbat)
Machine:
  Type: Kvm System: QEMU product: Standard PC (i440FX + PIIX, 1996) v: pc-i440fx-10.1 serial: N/A
  Mobo: N/A model: N/A serial: N/A BIOS: SeaBIOS v: rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org
    date: 04/01/2014
CPU:
  Info: 4x 8-core model: Intel Xeon E5-2450 v2 bits: 64 type: MCP SMP cache:
    L2: 4x 32 MiB (128 MiB)
  Speed (MHz): avg: 2500 min/max: N/A cores: 1: 2500 2: 2500 3: 2500 4: 2500 5: 2500 6: 2500
    7: 2500 8: 2500 9: 2500 10: 2500 11: 2500 12: 2500 13: 2500 14: 2500 15: 2500 16: 2500 17: 2500
    18: 2500 19: 2500 20: 2500 21: 2500 22: 2500 23: 2500 24: 2500 25: 2500 26: 2500 27: 2500
    28: 2500 29: 2500 30: 2500 31: 2500 32: 2500
Graphics:
  Device-1: driver: bochs-drm v: N/A
  Device-2: NVIDIA GM204GL [Quadro M5000] driver: nvidia v: 580.159.03
  Device-3: NVIDIA GM204GL [Quadro M4000] driver: nvidia v: 580.159.03
  Display: server: X.org v: 1.21.1.11 driver: gpu: bochs-drm tty: 213x51 resolution: 1280x800
  API: EGL v: 1.5 drivers: swrast platforms: surfaceless,device
  API: OpenGL v: 4.5 vendor: mesa v: 25.2.8-0ubuntu0.24.04.2 note: console (EGL sourced)
    renderer: llvmpipe (LLVM 20.1.2 256 bits)
Audio:
  Message: No device data found.
Network:
  Device-1: Intel 82371AB/EB/MB PIIX4 ACPI type: network bridge driver: piix4_smbus
  Device-2: Red Hat Virtio network driver: virtio-pci
  IF: ens18 state: up speed: -1 duplex: unknown mac: bc:24:11:b5:30:62
Drives:
  Local Storage: total: 20.51 TiB used: 13.02 TiB (63.5%)
  ID-1: /dev/sda vendor: QEMU model: HARDDISK size: 400 GiB
  ID-2: /dev/sdb model: Portable SSD T5 size: 931.51 GiB type: USB
  ID-3: /dev/sdc vendor: Seagate model: FireCuda HDD size: 4.55 TiB type: USB
  ID-4: /dev/sdd vendor: Seagate model: Expansion Desk size: 14.55 TiB type: USB
  ID-5: /dev/sde vendor: Kingston model: SA400S37120G size: 111.79 GiB type: USB
Partition:
  ID-1: / size: 391.18 GiB used: 292.79 GiB (74.8%) fs: ext4 dev: /dev/dm-0
  ID-2: /boot size: 1.9 GiB used: 345.4 MiB (17.7%) fs: ext4 dev: /dev/sda2
Swap:
  ID-1: swap-1 type: file size: 4 GiB used: 8.5 MiB (0.2%) file: /swap.img
Sensors:
  Src: lm-sensors+/sys Message: No sensor data found using /sys/class/hwmon or lm-sensors.
Info:
  Memory: total: 78.19 GiB available: 18.14 GiB used: 2.62 GiB (14.4%)
  Processes: 416 Uptime: 24d 19h 4m Init: systemd target: graphical (5) Shell: Bash inxi: 3.3.34

Updating CUDA:

# apt-get update
# apt-get upgrade
# ubuntu-drivers autoinstall
# apt install nvidia-cuda-toolkit -y
# echo 'vm.swappiness=10' | sudo tee -a /etc/sysctl.conf
# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Fri_Feb_21_20:23:50_PST_2025
Cuda compilation tools, release 12.8, V12.8.93
Build cuda_12.8.r12.8/compiler.35583870_0

Ouch... the Ubuntu package only provides CUDA 12.8. Starting over with NVIDIA's local repository installer:

# wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-ubuntu2404.pin
# mv cuda-ubuntu2404.pin /etc/apt/preferences.d/cuda-repository-pin-600
# wget https://developer.download.nvidia.com/compute/cuda/13.3.0/local_installers/cuda-repo-ubuntu2404-13-3-local_13.3.0-610.43.02-1_amd64.deb
# dpkg -i cuda-repo-ubuntu2404-13-3-local_13.3.0-610.43.02-1_amd64.deb
# cp /var/cuda-repo-ubuntu2404-13-3-local/cuda-*-keyring.gpg /usr/share/keyrings/
# apt-get update
# apt-get -y install cuda-toolkit-13-3
# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2026 NVIDIA Corporation
Built on Fri_Apr_24_07:22:02_PM_PDT_2026
Cuda compilation tools, release 13.3, V13.3.33
Build cuda_13.3.r13.3/compiler.37862127_0
# apt-get autoremove
# ldconfig -v

The CUDA download page: https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=24.04&target_type=deb_local

Installing nvm:

# curl https://raw.githubusercontent.com/creationix/nvm/master/install.sh | bash
...
# source ~/.profile
...
# nvm install node
...
# nvm install 24
...
# nvm use 24
Now using node v24.17.0 (npm v11.13.0)

Installing libnccl:

# apt install libnccl2 libnccl-dev

Installing llama.cpp:

# git clone https://github.com/ggml-org/llama.cpp.git
# cd llama.cpp
# mkdir build
# cd build
# cmake ..
# cmake --build . --config Release
# make install

Downloading the models:

# mkdir /models
# curl -L --fail -o /models/qwen2.5-1.5b-instruct-q4_k_m.gguf   https://huggingface.co/bartowski/Qwen2.5-1.5B-Instruct-GGUF/resolve/main/Qwen2.5-1.5B-Instruct-Q4_K_M.gguf

Benchmark (no CUDA, CPU only):

# llama-bench -m  /models/qwen2.5-1.5b-instruct-q4_k_m.gguf
| model                          |       size |     params | backend    | threads |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: |
| qwen2 1.5B Q4_K - Medium       | 934.69 MiB |     1.54 B | CPU        |      32 |           pp512 |         74.97 ± 4.53 |

That is really bad...
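
One thing that may be worth trying before writing off the CPU numbers: llama-bench used all 32 threads, and on a VM those vCPUs do not necessarily map to 32 physical cores. Here is a rough sketch that sweeps a few thread counts with `-t`, which accepts a comma-separated list. The `thread_sweep` helper is my own illustration, not part of llama.cpp, and I have not measured whether this helps on this machine:

```shell
#!/bin/sh
# Hypothetical helper: print N, N/2, N/4 as a comma-separated list (stops at 1).
thread_sweep() {
    n=$1
    list=$n
    for step in 1 2; do
        n=$((n / 2))
        [ "$n" -ge 1 ] || break
        list="$list,$n"
    done
    printf '%s\n' "$list"
}

MODEL=/models/qwen2.5-1.5b-instruct-q4_k_m.gguf

# Run only where llama-bench and the model are actually present.
if command -v llama-bench >/dev/null 2>&1 && [ -f "$MODEL" ]; then
    # -t takes several thread counts; -p 512 matches the pp512 test above.
    llama-bench -m "$MODEL" -t "$(thread_sweep "$(nproc)")" -p 512 -n 128
fi
```

With 32 vCPUs this benchmarks 32, 16 and 8 threads in one run, so the rows can be compared directly.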

And, as always, the NVIDIA drivers are lost:

# cd
root@jellyfin:~# nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
NVML library version: 535.309
# apt-get purge 'nvidia-*'
# ubuntu-drivers install
# nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
NVML library version: 535.309
# reboot
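
For what it is worth, this NVML message usually means that the kernel module currently loaded and the user-space libraries on disk come from different driver releases, which is why a reboot (or a module reload) is needed after changing packages. A small sketch to compare the two; `first_version` is a hypothetical helper, and I am assuming the module is named `nvidia`, as it is with Ubuntu's packages:

```shell
#!/bin/sh
# Hypothetical helper: extract the first dotted version number (e.g. 535.309.01) from stdin.
first_version() {
    grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1
}

# Version of the module currently loaded in the kernel (the file exists only while it is loaded).
if [ -r /proc/driver/nvidia/version ]; then
    loaded=$(first_version < /proc/driver/nvidia/version)
    echo "loaded kernel module: $loaded"
fi

# Version of the module installed on disk, i.e. what the next boot would load.
if command -v modinfo >/dev/null 2>&1; then
    ondisk=$(modinfo -F version nvidia 2>/dev/null || true)
    echo "module on disk: ${ondisk:-not found}"
fi
```

If the two versions differ, the reboot is what brings them back in line.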

Apparently there is no way to get a newer driver version; after the reboot the driver is 535.309.01:

root@jellyfin:/home/arias# nvidia-smi
Fri Jun 19 10:10:52 2026       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.309.01             Driver Version: 535.309.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Quadro M5000                   Off | 00000000:00:10.0 Off |                  Off |
| 40%   48C    P8              14W / 150W |      4MiB /  8192MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  Quadro M4000                   Off | 00000000:00:1B.0 Off |                  N/A |
| 50%   52C    P8              14W / 120W |      4MiB /  8192MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
WARNING: infoROM is corrupted at gpu 0000:00:10.0
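
Note the `CUDA Version: 12.2` in the header: that is the newest CUDA runtime this 535 driver supports, while the toolkit installed earlier is 13.3. Here is a hedged sketch that compares the two. The three helper functions are names I made up for illustration, and `sort -V` assumes GNU coreutils:

```shell
#!/bin/sh
# Hypothetical helpers; each parser reads the tool's output on stdin.
parse_nvcc_release() {   # "... release 13.3, V13.3.33" -> 13.3
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}
parse_driver_cuda() {    # "... CUDA Version: 12.2 ..." -> 12.2
    sed -n 's/.*CUDA Version: \([0-9][0-9.]*\).*/\1/p'
}
version_le() {           # true if $1 <= $2 in version order
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

if command -v nvcc >/dev/null 2>&1 && command -v nvidia-smi >/dev/null 2>&1; then
    toolkit=$(nvcc --version | parse_nvcc_release)
    driver_max=$(nvidia-smi | parse_driver_cuda)
    if [ -z "$toolkit" ] || [ -z "$driver_max" ]; then
        echo "could not read versions (is the driver loaded?)"
    elif version_le "$toolkit" "$driver_max"; then
        echo "toolkit $toolkit is within what the driver supports ($driver_max)"
    else
        echo "toolkit $toolkit is newer than the driver supports ($driver_max)"
    fi
fi
```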

Testing the CUDA build:

# cd llama.cpp
# export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
# export PATH=$PATH:$CUDA_HOME/bin
# cmake -B build -DGGML_CUDA=ON  -DCMAKE_CUDA_COMPILER=`which nvcc`
...
# cmake --build build --config Release -j 20
[  8%] Building CUDA object ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/cross-entropy-loss.cu.o
[ 10%] Built target ggml-cpu
nvcc fatal   : Unsupported gpu architecture 'compute_52'

The GPU is too old. Both cards are compute capability 5.2 (Maxwell), which the nvcc from CUDA 13.3 refuses to target: https://en.wikipedia.org/wiki/CUDA#Supported_GPUs
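
To check this without starting a full build, nvcc can list the architectures it is able to target with `--list-gpu-arch`. A minimal sketch, where `arch_supported` is a helper name of my own:

```shell
#!/bin/sh
# Hypothetical helper: succeed if compute_<cc> appears in the list read from stdin.
# $1 is the compute capability without the dot, e.g. 52 for 5.2.
arch_supported() {
    grep -qx "compute_$1"
}

if command -v nvcc >/dev/null 2>&1; then
    if nvcc --list-gpu-arch | arch_supported 52; then
        echo "this nvcc can target compute_52"
    else
        echo "this nvcc cannot target compute_52"
    fi
fi
```

When the architecture is listed, it can be passed to the build explicitly with `-DCMAKE_CUDA_ARCHITECTURES=52`.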

Updating Ollama:

# curl -fsSL https://ollama.com/install.sh | sh
>>> Cleaning up old version at /usr/local/lib/ollama
>>> Installing ollama to /usr/local
>>> Downloading ollama-linux-amd64.tar.zst
######################################################################## 100.0%
>>> Adding ollama user to render group...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> Enabling and starting ollama service...
>>> NVIDIA GPU installed.

Apparently the same problem: driver 535 is too old for Ollama's CUDA backend, so it falls back to Vulkan:

...
juin 19 10:51:37  ollama[2964]: time=2026-06-19T10:51:37.153Z level=INFO source=model_list_cache.go:111 msg="model list cache hydration complete" models=16 failures=0 elapsed=654.370427ms
juin 19 10:51:42  ollama[2964]: time=2026-06-19T10:51:42.591Z level=WARN source=cuda_compat.go:38 msg="NVIDIA driver too old" device="Quadro M5000" compute=5.2 driver=535 required_driver="570 or newer"
juin 19 10:51:42  ollama[2964]: time=2026-06-19T10:51:42.591Z level=WARN source=cuda_compat.go:38 msg="NVIDIA driver too old" device="Quadro M4000" compute=5.2 driver=535 required_driver="570 or newer"
juin 19 10:51:43  ollama[2964]: time=2026-06-19T10:51:43.181Z level=INFO source=types.go:32 msg="inference compute" id=1 filter_id=1 library=Vulkan compute=0.0 name=Vulkan1 description="Quadro M4000" libd>
juin 19 10:51:43  ollama[2964]: time=2026-06-19T10:51:43.181Z level=INFO source=types.go:32 msg="inference compute" id=0 filter_id=0 library=Vulkan compute=0.0 name=Vulkan0 description="Quadro M5000" libd>
...
