llama.cpp: Installation
-
The commands:
My Ubuntu version:
# inxi -F
System:
  Host: XXXX Kernel: 6.8.0-117-generic arch: x86_64 bits: 64
  Console: pty pts/1 Distro: Ubuntu 24.04.4 LTS (Noble Numbat)
Machine:
  Type: Kvm System: QEMU product: Standard PC (i440FX + PIIX, 1996)
  v: pc-i440fx-10.1 serial: N/A
  Mobo: N/A model: N/A serial: N/A BIOS: SeaBIOS
  v: rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org date: 04/01/2014
CPU:
  Info: 4x 8-core model: Intel Xeon E5-2450 v2 bits: 64 type: MCP SMP
  cache: L2: 4x 32 MiB (128 MiB)
  Speed (MHz): avg: 2500 min/max: N/A cores: 1: 2500 2: 2500 3: 2500
  4: 2500 5: 2500 6: 2500 7: 2500 8: 2500 9: 2500 10: 2500 11: 2500
  12: 2500 13: 2500 14: 2500 15: 2500 16: 2500 17: 2500 18: 2500
  19: 2500 20: 2500 21: 2500 22: 2500 23: 2500 24: 2500 25: 2500
  26: 2500 27: 2500 28: 2500 29: 2500 30: 2500 31: 2500 32: 2500
Graphics:
  Device-1: driver: bochs-drm v: N/A
  Device-2: NVIDIA GM204GL [Quadro M5000] driver: nvidia v: 580.159.03
  Device-3: NVIDIA GM204GL [Quadro M4000] driver: nvidia v: 580.159.03
  Display: server: X.org v: 1.21.1.11 driver: gpu: bochs-drm tty: 213x51
  resolution: 1280x800
  API: EGL v: 1.5 drivers: swrast platforms: surfaceless,device
  API: OpenGL v: 4.5 vendor: mesa v: 25.2.8-0ubuntu0.24.04.2
  note: console (EGL sourced) renderer: llvmpipe (LLVM 20.1.2 256 bits)
Audio:
  Message: No device data found.
Network:
  Device-1: Intel 82371AB/EB/MB PIIX4 ACPI type: network bridge
  driver: piix4_smbus
  Device-2: Red Hat Virtio network driver: virtio-pci
  IF: ens18 state: up speed: -1 duplex: unknown mac: bc:24:11:b5:30:62
Drives:
  Local Storage: total: 20.51 TiB used: 13.02 TiB (63.5%)
  ID-1: /dev/sda vendor: QEMU model: HARDDISK size: 400 GiB
  ID-2: /dev/sdb model: Portable SSD T5 size: 931.51 GiB type: USB
  ID-3: /dev/sdc vendor: Seagate model: FireCuda HDD size: 4.55 TiB type: USB
  ID-4: /dev/sdd vendor: Seagate model: Expansion Desk size: 14.55 TiB type: USB
  ID-5: /dev/sde vendor: Kingston model: SA400S37120G size: 111.79 GiB type: USB
Partition:
  ID-1: / size: 391.18 GiB used: 292.79 GiB (74.8%) fs: ext4 dev: /dev/dm-0
  ID-2: /boot size: 1.9 GiB used: 345.4 MiB (17.7%) fs: ext4 dev: /dev/sda2
Swap:
  ID-1: swap-1 type: file size: 4 GiB used: 8.5 MiB (0.2%) file: /swap.img
Sensors:
  Src: lm-sensors+/sys
  Message: No sensor data found using /sys/class/hwmon or lm-sensors.
Info:
  Memory: total: 78.19 GiB available: 18.14 GiB used: 2.62 GiB (14.4%)
  Processes: 416 Uptime: 24d 19h 4m Init: systemd target: graphical (5)
  Shell: Bash inxi: 3.3.34
Updating CUDA:
# apt-get update
# apt-get upgrade
# ubuntu-drivers autoinstall
# apt install nvidia-cuda-toolkit -y
# echo 'vm.swappiness=10' | sudo tee -a /etc/sysctl.conf
# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Fri_Feb_21_20:23:50_PST_2025
Cuda compilation tools, release 12.8, V12.8.93
Build cuda_12.8.r12.8/compiler.35583870_0
Ouch … let's do it again:
# wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-ubuntu2404.pin
# mv cuda-ubuntu2404.pin /etc/apt/preferences.d/cuda-repository-pin-600
# wget https://developer.download.nvidia.com/compute/cuda/13.3.0/local_installers/cuda-repo-ubuntu2404-13-3-local_13.3.0-610.43.02-1_amd64.deb
# dpkg -i cuda-repo-ubuntu2404-13-3-local_13.3.0-610.43.02-1_amd64.deb
# cp /var/cuda-repo-ubuntu2404-13-3-local/cuda-*-keyring.gpg /usr/share/keyrings/
# apt-get update
# apt-get -y install cuda-toolkit-13-3
# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2026 NVIDIA Corporation
Built on Fri_Apr_24_07:22:02_PM_PDT_2026
Cuda compilation tools, release 13.3, V13.3.33
Build cuda_13.3.r13.3/compiler.37862127_0
# apt-get autoremove
# ldconfig -v
The CUDA download page: https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=24.04&target_type=deb_local
Installing nvm:
# curl https://raw.githubusercontent.com/creationix/nvm/master/install.sh | bash
...
# source ~/.profile
...
# nvm install node
...
# nvm install 24
...
# nvm use 24
Now using node v24.17.0 (npm v11.13.0)
Installing libnccl:
# apt install libnccl2 libnccl-dev
Installing llama.cpp:
# git clone https://github.com/ggml-org/llama.cpp.git
# cd llama.cpp
# mkdir build
# cd build
# cmake ..
# cmake --build . --config Release
# make install
Downloading the models:
# mkdir /models
# curl -L --fail -o /models/qwen2.5-1.5b-instruct-q4_k_m.gguf https://huggingface.co/bartowski/Qwen2.5-1.5B-Instruct-GGUF/resolve/main/Qwen2.5-1.5B-Instruct-Q4_K_M.gguf
Benchmark (without CUDA, CPU only):
# llama-bench -m /models/qwen2.5-1.5b-instruct-q4_k_m.gguf
| model                          |       size |     params | backend    | threads |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: |
| qwen2 1.5B Q4_K - Medium       | 934.69 MiB |     1.54 B | CPU        |      32 |           pp512 |         74.97 ± 4.53 |
That's really bad …
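To compare several runs (for example before and after a GPU build), the tokens-per-second figure can be pulled out of the table that llama-bench prints. This is only a minimal sketch, assuming the seven-column layout shown above; `bench_tps` is a made-up helper name, not part of llama.cpp.

```shell
# bench_tps TEST: read llama-bench's table on stdin and print the mean t/s
# for the given test name (e.g. pp512).
# Assumption: the 7-column layout shown above; bench_tps is a hypothetical helper.
bench_tps() {
    awk -F'|' -v t="$1" '
        { name = $7; gsub(/ /, "", name) }            # test column, spaces stripped
        name == t { split($8, v, " "); print v[1] }   # first token of the t/s cell
    '
}
```

It would be used as `llama-bench -m /models/qwen2.5-1.5b-instruct-q4_k_m.gguf | bench_tps pp512`.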
-
And as always, the NVIDIA drivers are broken:
# cd
root@jellyfin:~# nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
NVML library version: 535.309
# apt-get purge nvidia-*
# ubuntu-drivers install
# nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
NVML library version: 535.309
# reboot
-
Apparently it is not possible to install a newer version:
root@jellyfin:/home/arias# nvidia-smi
Fri Jun 19 10:10:52 2026
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.309.01             Driver Version: 535.309.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Quadro M5000                   Off | 00000000:00:10.0 Off |                  Off |
| 40%   48C    P8              14W / 150W |      4MiB /  8192MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  Quadro M4000                   Off | 00000000:00:1B.0 Off |                  N/A |
| 50%   52C    P8              14W / 120W |      4MiB /  8192MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
WARNING: infoROM is corrupted at gpu 0000:00:10.0
-
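The "Driver/library version mismatch" error generally means the loaded kernel module and the user-space NVML library come from different driver releases. In this thread, inxi reported driver 580.159.03 while NVML reported library version 535.309. A minimal sketch for comparing two such versions, assuming the kernel module version can be read from /proc/driver/nvidia/version; `first_version` and `same_branch` are hypothetical helper names.

```shell
# first_version: print the first dotted version number found on stdin.
first_version() {
    grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1
}

# same_branch A B: succeed if two driver versions share the same major number.
same_branch() {
    [ "${1%%.*}" = "${2%%.*}" ]
}

# Example with the two versions seen in this thread:
if same_branch "580.159.03" "535.309"; then
    echo "driver and library come from the same branch"
else
    echo "mismatch: driver and library come from different branches"
fi
```

On the machine itself, `first_version < /proc/driver/nvidia/version` should give the version of the module that is currently loaded, which can then be compared with the library version that nvidia-smi complains about.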
Testing the CUDA build:
# cd llama.cpp
# export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
# export PATH=$PATH:$CUDA_HOME/bin
# cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_COMPILER=`which nvcc`
...
# cmake --build build --config Release -j 20
[ 8%] Building CUDA object ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/cross-entropy-loss.cu.o
[ 10%] Built target ggml-cpu
nvcc fatal : Unsupported gpu architecture 'compute_52'
-
The GPU is too old (compute capability 5.2, which this nvcc rejects): https://en.wikipedia.org/wiki/CUDA#Supported_GPUs
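This kind of failure can be spotted before building by comparing the GPU's compute capability with the architectures the installed nvcc can target. A minimal sketch, assuming a toolkit that provides `nvcc --list-gpu-arch` and a driver whose nvidia-smi accepts the `compute_cap` query field; `arch_supported` is a hypothetical helper name.

```shell
# arch_supported CAP: read a list of nvcc virtual architectures on stdin
# (one per line, e.g. compute_75) and succeed if CAP (e.g. 5.2) is among them.
# Assumption: arch_supported is a made-up helper for illustration.
arch_supported() {
    want="compute_$(printf '%s' "$1" | tr -d '.')"
    grep -qx "$want"
}

# Canned example: an architecture list that does not contain compute_52.
if printf 'compute_75\ncompute_80\n' | arch_supported 5.2; then
    echo "5.2 is supported"
else
    echo "5.2 is not supported by this toolkit"
fi
```

On a real system the list and the capability would come from the tools themselves, for example `nvcc --list-gpu-arch | arch_supported "$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1)"`.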
-
Updating Ollama:
# curl -fsSL https://ollama.com/install.sh | sh
>>> Cleaning up old version at /usr/local/lib/ollama
>>> Installing ollama to /usr/local
>>> Downloading ollama-linux-amd64.tar.zst
######################################################################## 100.0%
>>> Adding ollama user to render group...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> Enabling and starting ollama service...
>>> NVIDIA GPU installed.
Apparently the same problem:
...
juin 19 10:51:37 ollama[2964]: time=2026-06-19T10:51:37.153Z level=INFO source=model_list_cache.go:111 msg="model list cache hydration complete" models=16 failures=0 elapsed=654.370427ms
juin 19 10:51:42 ollama[2964]: time=2026-06-19T10:51:42.591Z level=WARN source=cuda_compat.go:38 msg="NVIDIA driver too old" device="Quadro M5000" compute=5.2 driver=535 required_driver="570 or newer"
juin 19 10:51:42 ollama[2964]: time=2026-06-19T10:51:42.591Z level=WARN source=cuda_compat.go:38 msg="NVIDIA driver too old" device="Quadro M4000" compute=5.2 driver=535 required_driver="570 or newer"
juin 19 10:51:43 ollama[2964]: time=2026-06-19T10:51:43.181Z level=INFO source=types.go:32 msg="inference compute" id=1 filter_id=1 library=Vulkan compute=0.0 name=Vulkan1 description="Quadro M4000" libd>
juin 19 10:51:43 ollama[2964]: time=2026-06-19T10:51:43.181Z level=INFO source=types.go:32 msg="inference compute" id=0 filter_id=0 library=Vulkan compute=0.0 name=Vulkan0 description="Quadro M5000" libd>
...
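The relevant warnings can be isolated from the rest of the service log. A minimal sketch, assuming the key=value layout of the log lines above; `old_driver_warnings` is a hypothetical helper that prints the device name, the installed driver branch and the required one for each "NVIDIA driver too old" warning.

```shell
# old_driver_warnings: filter Ollama log lines on stdin and summarize every
# "NVIDIA driver too old" warning as "<device>: driver <n>, needs <requirement>".
# Assumption: the field layout matches the log lines quoted in this thread.
old_driver_warnings() {
    grep 'msg="NVIDIA driver too old"' |
        sed -E 's/.*device="([^"]*)".*driver=([0-9]+).*required_driver="([^"]*)".*/\1: driver \2, needs \3/'
}
```

With the systemd service created by the installer, it could be fed with `journalctl -u ollama --no-pager | old_driver_warnings`.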