llama.cpp with Vulkan
Packages added for the build:
# sudo apt install libvulkan-dev vulkan-tools glslang-tools cmake build-essential git
Test:
# vulkaninfo | grep -i "deviceName"
'DISPLAY' environment variable not set... skipping surface info
error: XDG_RUNTIME_DIR is invalid or not set in the environment.
deviceName = Quadro M5000
deviceName = Quadro M4000
deviceName = llvmpipe (LLVM 20.1.2, 256 bits)
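The DISPLAY and XDG_RUNTIME_DIR messages are noise on a headless machine; only the deviceName lines matter here. A minimal sketch of a filter that keeps just the names (`vulkan_device_names` is a hypothetical helper name, not part of vulkan-tools):

```shell
# Hypothetical helper: read vulkaninfo output on stdin and print one
# device name per line, without the "deviceName =" prefix.
vulkan_device_names() {
    sed -n 's/^[[:space:]]*deviceName[[:space:]]*=[[:space:]]*//p'
}
```

It would be used as `vulkaninfo 2>/dev/null | vulkan_device_names`.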
Build attempt:
# cmake -B build -DGGML_VULKAN=ON -DCMAKE_BUILD_TYPE=Release
-- The C compiler identification is GNU 13.3.0
-- The CXX compiler identification is GNU 13.3.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMAKE_BUILD_TYPE=Release
-- Found Git: /usr/bin/git (found version "2.43.0")
-- The ASM compiler identification is GNU
-- Found assembler: /usr/bin/cc
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Warning: ccache not found - consider installing it for faster compilation or disable this warning with GGML_CCACHE=OFF
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- GGML_SYSTEM_ARCH: x86
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- Including CPU backend
-- x86 detected
-- Adding CPU backend variant ggml-cpu: -march=native
CMake Error at /usr/share/cmake-3.28/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find Vulkan (missing: glslc) (found version "1.3.275")
Call Stack (most recent call first):
  /usr/share/cmake-3.28/Modules/FindPackageHandleStandardArgs.cmake:600 (_FPHSA_FAILURE_MESSAGE)
  /usr/share/cmake-3.28/Modules/FindVulkan.cmake:600 (find_package_handle_standard_args)
  ggml/src/ggml-vulkan/CMakeLists.txt:9 (find_package)

-- Configuring incomplete, errors occurred!
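The configure step stops because the glslc shader compiler is not found, even though the Vulkan library itself (1.3.275) is. A quick pre-flight check can catch this before running cmake. This is only a sketch: `missing_tools` is a hypothetical helper, and the list of tools to pass is an assumption based on the error above.

```shell
# Hypothetical helper: print every named tool that is not on PATH and
# return non-zero if at least one is missing.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            status=1
        fi
    done
    return "$status"
}
```

For example: `missing_tools cmake git glslc || echo "install the missing tools before running cmake"`.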
Installing the Vulkan SDK:
# wget -qO- https://packages.lunarg.com/lunarg-signing-key-pub.asc | sudo tee /etc/apt/trusted.gpg.d/lunarg.asc
# sudo wget -qO /etc/apt/sources.list.d/lunarg-vulkan-jammy.list http://packages.lunarg.com/vulkan/lunarg-vulkan-jammy.list
# sudo apt update
# sudo apt install vulkan-sdk
But the install fails, because crashdiagnosticlayer depends on libyaml-cpp0.7, which is not installable:
# sudo apt install vulkan-sdk
Lecture des listes de paquets... Fait
Construction de l'arbre des dépendances... Fait
Lecture des informations d'état... Fait
Certains paquets ne peuvent être installés. Ceci peut signifier
que vous avez demandé l'impossible, ou bien, si vous utilisez
la distribution unstable, que certains paquets n'ont pas encore
été créés ou ne sont pas sortis d'Incoming.
L'information suivante devrait vous aider à résoudre la situation :

Les paquets suivants contiennent des dépendances non satisfaites :
 crashdiagnosticlayer : Dépend: libyaml-cpp0.7 (>= 0.7.0) mais il n'est pas installable
E: Impossible de corriger les problèmes, des paquets défectueux sont en mode « garder en l'état ».
My mistake: I picked the repository for the wrong Ubuntu release (jammy instead of noble). Starting over:
# rm /etc/apt/sources.list.d/lunarg-vulkan-jammy.list
# wget -qO- https://packages.lunarg.com/lunarg-signing-key-pub.asc | sudo tee /etc/apt/trusted.gpg.d/lunarg.asc
# sudo wget -qO /etc/apt/sources.list.d/lunarg-vulkan-noble.list http://packages.lunarg.com/vulkan/lunarg-vulkan-noble.list
# sudo apt update
# sudo apt install vulkan-sdk
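To avoid picking the wrong release by hand, the codename can be read from /etc/os-release and substituted into the list URL. A minimal sketch, assuming the URL pattern used in the commands above holds for the running release (LunarG may not publish a list for every codename); both function names are hypothetical:

```shell
# Hypothetical helper: build the LunarG APT list URL for an Ubuntu codename,
# following the URL pattern used in the commands above.
lunarg_list_url() {
    codename="$1"
    echo "http://packages.lunarg.com/vulkan/lunarg-vulkan-${codename}.list"
}

# Hypothetical helper: print the codename of the running system, or an
# empty line if /etc/os-release is unreadable or has no VERSION_CODENAME.
current_codename() {
    [ -r /etc/os-release ] || return 0
    # Source the file in a subshell so its variables do not leak out.
    ( . /etc/os-release; echo "${VERSION_CODENAME:-}" )
}
```

`lunarg_list_url "$(current_codename)"` then prints the URL to pass to wget.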
New build:
# cmake -B build -DGGML_VULKAN=ON -DCMAKE_BUILD_TYPE=Release
CMAKE_BUILD_TYPE=Release
-- Warning: ccache not found - consider installing it for faster compilation or disable this warning with GGML_CCACHE=OFF
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- GGML_SYSTEM_ARCH: x86
-- Including CPU backend
-- x86 detected
-- Adding CPU backend variant ggml-cpu: -march=native
-- Found Vulkan: /usr/lib/x86_64-linux-gnu/libvulkan.so (found version "1.4.313") found components: glslc glslangValidator
-- Vulkan found
-- GL_KHR_cooperative_matrix supported by glslc
-- GL_NV_cooperative_matrix2 supported by glslc
-- GL_NV_cooperative_matrix_decode_vector not supported by glslc
-- GL_EXT_integer_dot_product supported by glslc
-- GL_EXT_bfloat16 supported by glslc
-- Including Vulkan backend
-- ggml version: 0.15.2
-- ggml commit: 5fd2dc2c4
-- Found OpenSSL: /usr/lib/x86_64-linux-gnu/libcrypto.so (found version "3.0.13")
-- Performing Test OPENSSL_VERSION_SUPPORTED
-- Performing Test OPENSSL_VERSION_SUPPORTED - Success
-- OpenSSL found: 3.0.13
-- Generating embedded license file for target: llama-app
-- Configuring done (5.0s)
-- Generating done (0.6s)
The build command:
# cmake --build build --config Release -j
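With a bare `-j`, CMake leaves the job count to the native build tool, which with Makefile generators can mean no limit at all, as far as I know. On a machine with little RAM it may be safer to cap it at the number of CPUs. A sketch, where `build_jobs` is a hypothetical helper:

```shell
# Hypothetical helper: print the number of online CPUs, falling back to 1
# when neither nproc nor getconf can report it.
build_jobs() {
    n="$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)"
    echo "$n"
}
```

The build command then becomes `cmake --build build --config Release -j "$(build_jobs)"`.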
Quick test:
# make install
# ldconfig -v
# llama-bench -m /models/qwen2.5-1.5b-instruct-q4_k_m.gguf
ggml_vulkan: Found 2 Vulkan devices:
ggml_vulkan: 0 = Quadro M5000 (NVIDIA) | uma: 0 | fp16: 0 | bf16: 0 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: none
ggml_vulkan: 1 = Quadro M4000 (NVIDIA) | uma: 0 | fp16: 0 | bf16: 0 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: none
| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen2 1.5B Q4_K - Medium       | 934.69 MiB |     1.54 B | Vulkan     |  -1 |           pp512 |         53.48 ± 0.42 |
| qwen2 1.5B Q4_K - Medium       | 934.69 MiB |     1.54 B | Vulkan     |  -1 |           tg128 |         63.55 ± 0.73 |

build: 5fd2dc2c4 (9721)
Stopping openwebui:
# systemctl stop openwebui
# systemctl disable openwebui
Removed "/etc/systemd/system/multi-user.target.wants/openwebui.service".
Stopping ollama:
# systemctl stop ollama
# systemctl disable ollama
Removed "/etc/systemd/system/default.target.wants/ollama.service".
Command-line test:
# llama-server -m /models/qwen2.5-1.5b-instruct-q4_k_m.gguf --host 0.0.0.0
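Once the server is started, it can take a moment to load the model before it answers. A minimal sketch of a readiness check with curl, assuming the server listens on the default port 8080 and that this build exposes the `/health` endpoint; `wait_for_http_ok` is a hypothetical helper:

```shell
# Hypothetical helper: poll a URL until it answers with an HTTP success
# status, trying at most $2 times (default 30) with a one-second pause
# between tries. Returns 0 on success, 1 if every try failed.
wait_for_http_ok() {
    url="$1"
    tries="${2:-30}"
    i=1
    while [ "$i" -le "$tries" ]; do
        # -f makes curl fail on HTTP errors such as 503 while loading.
        if curl -fsS --max-time 2 -o /dev/null "$url" 2>/dev/null; then
            return 0
        fi
        [ "$i" -lt "$tries" ] && sleep 1
        i=$((i + 1))
    done
    return 1
}
```

From another shell: `wait_for_http_ok http://127.0.0.1:8080/health && curl -s http://127.0.0.1:8080/v1/models`.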
My service file:
# systemctl status llama-server
● llama-server.service - Llama Server
     Loaded: loaded (/etc/systemd/system/llama-server.service; disabled; preset: enabled)
     Active: active (running) since Fri 2026-06-19 17:27:42 UTC; 29s ago
   Main PID: 37413 (llama-server)
      Tasks: 41 (limit: 94224)
     Memory: 91.7M (peak: 91.7M)
        CPU: 3.103s
     CGroup: /system.slice/llama-server.service
             └─37413 /usr/local/bin/llama-server --model /models/qwen2.5-1.5b-instruct-q4_k_m.gguf --host 0.0.0.0 --port 8080

juin 19 17:27:42 jellyfin systemd[1]: Started llama-server.service - Llama Server.

root@jellyfin:/home/arias/llama.cpp/build# cat /etc/systemd/system/llama-server.service
[Unit]
Description=Llama Server
After=network.target

[Service]
Type=simple
User=root
WorkingDirectory=/home/XXXX/llama.cpp
Environment="NVM_BIN=/root/.nvm/versions/node/v26.3.1/bin"
Environment="LD_LIBRARY_PATH=:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64"
Environment="VULKAN_VERSION=1.4.350.1"
ExecStart=/usr/local/bin/llama-server \
    --model /models/qwen2.5-1.5b-instruct-q4_k_m.gguf \
    --host 0.0.0.0 --port 8080
Restart=on-failure
RestartSec=5s
StandardOutput=file:/tmp/llama-server.stdout.log
StandardError=file:/tmp/llama-server.stderr.log

[Install]
WantedBy=multi-user.target
The best model for my cards seems to be https://huggingface.co/Qwen/Qwen3.5-2B.