<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[llama.cpp benchmark on Tuxedo 17]]></title><description><![CDATA[<p dir="auto">The hardware:</p>
<pre><code>$ inxi -F
System:
  Host: tuxedo17 Kernel: 6.8.0-124-generic arch: x86_64 bits: 64
    Desktop: Budgie v: 10.6.1 Distro: Ubuntu 22.04.5 LTS (Jammy Jellyfish)
Machine:
  Type: Laptop System: TUXEDO product: TUXEDO Polaris Intel Gen3 (TGL)
    v: Standard serial: &lt;superuser required&gt;
  Mobo: NB02 model: GMxTGxx v: Standard serial: &lt;superuser required&gt;
    UEFI: American Megatrends LLC. v: N.1.07A03 date: 09/24/2021
Battery:
  ID-1: BAT0 charge: 62.3 Wh (100.0%) condition: 62.3/62.3 Wh (100.0%)
CPU:
  Info: 8-core model: 11th Gen Intel Core i7-11800H bits: 64 type: MT MCP
    cache: L2: 10 MiB
  Speed (MHz): avg: 1456 min/max: 800/4600 cores: 1: 1330 2: 801 3: 1052
    4: 800 5: 800 6: 1036 7: 800 8: 800 9: 2211 10: 800 11: 800 12: 4535 13: 800
    14: 2614 15: 800 16: 3320
Graphics:
  Device-1: Intel TigerLake-H GT1 [UHD Graphics] driver: i915 v: kernel
  Device-2: NVIDIA GA106M [GeForce RTX 3060 Mobile / Max-Q] driver: nvidia
    v: 560.35.03
  Device-3: Chicony HD Webcam driver: uvcvideo type: USB
  Display: x11 server: X.Org v: 1.21.1.4 with: Xwayland v: 22.1.1 driver: X:
    loaded: modesetting,nvidia unloaded: fbdev,nouveau,vesa dri: iris gpu: i915
    resolution: 1920x1080~144Hz
  API: EGL v: 1.5 drivers: nvidia platforms: x11,device
  API: OpenGL v: 4.6.0 vendor: nvidia v: 560.35.03 renderer: NVIDIA GeForce
    RTX 3060 Laptop GPU/PCIe/SSE2
</code></pre>
<p dir="auto">The default build of llama.cpp (that is, without CUDA):</p>
<pre><code class="language-bash"># git clone https://github.com/ggml-org/llama.cpp.git
...
# cmake --build . --config Release
...
</code></pre>
<p dir="auto">The result:</p>
<pre><code># llama-bench -m ~/.cache/huggingface/hub/models--ggml-org--gemma-3-1b-it-GGUF/snapshots/f9c28bcd85737ffc5aef028638d3341d49869c27/gemma-3-1b-it-Q4_K_M.gguf 
| model                          |       size |     params | backend    | threads |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: |
| gemma3 1B Q4_K - Medium        | 762.49 MiB |   999.89 M | CPU        |       8 |           pp512 |       322.52 ± 13.60 |
| gemma3 1B Q4_K - Medium        | 762.49 MiB |   999.89 M | CPU        |       8 |           tg128 |         48.07 ± 0.48 |

build: b4024af6c (9687)
</code></pre>
]]></description><link>https://lemmy.cyber-neurones.org/topic/365/benchmark-llama.cpp-sur-tuxedo-17</link><generator>RSS for Node</generator><lastBuildDate>Mon, 22 Jun 2026 16:12:35 GMT</lastBuildDate><atom:link href="https://lemmy.cyber-neurones.org/topic/365.rss" rel="self" type="application/rss+xml"/><pubDate>Wed, 17 Jun 2026 15:19:24 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to llama.cpp benchmark on Tuxedo 17 on Thu, 18 Jun 2026 09:35:37 GMT]]></title><description><![CDATA[<p dir="auto">Test:</p>
<pre><code class="language-bash">$ curl -fsS http://localhost:8080/v1/chat/completions   -H "Content-Type: application/json"   -d '{
    "model": "qwen2.5-1.5b",
    "messages": [
      {"role": "system", "content": "Tu es un assistant DevOps qui répond en français en une phrase."},
      {"role": "user", "content": "Quelle commande systemd affiche les services actifs ?"}
    ],
    "temperature": 0.2,
    "max_tokens": 80
  }' | jq
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "systemctl list-units --active"
      }
    }
  ],
  "created": 1781775300,
  "model": "qwen2.5-1.5b",
  "system_fingerprint": "b9687-b4024af6c",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 8,
    "prompt_tokens": 38,
    "total_tokens": 46,
    "prompt_tokens_details": {
      "cached_tokens": 37
    }
  },
  "id": "chatcmpl-mXP4t0noHYBKTaBSnZfxFdoTycVt42zC",
  "timings": {
    "cache_n": 37,
    "prompt_n": 1,
    "prompt_ms": 21.161,
    "prompt_per_token_ms": 21.161,
    "prompt_per_second": 47.25674590047729,
    "predicted_n": 8,
    "predicted_ms": 40.424,
    "predicted_per_token_ms": 5.053,
    "predicted_per_second": 197.90223629527014
  }
}

</code></pre>
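<p dir="auto">To double-check the <code>timings</code> block, the per-second rates can be recomputed from the token counts and durations in the response above. This is only a minimal sketch: the helper name <code>tokens_per_second</code> is made up for illustration and is not part of llama.cpp.</p>
<pre><code class="language-bash"># tokens_per_second COUNT MILLISECONDS (hypothetical helper)
# Prints the throughput in tokens per second, rounded to two decimals.
tokens_per_second() {
  LC_ALL=C awk -v n="$1" -v ms="$2" 'BEGIN { printf "%.2f\n", n * 1000 / ms }'
}

tokens_per_second 8 40.424   # predicted_n, predicted_ms: prints 197.90
tokens_per_second 1 21.161   # prompt_n, prompt_ms: prints 47.26
</code></pre>
<p dir="auto">Both values match <code>predicted_per_second</code> and <code>prompt_per_second</code> once rounded. Only 1 prompt token was actually processed here because the other 37 came from the cache, so the prompt rate of this request is not comparable to a pp512 figure.</p>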
]]></description><link>https://lemmy.cyber-neurones.org/post/906</link><guid isPermaLink="true">https://lemmy.cyber-neurones.org/post/906</guid><dc:creator><![CDATA[Tuxedo17]]></dc:creator><pubDate>Thu, 18 Jun 2026 09:35:37 GMT</pubDate></item><item><title><![CDATA[Reply to llama.cpp benchmark on Tuxedo 17 on Thu, 18 Jun 2026 09:32:55 GMT]]></title><description><![CDATA[<p dir="auto">New test:</p>
<pre><code class="language-bash"># curl -L --fail -o /models/qwen2.5-1.5b-instruct-q4_k_m.gguf   https://huggingface.co/bartowski/Qwen2.5-1.5B-Instruct-GGUF/resolve/main/Qwen2.5-1.5B-Instruct-Q4_K_M.gguf
# llama-server  -m /models/qwen2.5-1.5b-instruct-q4_k_m.gguf --host "0.0.0.0" --port "8080" --ctx-size  "4096" --threads 8 --alias qwen2.5-1.5b
</code></pre>
<p dir="auto"><a href="http://127.0.0.1:8080/health" rel="nofollow ugc">http://127.0.0.1:8080/health</a> =&gt; OK.</p>
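<p dir="auto">Loading a model can take a while, so a script that starts <code>llama-server</code> may want to wait for the health endpoint before sending requests. Below is a rough sketch, assuming <code>curl</code> is installed; the function name <code>wait_for_health</code> and the default of 30 tries are my own choices. As far as I know the endpoint answers with HTTP 503 while the model is still loading, which <code>curl -f</code> treats as a failure.</p>
<pre><code class="language-bash"># wait_for_health URL [TRIES] (hypothetical helper)
# Polls URL once per second until it answers with a successful HTTP status.
# Prints "ready" and returns 0 on success, prints "timeout" and returns 1 otherwise.
wait_for_health() {
  url="$1"
  tries="${2:-30}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -f: fail on HTTP errors, -sS: quiet but still show errors, -o: drop the body
    if curl -fsS -o /dev/null "$url"; then
      echo "ready"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "timeout"
  return 1
}
</code></pre>
<p dir="auto">For the server started above, the call would be <code>wait_for_health http://127.0.0.1:8080/health 60</code>.</p>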
]]></description><link>https://lemmy.cyber-neurones.org/post/905</link><guid isPermaLink="true">https://lemmy.cyber-neurones.org/post/905</guid><dc:creator><![CDATA[Tuxedo17]]></dc:creator><pubDate>Thu, 18 Jun 2026 09:32:55 GMT</pubDate></item><item><title><![CDATA[Reply to llama.cpp benchmark on Tuxedo 17 on Thu, 18 Jun 2026 09:28:20 GMT]]></title><description><![CDATA[<p dir="auto">Testing the model Qwen3.6-35B-A3B-UD-Q4_K_M.gguf:</p>
<pre><code class="language-bash"># llama-server  -m /models/Qwen3.6-35B-A3B-UD-Q4_K_M.gguf 
0.01.777.599 I log_info: verbosity = 3 (adjust with the `-lv N` CLI arg)
0.01.777.603 I device_info:
0.01.868.348 I   - CUDA0   : NVIDIA GeForce RTX 3060 Laptop GPU (5806 MiB, 5674 MiB free)
0.01.868.364 I   - CPU     : 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz (64044 MiB, 64044 MiB free)
0.01.868.506 I system_info: n_threads = 8 (n_threads_batch = 8) / 16 | CUDA : ARCHS = 750,800,860,890,900,1200,1210 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 | 
0.01.868.514 I srv  llama_server: n_parallel is set to auto, using n_parallel = 4 and kv_unified = true
0.01.868.566 I srv          init: running without SSL
0.01.868.646 I srv          init: using 15 threads for HTTP server
0.01.869.153 I srv         start: binding port with default address family
0.01.870.425 I srv  llama_server: loading model
0.01.870.434 I srv    load_model: loading model '/models/Qwen3.6-35B-A3B-UD-Q4_K_M.gguf'

</code></pre>
]]></description><link>https://lemmy.cyber-neurones.org/post/904</link><guid isPermaLink="true">https://lemmy.cyber-neurones.org/post/904</guid><dc:creator><![CDATA[Tuxedo17]]></dc:creator><pubDate>Thu, 18 Jun 2026 09:28:20 GMT</pubDate></item><item><title><![CDATA[Reply to llama.cpp benchmark on Tuxedo 17 on Thu, 18 Jun 2026 08:55:50 GMT]]></title><description><![CDATA[<p dir="auto">Model downloaded from <a href="https://huggingface.co/unsloth" rel="nofollow ugc">https://huggingface.co/unsloth</a>:</p>
<ul>
<li><a href="https://huggingface.co/unsloth/Qwen3.6-35B-A3B-GGUF/blob/main/mmproj-BF16.gguf" rel="nofollow ugc">https://huggingface.co/unsloth/Qwen3.6-35B-A3B-GGUF/blob/main/mmproj-BF16.gguf</a></li>
</ul>
]]></description><link>https://lemmy.cyber-neurones.org/post/903</link><guid isPermaLink="true">https://lemmy.cyber-neurones.org/post/903</guid><dc:creator><![CDATA[Tuxedo17]]></dc:creator><pubDate>Thu, 18 Jun 2026 08:55:50 GMT</pubDate></item><item><title><![CDATA[Reply to llama.cpp benchmark on Tuxedo 17 on Thu, 18 Jun 2026 08:35:09 GMT]]></title><description><![CDATA[<p dir="auto">So, going from CPU to CUDA:<br />
pp512: 303.04 t/s =&gt; 10984.34 t/s (about 36x)<br />
tg128: 46.90 t/s =&gt; 225.56 t/s (about 5x)</p>
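<p dir="auto">The ratios can be checked with a few lines of shell. This is just a small sketch: <code>speedup</code> is a hypothetical helper name, and the inputs are the t/s means from the two <code>llama-bench</code> runs in this thread.</p>
<pre><code class="language-bash"># speedup BASELINE NEW (hypothetical helper)
# Prints NEW / BASELINE rounded to one decimal place.
speedup() {
  LC_ALL=C awk -v base="$1" -v new="$2" 'BEGIN { printf "%.1f\n", new / base }'
}

speedup 303.04 10984.34   # pp512, CPU then CUDA: prints 36.2
speedup 46.90 225.56      # tg128, CPU then CUDA: prints 4.8
</code></pre>
<p dir="auto">Prompt processing gains far more from the GPU than token generation does on this 1B model.</p>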
]]></description><link>https://lemmy.cyber-neurones.org/post/902</link><guid isPermaLink="true">https://lemmy.cyber-neurones.org/post/902</guid><dc:creator><![CDATA[Tuxedo17]]></dc:creator><pubDate>Thu, 18 Jun 2026 08:35:09 GMT</pubDate></item><item><title><![CDATA[Reply to llama.cpp benchmark on Tuxedo 17 on Thu, 18 Jun 2026 08:27:25 GMT]]></title><description><![CDATA[<p dir="auto">Ouch…</p>
<pre><code class="language-bash"># nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
NVML library version: 560.35
# sudo apt-get purge nvidia-*
#  ubuntu-drivers install
# nvidia-smi
Thu Jun 18 10:26:10 2026       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 595.71.05              Driver Version: 595.71.05      CUDA Version: 13.2     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060 ...    Off |   00000000:01:00.0 Off |                  N/A |
| N/A   40C    P0            752W /  115W |      15MiB /   6144MiB |      9%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            5003      G   /usr/lib/xorg/Xorg                        4MiB |
+-----------------------------------------------------------------------------------------+
# llama-bench -m ~/.cache/huggingface/hub/models--ggml-org--gemma-3-1b-it-GGUF/snapshots/f9c28bcd85737ffc5aef028638d3341d49869c27/gemma-3-1b-it-Q4_K_M.gguf 
ggml_cuda_init: found 1 CUDA devices (Total VRAM: 5806 MiB):
  Device 0: NVIDIA GeForce RTX 3060 Laptop GPU, compute capability 8.6, VMM: yes, VRAM: 5806 MiB
| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| gemma3 1B Q4_K - Medium        | 762.49 MiB |   999.89 M | CUDA       |  -1 |           pp512 |    10984.34 ± 550.24 |
| gemma3 1B Q4_K - Medium        | 762.49 MiB |   999.89 M | CUDA       |  -1 |           tg128 |        225.56 ± 0.57 |

build: b4024af6c (9687)

</code></pre>
]]></description><link>https://lemmy.cyber-neurones.org/post/901</link><guid isPermaLink="true">https://lemmy.cyber-neurones.org/post/901</guid><dc:creator><![CDATA[Tuxedo17]]></dc:creator><pubDate>Thu, 18 Jun 2026 08:27:25 GMT</pubDate></item><item><title><![CDATA[Reply to llama.cpp benchmark on Tuxedo 17 on Thu, 18 Jun 2026 08:16:09 GMT]]></title><description><![CDATA[<p dir="auto">Version:</p>
<pre><code class="language-bash"># nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2026 NVIDIA Corporation
Built on Fri_Apr_24_07:22:02_PM_PDT_2026
Cuda compilation tools, release 13.3, V13.3.33
Build cuda_13.3.r13.3/compiler.37862127_0
</code></pre>
]]></description><link>https://lemmy.cyber-neurones.org/post/900</link><guid isPermaLink="true">https://lemmy.cyber-neurones.org/post/900</guid><dc:creator><![CDATA[Tuxedo17]]></dc:creator><pubDate>Thu, 18 Jun 2026 08:16:09 GMT</pubDate></item><item><title><![CDATA[Reply to llama.cpp benchmark on Tuxedo 17 on Thu, 18 Jun 2026 08:14:57 GMT]]></title><description><![CDATA[<p dir="auto">New test:</p>
<pre><code class="language-bash"># llama-bench -m ~/.cache/huggingface/hub/models--ggml-org--gemma-3-1b-it-GGUF/snapshots/f9c28bcd85737ffc5aef028638d3341d49869c27/gemma-3-1b-it-Q4_K_M.gguf 
ggml_cuda_init: failed to initialize CUDA: CUDA driver version is insufficient for CUDA runtime version
| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| gemma3 1B Q4_K - Medium        | 762.49 MiB |   999.89 M | CUDA       |  -1 |           pp512 |        303.04 ± 1.63 |
| gemma3 1B Q4_K - Medium        | 762.49 MiB |   999.89 M | CUDA       |  -1 |           tg128 |         46.90 ± 1.21 |

build: b4024af6c (9687)
</code></pre>
]]></description><link>https://lemmy.cyber-neurones.org/post/899</link><guid isPermaLink="true">https://lemmy.cyber-neurones.org/post/899</guid><dc:creator><![CDATA[Tuxedo17]]></dc:creator><pubDate>Thu, 18 Jun 2026 08:14:57 GMT</pubDate></item><item><title><![CDATA[Reply to llama.cpp benchmark on Tuxedo 17 on Thu, 18 Jun 2026 07:46:59 GMT]]></title><description><![CDATA[<p dir="auto">Building llama.cpp with CUDA:</p>
<pre><code class="language-bash"># cd llama.cpp
# export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
# export PATH=$PATH:/usr/local/cuda/bin
# cmake -B build -DGGML_CUDA=ON  -DCMAKE_CUDA_COMPILER=`which nvcc`
CMAKE_BUILD_TYPE=Release
-- Warning: ccache not found - consider installing it for faster compilation or disable this warning with GGML_CCACHE=OFF
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- GGML_SYSTEM_ARCH: x86
-- Including CPU backend
-- x86 detected
-- Adding CPU backend variant ggml-cpu: -march=native 
-- CUDA Toolkit found
-- The CUDA compiler identification is NVIDIA 13.3.33
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Using CMAKE_CUDA_ARCHITECTURES=75-virtual;80-virtual;86-real;89-real;90-virtual;120a-real;121a-real CMAKE_CUDA_ARCHITECTURES_NATIVE=
-- Could NOT find NCCL (missing: NCCL_LIBRARY NCCL_INCLUDE_DIR) 
-- Warning: NCCL not found, performance for multiple CUDA GPUs will be suboptimal
-- CUDA host compiler is GNU 11.4.0
-- Including CUDA backend
-- ggml version: 0.15.1
-- ggml commit:  b4024af6c
-- OpenSSL found: 3.0.2
-- Generating embedded license file for target: llama-app
-- Configuring done
-- Generating done
-- Build files have been written to: /home/arias/GIT/llama.cpp/build
# cmake --build build --config Release -j 20
</code></pre>
]]></description><link>https://lemmy.cyber-neurones.org/post/898</link><guid isPermaLink="true">https://lemmy.cyber-neurones.org/post/898</guid><dc:creator><![CDATA[Tuxedo17]]></dc:creator><pubDate>Thu, 18 Jun 2026 07:46:59 GMT</pubDate></item><item><title><![CDATA[Reply to llama.cpp benchmark on Tuxedo 17 on Thu, 18 Jun 2026 07:39:45 GMT]]></title><description><![CDATA[<p dir="auto">CUDA update: 12.6 =&gt; 13.2<br />
NVIDIA driver update: 560.35.03 (21 August 2024: <a href="https://www.nvidia.com/en-us/drivers/details/230918/" rel="nofollow ugc">https://www.nvidia.com/en-us/drivers/details/230918/</a>) =&gt; 595.71.05</p>
<pre><code class="language-bash">(base) root@tuxedo17:/home/arias# nvidia-smi 
Thu Jun 18 09:37:40 2026       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 595.71.05              Driver Version: 595.71.05      CUDA Version: 13.2     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060 ...    Off |   00000000:01:00.0 Off |                  N/A |
| N/A   42C    P0            752W /  115W |      15MiB /   6144MiB |      9%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
</code></pre>
]]></description><link>https://lemmy.cyber-neurones.org/post/897</link><guid isPermaLink="true">https://lemmy.cyber-neurones.org/post/897</guid><dc:creator><![CDATA[Tuxedo17]]></dc:creator><pubDate>Thu, 18 Jun 2026 07:39:45 GMT</pubDate></item><item><title><![CDATA[Reply to llama.cpp benchmark on Tuxedo 17 on Wed, 17 Jun 2026 15:26:20 GMT]]></title><description><![CDATA[<p dir="auto">My CUDA version: 12.6</p>
<pre><code class="language-bash">$ nvidia-smi 
Wed Jun 17 17:13:28 2026       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060 ...    Off |   00000000:01:00.0 Off |                  N/A |
| N/A   49C    P8             12W /  115W |     803MiB /   6144MiB |     29%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

</code></pre>
<p dir="auto">Updating CUDA:</p>
<pre><code class="language-bash">wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get -y install cuda-toolkit-13-3
</code></pre>
]]></description><link>https://lemmy.cyber-neurones.org/post/896</link><guid isPermaLink="true">https://lemmy.cyber-neurones.org/post/896</guid><dc:creator><![CDATA[Tuxedo17]]></dc:creator><pubDate>Wed, 17 Jun 2026 15:26:20 GMT</pubDate></item></channel></rss>