
`build`: generate hex dump of server assets during build

Open ochafik opened this issue 10 months ago • 11 comments

Currently one needs to manually regenerate asset files in examples/server, which:

  • is easy to forget
  • ~~may cause a security risk~~ (reviewers are unlikely to unescape the code - don't wanna be the next target after xz) (edit: already mitigated by https://github.com/ggerganov/llama.cpp/pull/6409 as pointed out below)

od is used in the Makefile (unlike xxd, it should be available on ~all systems, incl. on Windows w/ w64devkit, thanks to busybox), while the cmake and zig builds cast their own hex for portability (may have made me n-curse).
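For the curious, the od-based dump boils down to something like this minimal sketch (file and symbol names here are illustrative, not the actual Makefile rule):

```shell
# Minimal sketch of the od-based hex dump: turn an asset into a C byte
# array using only od, sed and wc, all of which busybox provides.
# "asset.bin" / "asset.hpp" are illustrative names.
printf 'hi' > asset.bin
{
  echo "unsigned char asset[] = {"
  od -v -A n -t x1 asset.bin | sed -E 's/([0-9a-f]{2})/0x\1,/g'
  echo "};"
  echo "unsigned int asset_len = $(wc -c < asset.bin | tr -d ' ');"
} > asset.hpp
cat asset.hpp
```

The generated header can then be `#include`d by the server so the asset ships inside the binary.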

ochafik avatar Apr 13 '24 15:04 ochafik

It was addressed during the xz weekend, here:

  • #6409

But I am happy that the generated code is finally removed, so this looks good.

phymbert avatar Apr 13 '24 15:04 phymbert

Please pay attention: the idea may look simple on the surface, but it gets messy once you take into account that we're not compiling for a single target.

  • Some Linux distros (Docker/CI) may not have xxd by default. You would need to add something like apt install xxd to every Dockerfile and build script in the project, which also creates a new compile-time dependency. In fact, a while ago I succeeded in writing a script that does the same thing as xxd but uses od instead, since od is included by default in most Linux distros. See this article for more details.
  • Things get messy when talking about Windows and macOS. In fact, I have zero knowledge of how to compile on these targets, so I gave up.

Edit: sorry, I deleted the od script I mentioned above.

ngxson avatar Apr 13 '24 16:04 ngxson

@phymbert great to read you've already mitigated the issue, I'd missed that commit in the flow.

Things get messy when talking about Windows and macOS. In fact, I have zero knowledge of how to compile on these targets, so I gave up.

@ngxson actually I've just hit a snag: my BSD xxd (on macOS) likes -n, which GNU xxd (on CI) has no clue about - fixed.

Re/ Windows, I realize I've assumed people build from within WSL (or cross-build from Linux), but would need confirmation.

In the absolute worst case we could write our own minimalistic xxd, I suppose.

ochafik avatar Apr 13 '24 16:04 ochafik

Re/ Windows, I realize I've assumed people build from within WSL (or cross-build from Linux), but would need confirmation.

Unfortunately we cannot assume that; I have the feeling most Windows developers are using Visual Studio.

In the absolute worst case we could write our own minimalistic xxd, I suppose.

I would prefer to have a classical npm build then. Also what about:

  • https://stackoverflow.com/questions/4158900/embedding-resources-in-executable-using-gcc
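For reference, the linker-based approach from that answer boils down to something like this sketch; it relies on GNU ld's -b binary (not supported by e.g. macOS's ld), and the file name is illustrative:

```shell
# Sketch of linker-based embedding: GNU ld can turn an arbitrary file
# straight into an object file, no hex dump needed.
# Requires GNU binutils; "index.html" is an illustrative asset name.
printf 'hello' > index.html
ld -r -b binary -o index_html.o index.html
# ld derives _binary_index_html_{start,end,size} symbols from the path;
# C code can then declare: extern char _binary_index_html_start[];
nm index_html.o
```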

phymbert avatar Apr 13 '24 16:04 phymbert

I would prefer to have a classical npm build then.

@phymbert possibly inevitable in the long run, would unlock new horizons...

Also what about: https://stackoverflow.com/questions/4158900/embedding-resources-in-executable-using-gcc

Hah, nifty! Might need some contortions to get something similar to work w/ MSVC, but alternatively... looks like we could do it in cmake: https://stackoverflow.com/questions/11813271/embed-resources-eg-shader-code-images-into-executable-library-with-cmake

Which means out of the 3 documented ways to build on Windows:

  • (MS) make + w64devkit: the latter ships w/ busybox, which seems to have xxd
  • cmake: can do the HEX in CMakeLists.txt
  • zig: same (TBC, I'm not sure how much I like the language so far :-D)

(so xxd in Makefile for all platforms, and adhoc hexing in the other build scripts 🤞)
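The CMake trick from that answer looks roughly like this sketch (the asset path and generated header name are illustrative):

```cmake
# Sketch of the file(READ ... HEX) approach from the linked SO answer:
# read the asset as a hex string, rewrite it as a C initializer list,
# and emit a header. Paths and output names are illustrative.
file(READ "${CMAKE_CURRENT_SOURCE_DIR}/public/index.html" ASSET_HEX HEX)
string(REGEX REPLACE "([0-9a-f][0-9a-f])" "0x\\1," ASSET_BYTES "${ASSET_HEX}")
file(WRITE "${CMAKE_CURRENT_BINARY_DIR}/index_html.hpp"
     "unsigned char index_html[] = {${ASSET_BYTES}};\n")
```

This runs at configure time with no external tools, which is what makes it attractive for MSVC.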

ochafik avatar Apr 13 '24 16:04 ochafik

Good job! The read-file-as-HEX trick in cmake seems to resolve the problem on both Mac & Windows.

For make, there are still users building with make on Linux, and xxd may not be available by default; at least it's not installed on my Fedora installation or my Ubuntu distrobox. Of course I can simply apt install it, but I'm a bit worried that this change may break compilation for someone else.

ngxson avatar Apr 14 '24 00:04 ngxson

In the absolute worst case we could write our own minimalistic xxd, I suppose.

This doesn't sound too bad - we can give it a try if we encounter issues with the proposed PR

One idea to make things even more user-friendly is to commit a minimal index.html that gets loaded when the hex conversion has not been applied (for whatever reason). That minimal index.html can just show information about the steps needed to generate the UI

ggerganov avatar Apr 15 '24 10:04 ggerganov

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 460 iterations 🚀

Expand details for performance related PR only
  • Concurrent users: 8, duration: 10m
  • HTTP request : avg=10214.07ms p(95)=26164.51ms fails=, finish reason: stop=412 truncated=48
  • Prompt processing (pp): avg=112.42tk/s p(95)=549.33tk/s
  • Token generation (tg): avg=24.22tk/s p(95)=36.95tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=generate-assets commit=b9286a4d7b929b077f1d05395e2bc9ecadf5a99a

[Benchmark charts omitted: Mermaid xychart plots of llamacpp:prompt_tokens_seconds, llamacpp:predicted_tokens_seconds, llamacpp:kv_cache_usage_ratio, and llamacpp:requests_processing over the 10m run (460 iterations) on Standard_NC4as_T4_v3.]

github-actions[bot] avatar Apr 15 '24 18:04 github-actions[bot]

Nomic's Kompute fork builds xxd from source (it's a single .c source file) unconditionally: https://github.com/nomic-ai/kompute/blob/d1e3b0953cf66acc94b2e29693e221427b2c1f3f/CMakeLists.txt#L187

cebtenzzre avatar Apr 15 '24 18:04 cebtenzzre

Nomic's Kompute fork builds xxd from source (it's a single .c source file) unconditionally: https://github.com/nomic-ai/kompute/blob/d1e3b0953cf66acc94b2e29693e221427b2c1f3f/CMakeLists.txt#L187

@cebtenzzre hah great, so we kinda always have xxd after all... (assuming one does git clone --recursive, which I didn't)

In the absolute worst case we could write our own minimalistic xxd, I suppose.

This does not sound really awful - we can give it a try if we encounter issues with the proposed PR

@ggerganov @ngxson I've switched the Makefile to using od & wc (should work on Windows w/ w64devkit through busybox 🤞). None of the 3 hex-dumping methods are exactly pretty, but they're relatively short (building xxd would probably take the same amount of build script lines, or more b/c of two-phased builds).

ochafik avatar Apr 15 '24 20:04 ochafik

Feel free to merge when ready

ggerganov avatar Apr 21 '24 12:04 ggerganov

@ochafik I think this kind of breaks ROCm for cmake on Windows.

debugbuild.txt

sorasoras avatar Apr 22 '24 11:04 sorasoras

I think this kind of breaks ROCm for cmake on Windows. debugbuild.txt

@sorasoras glancing through it, I think it might be related to https://github.com/ggerganov/llama.cpp/pull/6797. @jart FYI

ochafik avatar Apr 22 '24 15:04 ochafik