llama.cpp
docker: add support for CUDA in docker
Assuming one has the nvidia-container-toolkit installed on Linux, or is using a GPU enabled cloud, cuBLAS should be accessible inside the container.
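Assuming the toolkit is set up, a smoke test might look like the following; the image tag, model path, and flags here are illustrative guesses, not the actual image this PR produces:

```shell
# Hypothetical image tag and model path -- adjust to whatever this PR builds.
image="local/llama.cpp:full-cuda"
model="/models/7B/ggml-model-q4_0.bin"

# Compose the docker invocation; --gpus all requires the
# nvidia-container-toolkit runtime to be installed on the host.
cmd="docker run --gpus all -v \$PWD/models:/models $image --run -m $model --random-prompt -n 64"
printf '%s\n' "$cmd"
```

If the container can reach the GPU, `nvidia-smi` on the host should show memory allocated by the llama.cpp process while it runs.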
I'm not very familiar with GitHub Actions, nor with the execution environments available on GitHub. I wouldn't suggest pre-building these images and putting them in the registry unless there's a CI path for them.
I'm not sure what the right path is for publishing these to a registry, but I did want to contribute so people could try it locally!
One note: I haven't been able to get the `-p "Building a website can be done in 10 simple steps:"` part of the examples in README.md to work for me on Linux because of string escaping (both the original examples and these new ones). I have tested with `--random-prompt`, however, and it works fine for me, with BLAS = 1 and `nvidia-smi` showing memory usage by llama.cpp.
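For what it's worth, one common source of prompt mangling is the shell expanding characters inside double quotes before llama.cpp ever sees the string; single quotes pass it through verbatim. A minimal illustration (the binary path is hypothetical):

```shell
# Double quotes still allow $ and ` expansion (and ! history expansion in
# interactive bash); single quotes suppress all shell expansion.
prompt='Building a website can be done in 10 simple steps:'
printf '%s\n' "$prompt"
# Hypothetical usage: ./main -m ./models/7B/ggml-model-q4_0.bin -p "$prompt"
```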
I just wanted to keep it in line with the other Docker examples.
Fixed a trailing white space error from the Makefile (and installed editorconfig for the future).
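For reference, a minimal `.editorconfig` that catches trailing whitespace could look like this (the repository's actual file may differ):

```
root = true

[*]
trim_trailing_whitespace = true
insert_final_newline = true
```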
Could I get another run of CI on this?
This looks really useful for integration/deployment of llama.cpp into Docker-hosted services in the cloud!
Is there anything (apart from the vast amount of other activity ;-) ) holding up merging this now?
On which note: I see that the total number of open pull requests, not issues(!), has risen from, IIRC, 53 to 65 since I last checked in, which is kind of great but also kind of scary given what it implies about the strain on the maintainers' review bandwidth.
The upstream Makefile now has a conflict that I will need to resolve, and will do so when I get a chance.
I didn't merge this PR because I wanted someone else to check it as well; as I said, I'm not very knowledgeable about Docker.
I have rebased on the latest changes in master. A second set of eyes on my changes + a pipeline run would alleviate my concerns about those changes :)
Hi folks, just another very gentle nudge on this front: last time I checked, this repo had about 65 open pull requests; a month later it's over 80. As before, I'm worried that more project approval/supervision bandwidth needs to be allocated to clearing the backlog, so that older 'finished' PRs like this one don't sit stuck in limbo, accumulate new integration conflicts, and need revisiting.
It would be a shame if good work got buried, lost, or stuck in PRs never folded into main (and if the sprawling PR count means stale PRs aren't being explicitly closed or duplicates coalesced).
The Docker support ought to be relatively independent of other changes, so it may be worth a final CI check and folding this one in now. Any problems with nvidia-docker use can be sorted out via bug reports from people trying it (who currently don't have the easy option).
Best, M.