ggml : improve CI + add more tests
The current state of the testing framework is pretty bad - we have a few simple test tools in tests/, but these are not properly maintained and are quite rudimentary. Additionally, GitHub Actions does not allow running heavy workloads, so it is difficult to run integration tests even on small models such as GPT-2 - not to mention that there is no GPU support.
Ideally, it would be awesome to have a CI that can build the code on as many different hardware configurations as possible and run performance and accuracy tests for various models. This would allow quicker iteration on changes to the core library.
I posted a discussion in llama.cpp on this topic - hopefully we can gather some insight on how to build such a CI in the cloud:
https://github.com/ggerganov/llama.cpp/discussions/1985
Extra related issues:
- https://github.com/ggerganov/llama.cpp/issues/2631
- https://github.com/ggerganov/llama.cpp/issues/2634
TODOs:
- [ ] Add Metal CI to llama.cpp using the new macos-13 runners: https://github.com/ggerganov/ggml/pull/514
I'd be interested in helping with the 'add more tests' part of this, but I have some unanswered questions, so it would be reasonable to have some directions here. Obvious question: do we have any means to measure test coverage yet?
I guess we can focus on CPU-only testing for now. The most straightforward approach is to have a unit test for each function in the ggml.h API. Some functions, like ggml_rope() and ggml_alibi(), should be cross-validated against reference Python implementations, since it is otherwise difficult to judge whether they compute things correctly. Such tests are lightweight and can be part of the existing GitHub Actions.
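To illustrate the cross-validation idea, a Python reference for rotary position embeddings could look roughly like this. This is a sketch of the standard RoPE formula only - the exact rotation convention, dtype, and parameters used by ggml_rope() may differ, so tolerances and pairing would need to be checked against the actual implementation:

```python
import numpy as np

def rope_ref(x, pos, base=10000.0):
    """Reference RoPE applied to one vector at sequence position `pos`.

    x: 1-D array with an even number of dimensions. Consecutive pairs
    (x[i], x[i+1]) are rotated by an angle pos * base**(-i/d), so the
    rotation frequency decreases for higher dimensions.
    """
    d = x.shape[0]
    assert d % 2 == 0, "RoPE needs an even number of dimensions"
    out = np.empty(d, dtype=np.float64)
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = np.cos(theta), np.sin(theta)
        out[i]     = x[i] * c - x[i + 1] * s
        out[i + 1] = x[i] * s + x[i + 1] * c
    return out
```

A C-side unit test could then run ggml_rope() on the same input and compare against values produced by such a reference within a small tolerance.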
Regarding GPU tests - when the cloud CI framework is ready, we will simply run "integration" tests in the cloud. For example, the CI can fetch certain model data and run text generation and perplexity calculations on different GPUs - whatever is available for rent. We can figure out the details later.
Test coverage would be nice - I've used lcov in the past. Maybe we can integrate it into the GitHub Actions CI.
@ggerganov: My attempts to get lcov working on Windows failed miserably, but I got clang/llvm coverage analysis working. Here is a first summary:
| Filename | Regions | Missed Regions | Cover | Functions | Missed Functions | Executed | Lines | Missed Lines | Cover | Branches | Missed Branches | Cover |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| src\ggml.c | 10461 | 5746 | 45.07% | 512 | 216 | 57.81% | 9985 | 4703 | 52.90% | 4848 | 2775 | 42.76% |
| tests\test-grad0.c | 464 | 68 | 85.34% | 11 | 1 | 90.91% | 818 | 68 | 91.69% | 322 | 62 | 80.75% |
| include\ggml\ggml.h (contains no functions) | 0 | 0 | - | 0 | 0 | - | 0 | 0 | - | 0 | 0 | - |
| TOTAL | 10925 | 5814 | 46.78% | 523 | 217 | 58.51% | 10803 | 4771 | 55.84% | 5170 | 2837 | 45.13% |
The report is based on the merged profile data of all currently active tests, run by ctest; llvm-cov seems to require specifying a particular executable, however. If this is an acceptable way forward, I'll try to clean up the CMake changes and propose a PR.
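For reference, wiring clang's source-based coverage into the build might look something like the following CMake fragment. This is only a sketch - the GGML_COVERAGE option name is hypothetical and the actual flags in the eventual PR may differ:

```cmake
option(GGML_COVERAGE "Build with clang source-based coverage" OFF)

if (GGML_COVERAGE)
    if (NOT CMAKE_C_COMPILER_ID MATCHES "Clang")
        message(FATAL_ERROR "GGML_COVERAGE requires clang")
    endif()
    # instrument all targets for profile collection and coverage mapping
    add_compile_options(-fprofile-instr-generate -fcoverage-mapping)
    add_link_options(-fprofile-instr-generate)
endif()
```

After running ctest with LLVM_PROFILE_FILE set, the resulting .profraw files can be merged with llvm-profdata merge and summarized with llvm-cov report, which produces tables like the one above.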
Yes, this looks even better. Let's give it a try
Hi @ggerganov, I published a PR to support multiple platforms and OSes a few days ago. Let me know if it's something you find relevant.
@alonfaraj
Thank you very much! I'm currently looking at the PR - sorry for the delay
In the meantime, I've made progress on the Azure cloud CI idea and hacked together a simple framework using Bash + Git:
https://github.com/ggml-org/ci
Currently, I am able to very easily attach new nodes from the cloud and have them run various tests. The tests are implemented in the ci/run.sh script. At the moment I've rented just 3 CPU instances:
The ggml-2 instance is a high-performance one and can run heavier workloads, such as MPT 7B inference.
The results are summarized neatly in Github README.md files for each commit.
If this strategy turns out to be effective, I will probably scale it up and add GPU and bare-metal nodes.
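The per-step structure of such a run script could be as simple as a helper that executes each step, captures its log, and reports a status line for the summary. This is a hypothetical sketch, not the actual ci/run.sh:

```shell
#!/bin/bash
# Hypothetical CI step runner (sketch only - the real ci/run.sh may differ).
# Runs a command, redirects its output to a per-step log file, and prints
# "ok <name>" or "fail <name>" so results can be collected into a README.
run_step() {
    local name="$1"; shift
    if "$@" > "out_${name}.log" 2>&1; then
        echo "ok ${name}"
    else
        echo "fail ${name}"
    fi
}
```

For example, `run_step build cmake --build .` would log the build output to out_build.log and emit a one-line status for the commit summary.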
Looks good! I will take a deeper look as well.
Add Metal CI to llama.cpp using the new macos-13 runners: https://github.com/ggerganov/ggml/pull/514
@ggerganov, how are things going... and how are you progressing on the CI?
I've recently finished an Azure Architecture/DevOps contract and got familiar with CI/CD on Azure, Azure Infrastructure-as-Code (IaC), various Azure services, etc.
Re-reading this roadmap item, it seems the solution may be:
- a GitHub Action starts a CI process on Azure
- create the Azure "webworker" infrastructure - multiple approaches, from shared to dedicated, CPU or GPU
- [optionally] run unit tests
- run performance tests
- return reports
- destroy the Azure "webworker" infrastructure
A yaml settings file + GitHub Secrets to manage the config.
The CI could be run on the forked repo, using the GitHub Secrets (and hence the Azure credentials) of the fork's GitHub account.
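As a rough sketch, the GitHub Actions side of such a setup could look like the workflow below. The workflow name, secret name, and the three ci/azure-*.sh scripts are all hypothetical placeholders for whatever provisioning/teardown scripts would actually live in the repo:

```yaml
name: azure-ci
on: [push, pull_request]

jobs:
  azure-perf:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: azure/login@v2
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}   # from GitHub Secrets
      - name: Provision webworker                   # hypothetical script
        run: ./ci/azure-provision.sh
      - name: Run tests and collect reports         # hypothetical script
        run: ./ci/azure-run.sh
      - name: Destroy webworker
        if: always()                                # tear down even on failure
        run: ./ci/azure-destroy.sh
```

Keeping the teardown step under `if: always()` ensures the rented infrastructure is destroyed even when a test step fails.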
Do you still have a large Azure allocation?