Update LLamaEmbedder, Examples packages, and KernelMemory examples
- Embedding generation: extended with batch processing and normalization (important to have this built-in for KernelMemory); see the sketch below.
- The examples referenced the wrong NuGet packages; updated them to the correct ones.
- Updated the KernelMemory examples.
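For context, the built-in normalization amounts to L2-scaling each vector after batch generation, so consumers like KernelMemory get unit-length embeddings whose dot product equals their cosine similarity. A simplified C++ sketch of the idea (illustrative only, not the actual LLamaEmbedder code; the function name is made up):

```cpp
#include <cmath>
#include <vector>

// L2-normalize every embedding in a batch, in place.
// (Illustrative sketch; the real extension lives in LLamaEmbedder.)
void normalize_batch(std::vector<std::vector<float>> &batch) {
    for (auto &embedding : batch) {
        float sum_sq = 0.0f;
        for (float v : embedding) {
            sum_sq += v * v;
        }
        const float norm = std::sqrt(sum_sq);
        if (norm > 0.0f) {
            for (float &v : embedding) {
                v /= norm;
            }
        }
    }
}
```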
I have tested the code and it works for me. Please test it at least once more. Thank you.
Test failed: it runs on my computer and I do not see any problem (nothing significant has changed compared to before, when all tests were OK). Please try restarting the test. Thank you.
I've restarted the tests, but I'm not expecting them to pass right now. We seem to have an issue with one specific test at the moment that consistently fails in CI, but nobody can reproduce a failure locally :(
I'm probably just going to suppress that test if I get some time to work on it this weekend.
OK! Take your time, Martin, no problem. Thank you!
I have corrected several problems in the code, and all tests now pass (on macOS with only one test skipped).
One of the problems I corrected: I completely rewrote the model download code in the project file, because it was re-downloading the models on every build. I think GitHub did not like this... and neither did I.
One bug in llama.cpp prevented the KernelMemory tests from passing. In the following code, if we do not pass a split mode other than None, llama.cpp fails at this point when there is no GPU in the system. I suspect that the GitHub macOS runners have a GPU available, which is why the macOS tests were passing before.
```cpp
if (params.split_mode == LLAMA_SPLIT_MODE_NONE) {
    if (params.main_gpu < 0 || params.main_gpu >= (int)model->devices.size()) {
        LLAMA_LOG_ERROR("%s: invalid value for main_gpu: %d (available devices: %d)\n", __func__, params.main_gpu, (int)model->devices.size());
        llama_model_free(model);
        return nullptr;
    }
    ggml_backend_dev_t main_gpu = model->devices[params.main_gpu];
    model->devices.clear();
    model->devices.push_back(main_gpu);
}
```
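As a workaround, we always pass a split mode other than None when loading the model, so this branch is never reached on CPU-only machines. Roughly like this against the llama.cpp C API (a sketch only; the real call site in LLamaSharp is different, and the helper name and parameter values are examples):

```cpp
#include "llama.h"

// Avoid LLAMA_SPLIT_MODE_NONE so the main_gpu validation above is skipped
// on systems without a GPU. (Sketch only; values are illustrative.)
llama_model * load_model_cpu_safe(const char * path) {
    llama_model_params params = llama_model_default_params();
    params.split_mode   = LLAMA_SPLIT_MODE_LAYER; // anything but LLAMA_SPLIT_MODE_NONE
    params.n_gpu_layers = 0;                      // still loads fine on CPU-only machines
    return llama_model_load_from_file(path, params);
}
```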
Could you split the csproj file downloading changes to a separate PR? The SkipUnchangedFiles="true" attribute should prevent duplicate downloads, so we'll want to look into that. Having it all in one PR will just slow down merging the embedder changes.
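For reference, SkipUnchangedFiles is a parameter of MSBuild's built-in DownloadFile task; a guarded download target would look roughly like this (the target name, URL, and paths are placeholders, not the actual csproj contents):

```xml
<!-- Hypothetical sketch: the Exists() condition skips the network entirely
     once the model file is present; SkipUnchangedFiles guards re-downloads. -->
<Target Name="DownloadTestModel" BeforeTargets="Build"
        Condition="!Exists('$(MSBuildProjectDirectory)\Models\model.gguf')">
  <DownloadFile SourceUrl="https://example.com/model.gguf"
                DestinationFolder="$(MSBuildProjectDirectory)\Models"
                SkipUnchangedFiles="true" />
</Target>
```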
That would be a lot of work, Martin, and I have already invested a lot of time in this (finding the problems and correcting them). Splitting the commits could also cause problems, and then I would have even more work. Please try to review it all together, if possible. You can easily test the download solution on your computer, and the GitHub tests were successful on all platforms (the new code found the previously downloaded models). I have tested the download on my computer both without models and with existing models, and it worked fine. Thank you.
We will need to do some cleanup later (some commented-out code remains), because some things are not entirely clear yet (for example, why we no longer need Embeddings=true). But I would prefer to first integrate this PR without further changes, if possible. I have left these in to make sure we do not forget to do some more research.
Thank you!
That's no problem. If it's not just a matter of cherry-picking some commits to a new branch, don't worry about it 👍
Sorry for the delay on this, I'd hoped to get the binary update PR in first but that's taking a long time!
Thank you, Martin! Yes, I expected that this binary update would not be simple (some key things changed... and it is an update with important improvements). Thank you so much for doing this!