llama.cpp
llama.cpp copied to clipboard
We could use std::unordered_map over std::map
If it is not necessary sorted maps, change std::map to std::unordered_map
std::unordered_map is a hash table so it should be faster than std::map when storing many items.
std::map<id, token> can be a std::vector
the token vector should prob be a struct now which also includes the score (see https://github.com/ggerganov/llama.cpp/commit/074bea2eb1f1349a0118239c4152914aecaa1be4)
the token vector should prob be a struct now which also includes the score (see 074bea2)
I did a commit merging with the new changes using struct, I am not sure about the names or the organization of the struct
Are you able to measure the performance gains from these changes? Interested to see how much of an impact they have.
Great work!
Are you able to measure the performance gains from these changes? Interested to see how much of an impact they have.
Great work!
Thanks!
Sadly it's hard to do consistent tests, but what I got is: last commit (bd4b46d) result with: ./build/llama -m models/7B/ggml-model-f16.bin --color -p 'tdd is' --seed 0
main: mem per token = 14696644 bytes
main: load time = 3236.45 ms
main: sample time = 57.25 ms
main: predict time = 37547.44 ms / 286.62 ms per token
main: total time = 41166.49 ms
last commit (bd4b46d) + this pull request: (same command above)
main: mem per token = 14696644 bytes
main: load time = 3204.60 ms
main: sample time = 57.64 ms
main: predict time = 37372.86 ms / 285.29 ms per token
main: total time = 40956.10 ms
This differences should be more noticeable with larger datasets and more tokens, I am using the model 7B.
Are you able to measure the performance gains from these changes? Interested to see how much of an impact they have.
there should not really be any. none of the code is particularly hot.
Apologies for the conflicts - lets resolve and merge.
Regarding the C++ standard question from the other thread:
There are a few reasons and I know there will be people that will disagree and I totally understand. I just think this way it is less incentivising to use overly-complicated constructs. You are correct that in this case we can actually gain some performance with std::string_view
and even save unnecessary allocations, but this part of the code is totally negligible compared to the full transformer evaluation. Plus, we can do the suggested change using raw pointers - yes, bit more ugly, but the performance will be there.
Overall, my experience tells me this is the better way - or at least it is better in my views and understandings. And if there ever appears a very good reason to bump the standard - we will do it. But at the moment, there is no good enough reason to do it.
Apologies for the conflicts - lets resolve and merge.
Regarding the C++ standard question from the other thread: There are a few reasons and I know there will be people that will disagree and I totally understand. I just think this way it is less incentivising to use overly-complicated constructs. You are correct that in this case we can actually gain some performance with
std::string_view
and even save unnecessary allocations, but this part of the code is totally negligible compared to the full transformer evaluation. Plus, we can do the suggested change using raw pointers - yes, bit more ugly, but the performance will be there.Overall, my experience tells me this is the better way - or at least it is better in my views and understandings. And if there ever appears a very good reason to bump the standard - we will do it. But at the moment, there is no good enough reason to do it.
Thanks!
I think I resolved the conflicts, if there are some problems I am happy to fix.