llama.cpp We could use std::unordered

If it is not necessary sorted maps, change std::map to std::unordered_map

std::unordered_map is a hash table so it should be faster than std::map when storing many items.

std::map<id, token> can be a std::vector since the vector index can be equal to the token id

Mar 19 '23 21:03 Fabio3rs

the token vector should prob be a struct now which also includes the score (see https://github.com/ggerganov/llama.cpp/commit/074bea2eb1f1349a0118239c4152914aecaa1be4)

Mar 20 '23 10:03 eiz

the token vector should prob be a struct now which also includes the score (see 074bea2)

I did a commit merging with the new changes using struct, I am not sure about the names or the organization of the struct

Mar 20 '23 12:03 Fabio3rs

Are you able to measure the performance gains from these changes? Interested to see how much of an impact they have.

Great work!

Mar 20 '23 22:03 z11h

Are you able to measure the performance gains from these changes? Interested to see how much of an impact they have.

Great work!

Thanks!

Sadly it's hard to do consistent tests, but what I got is: last commit (bd4b46d) result with: ./build/llama -m models/7B/ggml-model-f16.bin --color -p 'tdd is' --seed 0

main: mem per token = 14696644 bytes
main:     load time =  3236.45 ms
main:   sample time =    57.25 ms
main:  predict time = 37547.44 ms / 286.62 ms per token
main:    total time = 41166.49 ms

last commit (bd4b46d) + this pull request: (same command above)

main: mem per token = 14696644 bytes
main:     load time =  3204.60 ms
main:   sample time =    57.64 ms
main:  predict time = 37372.86 ms / 285.29 ms per token
main:    total time = 40956.10 ms

This differences should be more noticeable with larger datasets and more tokens, I am using the model 7B.

Mar 21 '23 00:03 Fabio3rs

Are you able to measure the performance gains from these changes? Interested to see how much of an impact they have.

there should not really be any. none of the code is particularly hot.

Mar 21 '23 00:03 Green-Sky

Apologies for the conflicts - lets resolve and merge.

Regarding the C++ standard question from the other thread: There are a few reasons and I know there will be people that will disagree and I totally understand. I just think this way it is less incentivising to use overly-complicated constructs. You are correct that in this case we can actually gain some performance with std::string_view and even save unnecessary allocations, but this part of the code is totally negligible compared to the full transformer evaluation. Plus, we can do the suggested change using raw pointers - yes, bit more ugly, but the performance will be there.

Overall, my experience tells me this is the better way - or at least it is better in my views and understandings. And if there ever appears a very good reason to bump the standard - we will do it. But at the moment, there is no good enough reason to do it.

Mar 21 '23 17:03 ggerganov

Apologies for the conflicts - lets resolve and merge.

Regarding the C++ standard question from the other thread: There are a few reasons and I know there will be people that will disagree and I totally understand. I just think this way it is less incentivising to use overly-complicated constructs. You are correct that in this case we can actually gain some performance with std::string_view and even save unnecessary allocations, but this part of the code is totally negligible compared to the full transformer evaluation. Plus, we can do the suggested change using raw pointers - yes, bit more ugly, but the performance will be there.

Overall, my experience tells me this is the better way - or at least it is better in my views and understandings. And if there ever appears a very good reason to bump the standard - we will do it. But at the moment, there is no good enough reason to do it.

Thanks!

I think I resolved the conflicts, if there are some problems I am happy to fix.

Mar 21 '23 17:03 Fabio3rs

llama.cpp
llama.cpp copied to clipboard

We could use std::unordered_map over std::map

llama.cpp llama.cpp copied to clipboard

We could use std::unordered_map over std::map

llama.cpp
llama.cpp copied to clipboard