PowerInfer: How to understand the code of llama.cpp?

PowerInfer is amazing work that achieves great performance! Inspired by your brilliant ideas, I am thinking about developing new features based on llama.cpp.

However, it is a bit hard for me to fully understand the structure of llama.cpp. Since you have experience developing PowerInfer, I'm sincerely asking for your help:

Are there any docs or videos suitable for a beginner to understand the whole structure of llama.cpp? (Even your own understanding would be helpful!)
Could you share some tips for development based on llama.cpp?

I would be really grateful if you can give me a helping hand. Thanks in advance!

Jan 24 '24 03:01 BHbean

Thank you for your interest in PowerInfer and we are more than happy to inspire more people!

The code structure of PowerInfer is consistent with that of llama.cpp, including aspects such as organizing the computation graph, external I/O (in llama.cpp), different operator implementations (ggml.c, ggml-cuda.cu, etc.), specific sub-function implementations (ggml-alloc.c), and high-level applications (under examples/). Therefore, I recommend focusing on understanding the architecture of llama.cpp.
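
To make the "computation graph" part of that list more concrete, here is a toy sketch of the general pattern of building a graph first and evaluating it afterwards. Everything in it (`Op`, `Node`, `Graph`, `eval_example`) is a hypothetical name invented for illustration; it is not ggml's API and it leaves out tensors, memory allocation, and GPU backends entirely.

```cpp
#include <cassert>
#include <vector>

// Toy illustration only: these types are hypothetical and are NOT ggml's API.
enum class Op { Input, Add, Mul };

struct Node {
    Op op;
    int lhs;      // index of the first operand, -1 for inputs
    int rhs;      // index of the second operand, -1 for inputs
    float value;  // set at build time for inputs, at compute time otherwise
};

struct Graph {
    // Nodes are stored in topological order: operands always precede their users.
    std::vector<Node> nodes;

    // Build phase: these calls only record nodes, nothing is computed yet.
    int input(float v) {
        nodes.push_back({Op::Input, -1, -1, v});
        return static_cast<int>(nodes.size()) - 1;
    }
    int add(int a, int b) { return push(Op::Add, a, b); }
    int mul(int a, int b) { return push(Op::Mul, a, b); }

    // Compute phase: walk the nodes in order and dispatch on the operator.
    // In a real engine this dispatch is where per-backend kernels would live.
    void compute() {
        for (Node &n : nodes) {
            switch (n.op) {
                case Op::Input: break;
                case Op::Add: n.value = nodes[n.lhs].value + nodes[n.rhs].value; break;
                case Op::Mul: n.value = nodes[n.lhs].value * nodes[n.rhs].value; break;
            }
        }
    }

private:
    int push(Op op, int a, int b) {
        const int self = static_cast<int>(nodes.size());
        // Operands must already exist, which keeps the node list topologically sorted.
        assert(a >= 0 && a < self && b >= 0 && b < self);
        nodes.push_back({op, a, b, 0.0f});
        return self;
    }
};

// Hypothetical helper: builds y = (a + b) * c, then evaluates the graph.
float eval_example(float a, float b, float c) {
    Graph g;
    const int ia = g.input(a);
    const int ib = g.input(b);
    const int ic = g.input(c);
    const int sum = g.add(ia, ib);
    const int y = g.mul(sum, ic);
    g.compute();
    return g.nodes[y].value;
}
```

Calling `eval_example(1.0f, 2.0f, 4.0f)` builds a five-node graph and then evaluates it in a single pass. Keeping graph construction separate from evaluation is what lets the operator implementations and the memory planning sit in different files from the model definition.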

Unfortunately, llama.cpp itself doesn't have extensive documentation, let alone textual or video tutorials. If you are keen to learn, you might find this community discussion helpful. This is similar to how we onboard new collaborators in our team, through collaborative learning and discussions.

Jan 27 '24 03:01 hodlen

Sorry for the late thanks! Thank you for your comprehensive explanation! I will check out the discussion to pick up some helpful knowledge as well!

Huge thanks again! I hope to stay in touch and keep learning from you!

Feb 27 '24 11:02 BHbean