llama2.c
Is this project still active?
Maintainer @karpathy has been inactive for many weeks; he made a single commit a few days ago, but it seems more like a "ping signal" than a real commit.
Understandably, he is very busy and currently does not have time to dedicate to this project.
There are many patches waiting in the queue, including important ones like quantization, which would allow running a 7B model without a powerful PC, and support for TinyLlama, another effort in the same direction of making LLM code hackable for people without beefy hardware (by the way, this was the initial goal that started this project).
So my question is: would it make sense for @karpathy, if he foresees that his time will be limited, to pick one or more maintainers who can help him keep this ball rolling?
Yeah I don't have too much time right now for this repo. Please link to any PRs that you consider no-brainers, happy to take a look. Maybe I should merge the quantization branch.
@karpathy thanks for answering.
IMHO the quantization branch is really important to merge, because many patches need to be refactored on top of it (it changes the model layout), so until it is merged other work sits in a state of flux.
I see. It is currently a whole separate file, runq.c, which I don't love, but I also don't really see any way around it. Let me re-load my RAM with where that PR was... iirc it was pretty much ready to be merged, except it was surprisingly not as fast as I originally expected.
Thanks for merging.
Regarding speed, maybe it is system dependent; in my case it is way faster, more than 2X faster.
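For context on where the speed difference comes from, below is a minimal sketch of the group-wise int8 (Q8_0-style) scheme the quantization branch uses: each group of GS floats shares one float scale, chosen so the largest magnitude in the group maps to 127. The names `QuantizedTensor`, `GS`, and `quantize` follow runq.c, but treat the exact layout as an assumption; the merged file is the reference.

```c
#include <math.h>
#include <stdint.h>

#define GS 64  // group size; in runq.c this comes from the model header

typedef struct {
    int8_t *q;  // quantized values, n of them
    float  *s;  // one scale per group of GS values
} QuantizedTensor;

void quantize(QuantizedTensor *qx, const float *x, int n) {
    for (int group = 0; group < n / GS; group++) {
        // find the max absolute value in this group
        float wmax = 0.0f;
        for (int i = 0; i < GS; i++) {
            float val = fabsf(x[group * GS + i]);
            if (val > wmax) wmax = val;
        }
        // map [-wmax, wmax] onto the int8 range [-127, 127];
        // guard against an all-zero group
        float scale = (wmax > 0.0f) ? wmax / 127.0f : 1.0f;
        qx->s[group] = scale;
        for (int i = 0; i < GS; i++) {
            qx->q[group * GS + i] = (int8_t) roundf(x[group * GS + i] / scale);
        }
    }
}

// dequantize a single value: quantized value times its group's scale
float dequantize_one(const QuantizedTensor *qx, int i) {
    return qx->q[i] * qx->s[i / GS];
}
```

The quantized matmul can then accumulate int8 products in int32 and apply the weight and activation scales once per group, which is where most of the speedup comes from, and also why results vary between systems: it depends heavily on how well the compiler vectorizes that inner loop.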
Hello, may I ask what kind of model runq.c needs after compilation? I have tried many times to quantize the 7B model.
@KangkangStu did you follow instructions here? https://github.com/karpathy/llama2.c#int8-quantization
Of course, I tried llama2_7b and it succeeded, but when I wanted to quantize stories15M.pt there was nothing I could do.
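For anyone else hitting this, here is a sketch of the workflow from the README's int8 quantization section (https://github.com/karpathy/llama2.c#int8-quantization). The key point is that export.py's `--version 2` selects the int8 export, and that `--checkpoint` is the flag for checkpoints trained with this repo (like stories15M.pt), while `--meta-llama` is only for the original Meta weights. Flag names and Makefile targets are taken from the repo at the time of writing; double-check against the linked README section.

```sh
# grab the stories15M checkpoint if you don't have it
wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories15M.pt

# export a llama2.c-trained checkpoint as int8 (note --checkpoint, not --meta-llama)
python export.py stories15M_q80.bin --version 2 --checkpoint stories15M.pt

# the Meta 7B weights use --meta-llama instead
python export.py llama2_7b_q80.bin --version 2 --meta-llama /path/to/llama/model/7B

# build and run the quantized runner
make runq
./runq stories15M_q80.bin
```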