Andrej

Results: 373 comments by Andrej

When you're training you usually don't care about special tokens. To create training data you'd directly insert the token ids as integers in between documents instead of changing the text...
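A minimal sketch of what inserting token ids between documents could look like, assuming the documents are already tokenized into lists of integers. The function name `pack_documents` and the delimiter id `50256` (GPT-2's end-of-text id, used here only as an example) are illustrative assumptions, not taken from the comment or from any particular repo.

```python
from typing import Iterable, List


def pack_documents(docs: Iterable[List[int]], delimiter_id: int) -> List[int]:
    """Concatenate tokenized documents into one training stream.

    The delimiter is appended as a raw integer id after each document,
    so the text itself never has to contain a special-token string.
    """
    stream: List[int] = []
    for doc in docs:
        stream.extend(doc)            # ordinary token ids from the tokenizer
        stream.append(delimiter_id)   # special token inserted directly as an int
    return stream


if __name__ == "__main__":
    # Two already-tokenized documents (hypothetical ids).
    docs = [[10, 11, 12], [20, 21]]
    print(pack_documents(docs, delimiter_id=50256))
    # prints [10, 11, 12, 50256, 20, 21, 50256]
```

Whether the delimiter goes before or after each document is a convention choice; the point is only that it is added at the id level rather than by editing the text.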

This is really awesome but I don't think I can take on its maintenance in this repo. I'm very happy to link to your work from the README file in...

Question: what is the benefit of fp16? - As the Llama 2 models were trained in bf16 I find fp16 conversion sketchy. For newly trained models this is less of...
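For background on the two formats (a general fact, not necessarily the point the truncated comment goes on to make): bf16 keeps the 8-bit exponent of float32, while fp16 has a 5-bit exponent and a largest finite value of 65504. The sketch below uses only the standard library; the helper names are hypothetical, and bf16 is simulated by truncating a float32 to its top 16 bits (real converters usually round rather than truncate).

```python
import struct


def to_bf16_and_back(x: float) -> float:
    """Simulate bf16 by keeping only the top 16 bits of the float32 encoding."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


def fits_in_fp16(x: float) -> bool:
    """Return True if x can be packed as IEEE half precision without overflow."""
    try:
        struct.pack("<e", x)  # "e" is the half-precision format code
        return True
    except OverflowError:
        return False


if __name__ == "__main__":
    value = 70000.0
    # bf16 loses precision but keeps the magnitude.
    print(to_bf16_and_back(value))  # prints 69632.0
    # fp16 cannot represent this magnitude at all.
    print(fits_in_fp16(value))      # prints False
```

A value that is representable in bf16 can therefore fall outside fp16's range, which is one way a bf16-to-fp16 conversion can go wrong.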

I like where this is going, but this looks like multiple PRs in one, and a little bit of sus code. I'll inline comment

Yeah I don't have too much time right now for this repo. Please link to any PRs that you consider no-brainers, happy to take a look. Maybe I should merge...

I see. It is currently a whole separate file runq.c. Which I don't love, but also don't really see any real way around. Let me re-load my RAM again with...

@KangkangStu did you follow instructions here? https://github.com/karpathy/llama2.c#int8-quantization

In principle absolutely. In practice no one has submitted a PR to export it and I don't personally care as much, so I've been ignoring it :) Would probably accept the...

yep exactly, the v1+ header is large enough to incorporate additional hyperparameters like this.