Cong Lu
I'm looking into it!
Apologies, haven't had time to look at this yet. A quick workaround would be to simply skip this single `.npz` file with the issue until we can re-generate it. This...
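A minimal sketch of that workaround, assuming the `.npz` files are read from a single directory. The names `SKIP_FILES` and `list_npz_files` and the file name `broken_example.npz` are hypothetical placeholders, not identifiers from the codebase:

```python
from pathlib import Path

# Hypothetical name of the problematic file; replace with the real one.
SKIP_FILES = frozenset({"broken_example.npz"})


def list_npz_files(data_dir, skip=SKIP_FILES):
    """Return the sorted .npz paths in data_dir, leaving out any file named in skip."""
    return [p for p in sorted(Path(data_dir).glob("*.npz")) if p.name not in skip]
```

Loading code could then iterate over `list_npz_files(...)` instead of globbing the directory directly, and the skip entry can be dropped once the file is re-generated.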
We already provide an implementation for Llama3.1 via OpenRouter.
Confirmed working using 2.0.1 and CUDA 12.2. I would look at the flash attention repos for advice on this issue!
Please see the README. We don't recommend models weaker than GPT-4 for this codebase.
Yes, these are the baseline runs to compare any future ideas or proposals against.
See the FAQ. No, because things like runtime comparisons are machine-dependent :)
Which model are you using? We do not advise any model weaker than GPT-4.
DeepSeek Coder V2 should be totally fine overall! What does the PDF look like at the end? It mainly looks like it's struggling to manage citations. There are no infinite...
You can also check all our DeepSeek logs in the drive link :)