hoangmit
My experiment is very simple. I just iterate over the reads in a BAM file; for each read, I loop over each base. If I use the original samtools C...
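Roughly this loop, as a pysam sketch (pysam and the file name are just for illustration; my actual runs go through the samtools C API):

```python
import pysam

# Visit every read in the BAM file and touch each base once.
with pysam.AlignmentFile("reads.bam", "rb") as bam:  # placeholder path
    for read in bam:
        seq = read.query_sequence
        if seq is None:  # some records store no sequence ("*")
            continue
        for base in seq:
            pass  # per-base work goes here
```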
I am not familiar with the details. However, lucidrains' repo linked above has both Tri Dao's Flash attention[1] and Rabe's attention. (They are probably not super fast.) Yes, the main...
One bug I found is #173: llama.cpp seems to use a different norm method.
RMS norm does not need to compute the mean of the input elements. The implementation here has "v = x[i00] - mean" ... "sum2 += v*v". It looks similar to...
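To make the difference concrete, a minimal numpy sketch (the eps value is illustrative, not necessarily what ggml uses):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # LayerNorm-style (what ggml_norm does): subtract the mean,
    # then divide by the standard deviation.
    mean = x.mean()
    var = ((x - mean) ** 2).mean()
    return (x - mean) / np.sqrt(var + eps)

def rms_norm(x, eps=1e-5):
    # RMSNorm per the paper: no mean subtraction at all,
    # divide by the root mean square only.
    return x / np.sqrt((x * x).mean() + eps)
```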
Let's revert the change in "main.cpp" (e.g. the 3 instances of "ggml_rms_norm" back to "ggml_norm") if you think it gets obviously worse. We need some quantifiable quality test to catch type...
If we had a Python interface (text input -> next word) for this, it would be much easier to perform quality tests. Most of the NLP toolkits and datasets are...
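Something of this shape is all I mean (every name here is hypothetical; no such binding exists yet):

```python
from typing import List, Protocol, Tuple

class NextWordModel(Protocol):
    # Hypothetical interface: text in, candidate next tokens
    # with their probabilities out.
    def next_word_probs(self, text: str) -> List[Tuple[str, float]]:
        ...
```

With that, any off-the-shelf NLP dataset could be fed through the model and scored.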
They don't specify the exact details in the paper. One of the figures shows "training loss". We can just use the basic perplexity measurement on its training data e.g. how...
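By "basic perplexity" I mean just this (assuming we can dump per-token log-probabilities):

```python
import math
from typing import List

def perplexity(token_logprobs: List[float]) -> float:
    # Perplexity = exp(average negative log-likelihood per token).
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Tokens assigned probabilities 0.5, 0.25, 0.125 give perplexity 4.0:
print(perplexity([math.log(0.5), math.log(0.25), math.log(0.125)]))
```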
We also need to make sure the (non-quantized) FP16 model gives a similar probability distribution to the PyTorch reference. That is also easy to check.
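E.g., dump the next-token softmax from both sides and compare (a sketch; how the distributions get dumped is assumed):

```python
import numpy as np

def compare_dists(p: np.ndarray, q: np.ndarray, eps: float = 1e-10) -> None:
    # p: next-token distribution from the FP16 llama.cpp run,
    # q: the PyTorch reference on the same prompt and vocab.
    kl = float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))
    print(f"max |p - q| = {np.abs(p - q).max():.3e}  KL(p||q) = {kl:.3e}")
```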
RoPE is tricky and easy to get wrong. We need a lot of unit tests for operators. We have a reference implementation, so generating test data is not too hard.
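For RoPE specifically, a numpy reference like this could generate golden test vectors (this uses the interleaved-pair convention from the paper; whether ggml lays out the pairs the same way needs checking):

```python
import numpy as np

def rope_reference(x: np.ndarray, pos: int, theta: float = 10000.0) -> np.ndarray:
    # x: one head's vector of even length d at sequence position `pos`.
    # Each pair (x[2i], x[2i+1]) is rotated by the angle pos * theta**(-2i/d).
    d = x.shape[0]
    ang = pos * theta ** (-np.arange(0, d, 2) / d)
    cos, sin = np.cos(ang), np.sin(ang)
    out = np.empty_like(x)
    out[0::2] = x[0::2] * cos - x[1::2] * sin
    out[1::2] = x[0::2] * sin + x[1::2] * cos
    return out
```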
If this is really inactive, someone should probably make a fork.