hoangmit
My experiment is very simple. I just iterate over the reads in a BAM file; for each read, I loop over each base. If I use the original samtools C...
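Roughly this loop, as a pysam sketch (pysam and the file name are just for illustration; my actual runs go through the samtools C API):

```python
import pysam

# Visit every read in the BAM file and touch each base once.
with pysam.AlignmentFile("reads.bam", "rb") as bam:  # placeholder path
    for read in bam:
        seq = read.query_sequence
        if seq is None:  # some records store no sequence ("*")
            continue
        for base in seq:
            pass  # per-base work goes here
```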
I am not familiar with the details. However, lucidrains' repo linked above has both Tri Dao's Flash attention[1] and Rabe's attention. (They are probably not super fast.) Yes, the main...
One bug I found is #173: llama.cpp seems to use a different norm method.
RMS norm does not need to compute the mean of the input elements. The implementation here has "v = x[i00] - mean" ... "sum2 += v*v". It looks similar to...
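To make the difference concrete, a minimal numpy sketch (the eps value is illustrative, not necessarily what ggml uses):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # LayerNorm-style (what ggml_norm does): subtract the mean,
    # then divide by the standard deviation.
    mean = x.mean()
    var = ((x - mean) ** 2).mean()
    return (x - mean) / np.sqrt(var + eps)

def rms_norm(x, eps=1e-5):
    # RMSNorm per the paper: no mean subtraction at all,
    # divide by the root mean square only.
    return x / np.sqrt((x * x).mean() + eps)
```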
Let's revert the change in "main.cpp" (e.g. the 3 instances of "ggml_rms_norm" back to "ggml_norm") if you think it gets obviously worse. We need some quantifiable quality test to catch type...
If we had a Python interface (text input -> next word) for this, it would be much easier to perform quality tests. Most of the NLP toolkits and datasets are...
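Something of this shape is all I mean (every name here is hypothetical; no such binding exists yet):

```python
from typing import List, Protocol, Tuple

class NextWordModel(Protocol):
    # Hypothetical interface: text in, candidate next tokens
    # with their probabilities out.
    def next_word_probs(self, text: str) -> List[Tuple[str, float]]:
        ...
```

With that, any off-the-shelf NLP dataset could be fed through the model and scored.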
They don't specify the exact details in the paper. One of the figures shows "training loss". We can just use the basic perplexity measurement on its training data e.g. how...
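By "basic perplexity" I mean just this (assuming we can dump per-token log-probabilities):

```python
import math
from typing import List

def perplexity(token_logprobs: List[float]) -> float:
    # Perplexity = exp(average negative log-likelihood per token).
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Tokens assigned probabilities 0.5, 0.25, 0.125 give perplexity 4.0:
print(perplexity([math.log(0.5), math.log(0.25), math.log(0.125)]))
```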
We also need to make sure the (non-quantized) FP16 model gives a similar probability distribution to the PyTorch reference. That is also easy to check.
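E.g., dump the next-token softmax from both sides and compare (a sketch; how the distributions get dumped is assumed):

```python
import numpy as np

def compare_dists(p: np.ndarray, q: np.ndarray, eps: float = 1e-10) -> None:
    # p: next-token distribution from the FP16 llama.cpp run,
    # q: the PyTorch reference on the same prompt and vocab.
    kl = float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))
    print(f"max |p - q| = {np.abs(p - q).max():.3e}  KL(p||q) = {kl:.3e}")
```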
RoPE is tricky and easy to get wrong. We need a lot of unit tests for operators. We have a reference implementation, so generating test data is not too hard.
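For RoPE specifically, a numpy reference like this could generate golden test vectors (this uses the interleaved-pair convention from the paper; whether ggml lays out the pairs the same way needs checking):

```python
import numpy as np

def rope_reference(x: np.ndarray, pos: int, theta: float = 10000.0) -> np.ndarray:
    # x: one head's vector of even length d at sequence position `pos`.
    # Each pair (x[2i], x[2i+1]) is rotated by the angle pos * theta**(-2i/d).
    d = x.shape[0]
    ang = pos * theta ** (-np.arange(0, d, 2) / d)
    cos, sin = np.cos(ang), np.sin(ang)
    out = np.empty_like(x)
    out[0::2] = x[0::2] * cos - x[1::2] * sin
    out[1::2] = x[0::2] * sin + x[1::2] * cos
    return out
```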
If this is really inactive, someone should probably make a fork.