Nguyễn Văn Tuệ
@csukuangfj Thank you for the super quick reply. I used the LG graph, with L built from subwords and G an n-gram model. I followed 3 steps: 1. Get N-gram...
I can't reproduce the results in your paper with yoochoose. Can you share the exact command and the preprocessed dataset? Thanks @SpaceLearner
> I got it working, at least for my [build](https://github.com/Zarrac/my-pytorch-builds/releases/tag/flash-attn-2.7.4.post1-cuda12.8)

Thanks, it saved me tons of hours.