cccpr
@Tracin How do I build a TensorRT engine using the files created by ammo-w8-a8-smoothquant? I cannot find any docs.
@Tracin For the weight-only issue, you mentioned "make the build option aligned"; which option are you referring to?
@Tracin Many LLM-quantization papers (for example, [this paper](https://arxiv.org/pdf/2308.15987.pdf)) have stated that Llama2-7b-w8a8-smoothquant accuracy **is close to fp16 accuracy on MMLU** (I have also done some experiments in my own code, and the...
@Tracin You can check the comments I have already written in this issue; I have already used --per_channel --per_token.
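For reference, my full smoothquant flow follows the examples/llama README; a minimal sketch assuming the TensorRT-LLM 0.7.x scripts (paths and the alpha value are placeholders, and flag names may differ between releases):

```bash
# Export a SmoothQuant-calibrated checkpoint from the HF model
python3 hf_llama_convert.py -i /path/to/llama-2-7b-hf -o /path/to/sq-out \
    -sq 0.5 --tensor-parallelism 1 --storage-type fp16

# Build the engine with per-channel weight and per-token activation scaling
python3 build.py --bin_model_dir /path/to/sq-out/1-gpu \
    --use_smooth_quant --per_channel --per_token \
    --dtype float16 --output_dir /path/to/engines/llama-7b-sq
```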
@Tracin Thanks for the effort. You mentioned that the bad accuracy with int8-kv is not reproduced on your side. Can you share your TensorRT-LLM version and run commands?
@Tracin My TensorRT-LLM version is 0.7.1, and I followed the modifications you mentioned below, **but I still get 37.6 for w8a8 smoothquant accuracy on MMLU.** So there are some other...
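For completeness, the 37.6 number comes from the MMLU script shipped in the repo; a minimal sketch, assuming examples/mmlu.py and its flags are unchanged in 0.7.1 (paths are placeholders):

```bash
# Evaluate the built engine on MMLU (data_dir points at the downloaded MMLU csv files)
python3 mmlu.py --hf_model_dir /path/to/llama-2-7b-hf \
    --engine_dir /path/to/engines/llama-7b-sq \
    --data_dir /path/to/mmlu-data --test_trt_llm
```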
@already-taken-m17 @jcjohnson After training a model, I found the number of training epochs was not enough, so I reloaded it and tried to fine-tune (retrain), but why does the training loss look like the...
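One thing worth checking when the loss jumps back up after reloading: whether the optimizer state (Adam's moment estimates) and the learning-rate schedule are restored along with the weights. Reloading only the weights makes the first resumed updates behave like a cold start. A minimal PyTorch sketch of what I mean, with a hypothetical stand-in model:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the captioning model and its optimizer
model = nn.Linear(512, 100)
optimizer = torch.optim.Adam(model.parameters(), lr=4e-4)
epoch = 10

# Saving: persist the optimizer state and epoch alongside the weights
torch.save({
    "model": model.state_dict(),
    "optimizer": optimizer.state_dict(),  # Adam moments, step counts
    "epoch": epoch,
}, "checkpoint.pth")

# Resuming: restore all three, not just the model weights
ckpt = torch.load("checkpoint.pth")
model.load_state_dict(ckpt["model"])
optimizer.load_state_dict(ckpt["optimizer"])
start_epoch = ckpt["epoch"] + 1
```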
@ruotianluo Training on my own dataset. The attention looks like this: [image](https://user-images.githubusercontent.com/13804492/31314100-38608414-ac2a-11e7-8cdf-19874b596746.png) Any idea why the attention looks so weird?
@ruotianluo The caption results are fine. The dataset is quite small (fewer than 1000 images), and the vocabulary has fewer than 100 words.
@ruotianluo And after training for longer, the attention becomes like this: [image](https://user-images.githubusercontent.com/13804492/31314429-30ff442c-ac33-11e7-9da1-8613d7c43653.png)
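In case it helps with debugging, this is roughly how such overlays can be rendered; a minimal sketch assuming a 14x14 spatial attention map and a 224x224 input image (both arrays here are random placeholders):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.ndimage import zoom

# Placeholder inputs: an RGB image and one word's spatial attention weights
image = np.random.rand(224, 224, 3)
attn = np.random.rand(14, 14)
attn /= attn.sum()  # weights sum to 1 over spatial locations

# Upsample the attention map to image resolution and overlay it
attn_up = zoom(attn, 224 / 14, order=1)
plt.imshow(image)
plt.imshow(attn_up, alpha=0.5, cmap="jet")
plt.axis("off")
plt.savefig("attention_overlay.png", bbox_inches="tight")
```

A near-uniform attention map rendered this way looks like a flat wash over the whole image, which is a common outcome when training on a very small dataset.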