Jiang-Stan


hi, I'm trying to run pointpillar on TensorRT, and I am confused about its performance. ![2022-06-20 13-25-43 的屏幕截图](https://user-images.githubusercontent.com/74638604/174530970-cfe123b9-5644-4e38-b925-690bf33bd614.png) The structure of my model is the same as this onnx graph,...

Test environment:
- Model: ResNet18
- Task: ImageNet
- Eval data num: 5k
- Calib data num: 256
- Backend: TensorRT (symmetric feature quantization)

|Observer|Top1|Calibration time cost|
|-----|-----|-----|
|Float|69.86%|-|
...
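For reference, a minimal sketch of what the symmetric MinMax feature observer in such a setup boils down to (illustrative code with my own helper names, not the actual Sparsebit/TensorRT implementation):

```python
import torch

def minmax_symmetric_scale(x: torch.Tensor, n_bits: int = 8) -> float:
    """Derive a symmetric per-tensor scale from calibration data.

    Symmetric quantization maps [-max_abs, max_abs] onto the signed integer
    range [-(2**(n_bits-1) - 1), 2**(n_bits-1) - 1], so the zero-point is 0.
    """
    max_abs = x.abs().max().item()
    qmax = 2 ** (n_bits - 1) - 1
    return max_abs / qmax

def fake_quantize(x: torch.Tensor, scale: float, n_bits: int = 8) -> torch.Tensor:
    """Quantize then dequantize to simulate rounding/clipping error."""
    qmax = 2 ** (n_bits - 1) - 1
    return torch.clamp(torch.round(x / scale), -qmax, qmax) * scale

# Usage sketch: collect feature statistics over the 256 calibration samples,
# then evaluate Top-1 on the 5k eval set with fake-quantized features.
calib_feats = torch.randn(256, 512)           # stand-in for real activations
scale = minmax_symmetric_scale(calib_feats)
feats_q = fake_quantize(calib_feats, scale)
```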

DETR model from: https://github.com/facebookresearch/detr

|bit|weight observer|feature observer|mAP|AP50|AP75|remarks|
|-----|-----|-----|-----|-----|-----|-----|
|float|-|-|0.421|0.623|0.443|baseline|
|8w8f|minmax|minmax|0.280|0.519|0.260| |
|8w8f|minmax|minmax|0.355|0.574|0.363|aciq laplace observer for last 2 bbox embed layer weights|
|8w8f|minmax|minmax|0.358|0.576|0.365|float weight|
...
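The "aciq laplace observer" row clips the weight range instead of taking the raw min/max. As a rough illustration of the idea (my own sketch following Banner et al.'s ACIQ formulation, not the repo's code): fit a Laplace scale b to the tensor and pick the clipping value that minimizes the sum of clipping and rounding error.

```python
import torch

def aciq_laplace_clip(w: torch.Tensor, n_bits: int = 8) -> float:
    """Pick a clipping value for a roughly Laplace-distributed weight tensor.

    ACIQ-style distortion for clipping at alpha:
        D(alpha) = 2 * b**2 * exp(-alpha / b)          # error from clipping both tails
                 + alpha**2 / (3 * 2 ** (2 * n_bits))  # uniform rounding error in [-alpha, alpha]
    where b is the Laplace scale, estimated as mean(|w - mean(w)|).
    """
    b = (w - w.mean()).abs().mean().item()
    max_abs = w.abs().max().item()
    alphas = torch.linspace(1e-8, max_abs, steps=2048)
    dist = 2 * b ** 2 * torch.exp(-alphas / b) + alphas ** 2 / (3 * 2 ** (2 * n_bits))
    return alphas[dist.argmin()].item()

# Hypothetical usage: apply the clip only to the last two bbox-embed layers' weights
# and keep plain minmax everywhere else, as in the third table row.
w = torch.randn(256, 256) * 0.05
alpha = aciq_laplace_clip(w, n_bits=8)
scale = alpha / (2 ** 7 - 1)
```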

https://github.com/RVC-Boss/GPT-SoVITS/blob/93dd8334f4ce7fb5ccdeabebe05deb26a3cf30fb/GPT_SoVITS/module/models.py#L967 For the code linked above, my understanding is that it derives spec len from code len. From what I can tell, during training the relation is 2x when semantic_hz=25, but with semantic_hz=50 the two should be equal length? It doesn't affect inference testing for now, presumably because inference uses semantic_hz=25, but it feels like a potential pitfall.
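To make the length relation I mean concrete, a small illustration (the sampling rate and hop length below are my assumptions, not read from the repo):

```python
# Assumed setup: sampling_rate=32000, hop_length=640, so the spectrogram
# frame rate is 32000 / 640 = 50 Hz.
sampling_rate = 32000
hop_length = 640
spec_hz = sampling_rate // hop_length   # 50 spec frames per second

def spec_len_from_code_len(code_len: int, semantic_hz: int) -> int:
    # code_len semantic tokens at semantic_hz tokens/s cover code_len / semantic_hz
    # seconds of audio, i.e. that many seconds' worth of spec frames.
    return code_len * spec_hz // semantic_hz

print(spec_len_from_code_len(100, semantic_hz=25))  # 200 -> the 2x relation
print(spec_len_from_code_len(100, semantic_hz=50))  # 100 -> equal length
```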

todolist

In Mega-TTS2, the MRTE as described uses text as Q and audio as K/V, but in the SoVITS implementation audio is Q and text is K/V, and the result is then added to the audio embedding and the global embedding. Was this choice based on comparative experiments showing it works better?
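To make sure I am reading the two variants correctly, a minimal sketch of the SoVITS-style direction as I understand it (module and tensor names are mine, not the repo's):

```python
import torch
import torch.nn as nn

class MRTESketch(nn.Module):
    """Minimal illustration of the cross-attention direction discussed above."""

    def __init__(self, dim: int = 192, n_heads: int = 2):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, audio_emb, text_emb, global_emb):
        # SoVITS-style: audio is the query, text provides keys/values...
        attn_out, _ = self.cross_attn(query=audio_emb, key=text_emb, value=text_emb)
        # ...and the result is added to the audio embedding and the global embedding.
        return attn_out + audio_emb + global_emb

# Mega-TTS2 as described in the paper would instead use the text embedding as the
# query and the audio embedding as key/value.
out = MRTESketch()(torch.randn(1, 200, 192), torch.randn(1, 50, 192), torch.randn(1, 1, 192))
```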

In follow-up

Weight: experiments with different observers (first and last layers kept at 8w8f):

|Model|config|float|MinMax|MSE|Percentile w/ alpha=1e-3|ACIQ|
|-----|-----|-----|-----|-----|-----|-----|
|ResNet18|4w8f weight per-channel-symmetric|69.76%|56.91%|57.59%|58.31%|52.95%|
|ResNet18|4w8f weight per-group-symmetric group_size=32|69.76%|59.64%|62.08%|59.67%|52.23%|
|ResNet18|4w8f weight per-group-symmetric group_size=8|69.76%|66.57%|65.99%|66.57%|50.29%|

Feature: experiments with different observers:

|Model|config|float|MinMax|MSE|Percentile w/ alpha=1e-3|ACIQ|
|-----|-----|-----|-----|-----|-----|-----|
|ResNet18|8w4f feature per-tensor-affine|69.76%|57.51%|67.90%|67.45%|67.71%|
|ResNet18|8w4f feature per-group-affine group_size=32|69.76%|60.18%|67.94%|67.49%|67.15%|
...
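For reference, a rough sketch of the per-group symmetric weight quantization being compared above (illustrative code, not Sparsebit's implementation):

```python
import torch

def quantize_weight_per_group(w: torch.Tensor, n_bits: int = 4, group_size: int = 32) -> torch.Tensor:
    """Fake-quantize a 2D weight [out_ch, in_ch] with one symmetric scale per
    group of `group_size` input channels (MinMax observer within each group)."""
    out_ch, in_ch = w.shape
    assert in_ch % group_size == 0
    qmax = 2 ** (n_bits - 1) - 1
    wg = w.reshape(out_ch, in_ch // group_size, group_size)
    scale = wg.abs().amax(dim=-1, keepdim=True) / qmax             # one scale per group
    wq = torch.clamp(torch.round(wg / scale), -qmax, qmax) * scale
    return wq.reshape(out_ch, in_ch)

# Smaller groups track outliers better, which matches the MinMax trend in the
# weight table above (56.91% per-channel -> 59.64% g=32 -> 66.57% g=8).
w = torch.randn(64, 128)
w4 = quantize_weight_per_group(w, n_bits=4, group_size=8)
```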

After the modification, the 4bit LoRA finetune w/ lr 3e-4 matches the 8bit results of the original alpaca-lora repo. See the figure below:

![2023-06-13 10-49-38 的屏幕截图](https://github.com/megvii-research/Sparsebit/assets/74638604/50e7ffcc-8ec9-4ea8-a86f-63ed07997e2a)

|Model|1epoch PPL|3epoch PPL|
|---|---|---|
|LLaMA-7b|2.397|2.345|
|LLaMA-65b|2.304| |

LLaMA-7b 4bit QLoRA (lr=3e-4) finetune loss curve:

![2023-05-12 14-27-44 的屏幕截图](https://github.com/megvii-research/Sparsebit/assets/74638604/cce40660-5cc8-47d4-90d1-09cff63796d2)

LLaMA-65b 4bit QLoRA (lr=1e-4) finetune loss curve (only 1 epoch so far):

![2023-05-15 10-38-15 的屏幕截图](https://github.com/megvii-research/Sparsebit/assets/74638604/9be98c35-d237-441d-a7df-3f9062d140f4)
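For context on the setup, a hedged sketch of a standard 4-bit QLoRA configuration with bitsandbytes + peft (placeholder checkpoint name and adapter hyperparameters; only the learning rates come from the runs above, and the actual Sparsebit code may differ):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# NF4 4-bit base weights; the LoRA adapters are trained in higher precision.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b",                # placeholder checkpoint name
    quantization_config=bnb_config,
    device_map="auto",
)
lora_config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # assumption: attention projections only
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
# Train with lr=3e-4 for the 7b run / lr=1e-4 for the 65b run, as in the curves above.
```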

- W: per-channel-symmetric with minmax observer
- A: per-tensor-symmetric with minmax observer
- Layers not searched are set to 8w8f
- eval data num: 5000
- mix-precision metric: greedy by...
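Since the metric description is cut off above, here is only a generic sketch of how a greedy mixed-precision search can proceed (my own simplification, not the exact search used): measure per-layer sensitivity at the low bit-width, then greedily demote the least sensitive layers from 8-bit until a budget is met.

```python
def greedy_mixed_precision(sensitivity: dict, budget: int) -> dict:
    """Toy greedy bit allocation.

    sensitivity: {layer_name: accuracy drop on the 5000-image eval set when only
                  that layer is quantized to the low bit-width, all others 8w8f}
    budget:      number of layers allowed to go to the low bit-width
    """
    bits = {name: 8 for name in sensitivity}                 # start from the 8w8f baseline
    for name in sorted(sensitivity, key=sensitivity.get)[:budget]:
        bits[name] = 4                                       # demote the least sensitive layers
    return bits

# Hypothetical usage with made-up sensitivities:
layer_drop = {"layer1.conv1": 0.2, "layer2.conv1": 1.5, "layer4.conv2": 0.1}
print(greedy_mixed_precision(layer_drop, budget=2))
```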