Jiang-Stan


hi, I'm trying to run pointpillar on TensorRT, and I am confused about its performance. ![2022-06-20 13-25-43 的屏幕截图](https://user-images.githubusercontent.com/74638604/174530970-cfe123b9-5644-4e38-b925-690bf33bd614.png) The structure of my model is the same as this onnx graph,...

Test environment:
- Model: ResNet18
- Task: ImageNet
- Eval data num: 5k
- Calib data num: 256
- Backend: TensorRT (symmetric feature quantization)

|Observer|Top1|Calibration time cost|
|-----|-----|-----|
|Float|69.86%|-|
...
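For reference, a minimal sketch of what the symmetric MinMax feature observer in such a setup boils down to (illustrative code with my own helper names, not the actual Sparsebit/TensorRT implementation):

```python
import torch

def minmax_symmetric_scale(x: torch.Tensor, n_bits: int = 8) -> float:
    """Derive a symmetric per-tensor scale from calibration data.

    Symmetric quantization maps [-max_abs, max_abs] onto the signed integer
    range [-(2**(n_bits-1) - 1), 2**(n_bits-1) - 1], so the zero-point is 0.
    """
    max_abs = x.abs().max().item()
    qmax = 2 ** (n_bits - 1) - 1
    return max_abs / qmax

def fake_quantize(x: torch.Tensor, scale: float, n_bits: int = 8) -> torch.Tensor:
    """Quantize then dequantize to simulate rounding/clipping error."""
    qmax = 2 ** (n_bits - 1) - 1
    return torch.clamp(torch.round(x / scale), -qmax, qmax) * scale

# Usage sketch: collect feature statistics over the 256 calibration samples,
# then evaluate Top-1 on the 5k eval set with fake-quantized features.
calib_feats = torch.randn(256, 512)           # stand-in for real activations
scale = minmax_symmetric_scale(calib_feats)
feats_q = fake_quantize(calib_feats, scale)
```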

DETR model from: https://github.com/facebookresearch/detr

|bit|weight observer|feature observer|mAP|AP50|AP75|remarks|
|-----|-----|-----|-----|-----|-----|-----|
|float|-|-|0.421|0.623|0.443|baseline|
|8w8f|minmax|minmax|0.280|0.519|0.260| |
|8w8f|minmax|minmax|0.355|0.574|0.363|aciq laplace observer for last 2 bbox embed layer weights|
|8w8f|minmax|minmax|0.358|0.576|0.365|float weight|
...
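The "aciq laplace observer" row clips the weight range instead of taking the raw min/max. As a rough illustration of the idea (my own sketch following Banner et al.'s ACIQ formulation, not the repo's code): fit a Laplace scale b to the tensor and pick the clipping value that minimizes the sum of clipping and rounding error.

```python
import torch

def aciq_laplace_clip(w: torch.Tensor, n_bits: int = 8) -> float:
    """Pick a clipping value for a roughly Laplace-distributed weight tensor.

    ACIQ-style distortion for clipping at alpha:
        D(alpha) = 2 * b**2 * exp(-alpha / b)          # error from clipping both tails
                 + alpha**2 / (3 * 2 ** (2 * n_bits))  # uniform rounding error in [-alpha, alpha]
    where b is the Laplace scale, estimated as mean(|w - mean(w)|).
    """
    b = (w - w.mean()).abs().mean().item()
    max_abs = w.abs().max().item()
    alphas = torch.linspace(1e-8, max_abs, steps=2048)
    dist = 2 * b ** 2 * torch.exp(-alphas / b) + alphas ** 2 / (3 * 2 ** (2 * n_bits))
    return alphas[dist.argmin()].item()

# Hypothetical usage: apply the clip only to the last two bbox-embed layers' weights
# and keep plain minmax everywhere else, as in the third table row.
w = torch.randn(256, 256) * 0.05
alpha = aciq_laplace_clip(w, n_bits=8)
scale = alpha / (2 ** 7 - 1)
```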

https://github.com/RVC-Boss/GPT-SoVITS/blob/93dd8334f4ce7fb5ccdeabebe05deb26a3cf30fb/GPT_SoVITS/module/models.py#L967 For the code linked above, my understanding is that it derives spec len from code len. From what I can tell, during training the relation is 2x when semantic_hz=25, but with semantic_hz=50 the two should be equal length? It doesn't affect inference testing for now, presumably because inference uses semantic_hz=25, but it feels like a potential pitfall.
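To make the length relation I mean concrete, a small illustration (the sampling rate and hop length below are my assumptions, not read from the repo):

```python
# Assumed setup: sampling_rate=32000, hop_length=640, so the spectrogram
# frame rate is 32000 / 640 = 50 Hz.
sampling_rate = 32000
hop_length = 640
spec_hz = sampling_rate // hop_length   # 50 spec frames per second

def spec_len_from_code_len(code_len: int, semantic_hz: int) -> int:
    # code_len semantic tokens at semantic_hz tokens/s cover code_len / semantic_hz
    # seconds of audio, i.e. that many seconds' worth of spec frames.
    return code_len * spec_hz // semantic_hz

print(spec_len_from_code_len(100, semantic_hz=25))  # 200 -> the 2x relation
print(spec_len_from_code_len(100, semantic_hz=50))  # 100 -> equal length
```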

todolist

In Mega-TTS2, the MRTE as described uses text as Q and audio as K/V, but in the SoVITS implementation audio is Q and text is K/V, and the result is then added to the audio embedding and the global embedding. Was this choice based on comparative experiments showing it works better?
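To make sure I am reading the two variants correctly, a minimal sketch of the SoVITS-style direction as I understand it (module and tensor names are mine, not the repo's):

```python
import torch
import torch.nn as nn

class MRTESketch(nn.Module):
    """Minimal illustration of the cross-attention direction discussed above."""

    def __init__(self, dim: int = 192, n_heads: int = 2):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, audio_emb, text_emb, global_emb):
        # SoVITS-style: audio is the query, text provides keys/values...
        attn_out, _ = self.cross_attn(query=audio_emb, key=text_emb, value=text_emb)
        # ...and the result is added to the audio embedding and the global embedding.
        return attn_out + audio_emb + global_emb

# Mega-TTS2 as described in the paper would instead use the text embedding as the
# query and the audio embedding as key/value.
out = MRTESketch()(torch.randn(1, 200, 192), torch.randn(1, 50, 192), torch.randn(1, 1, 192))
```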

In follow-up

Weight: experiments with different observers (first and last layers kept at 8w8f):

|Model|config|float|MinMax|MSE|Percentile w/ alpha=1e-3|ACIQ|
|-----|-----|-----|-----|-----|-----|-----|
|ResNet18|4w8f weight per-channel-symmetric|69.76%|56.91%|57.59%|58.31%|52.95%|
|ResNet18|4w8f weight per-group-symmetric group_size=32|69.76%|59.64%|62.08%|59.67%|52.23%|
|ResNet18|4w8f weight per-group-symmetric group_size=8|69.76%|66.57%|65.99%|66.57%|50.29%|

Feature: experiments with different observers:

|Model|config|float|MinMax|MSE|Percentile w/ alpha=1e-3|ACIQ|
|-----|-----|-----|-----|-----|-----|-----|
|ResNet18|8w4f feature per-tensor-affine|69.76%|57.51%|67.90%|67.45%|67.71%|
|ResNet18|8w4f feature per-group-affine group_size=32|69.76%|60.18%|67.94%|67.49%|67.15%|
...
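For reference, a rough sketch of the per-group symmetric weight quantization being compared above (illustrative code, not Sparsebit's implementation):

```python
import torch

def quantize_weight_per_group(w: torch.Tensor, n_bits: int = 4, group_size: int = 32) -> torch.Tensor:
    """Fake-quantize a 2D weight [out_ch, in_ch] with one symmetric scale per
    group of `group_size` input channels (MinMax observer within each group)."""
    out_ch, in_ch = w.shape
    assert in_ch % group_size == 0
    qmax = 2 ** (n_bits - 1) - 1
    wg = w.reshape(out_ch, in_ch // group_size, group_size)
    scale = wg.abs().amax(dim=-1, keepdim=True) / qmax             # one scale per group
    wq = torch.clamp(torch.round(wg / scale), -qmax, qmax) * scale
    return wq.reshape(out_ch, in_ch)

# Smaller groups track outliers better, which matches the MinMax trend in the
# weight table above (56.91% per-channel -> 59.64% g=32 -> 66.57% g=8).
w = torch.randn(64, 128)
w4 = quantize_weight_per_group(w, n_bits=4, group_size=8)
```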

After the modification, the 4bit LoRA finetune w/ lr 3e-4 matches the 8bit results of the original alpaca-lora repo. See the figure below:

![2023-06-13 10-49-38 的屏幕截图](https://github.com/megvii-research/Sparsebit/assets/74638604/50e7ffcc-8ec9-4ea8-a86f-63ed07997e2a)

|Model|1epoch PPL|3epoch PPL|
|---|---|---|
|LLaMA-7b|2.397|2.345|
|LLaMA-65b|2.304| |

LLaMA-7b 4bit QLoRA (lr=3e-4) finetune loss curve:

![2023-05-12 14-27-44 的屏幕截图](https://github.com/megvii-research/Sparsebit/assets/74638604/cce40660-5cc8-47d4-90d1-09cff63796d2)

LLaMA-65b 4bit QLoRA (lr=1e-4) finetune loss curve (only 1 epoch so far):

![2023-05-15 10-38-15 的屏幕截图](https://github.com/megvii-research/Sparsebit/assets/74638604/9be98c35-d237-441d-a7df-3f9062d140f4)
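For context on the setup, a hedged sketch of a standard 4-bit QLoRA configuration with bitsandbytes + peft (placeholder checkpoint name and adapter hyperparameters; only the learning rates come from the runs above, and the actual Sparsebit code may differ):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# NF4 4-bit base weights; the LoRA adapters are trained in higher precision.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b",                # placeholder checkpoint name
    quantization_config=bnb_config,
    device_map="auto",
)
lora_config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # assumption: attention projections only
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
# Train with lr=3e-4 for the 7b run / lr=1e-4 for the 65b run, as in the curves above.
```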

- W: per-channel-symmetric with minmax observer
- A: per-tensor-symmetric with minmax observer
- Layers not searched are set to 8w8f
- eval data num: 5000
- mix-precision metric: greedy by...
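Since the metric description is cut off above, here is only a generic sketch of how a greedy mixed-precision search can proceed (my own simplification, not the exact search used): measure per-layer sensitivity at the low bit-width, then greedily demote the least sensitive layers from 8-bit until a budget is met.

```python
def greedy_mixed_precision(sensitivity: dict, budget: int) -> dict:
    """Toy greedy bit allocation.

    sensitivity: {layer_name: accuracy drop on the 5000-image eval set when only
                  that layer is quantized to the low bit-width, all others 8w8f}
    budget:      number of layers allowed to go to the low bit-width
    """
    bits = {name: 8 for name in sensitivity}                 # start from the 8w8f baseline
    for name in sorted(sensitivity, key=sensitivity.get)[:budget]:
        bits[name] = 4                                       # demote the least sensitive layers
    return bits

# Hypothetical usage with made-up sensitivities:
layer_drop = {"layer1.conv1": 0.2, "layer2.conv1": 1.5, "layer4.conv2": 0.1}
print(greedy_mixed_precision(layer_drop, budget=2))
```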