ApolloRay
At present, judging from my test results, the Chinese-language performance does not match what the paper claims; the gap is large.
I changed the conv_mode from llava_v1 to chatml_direct. It works, but I still can't reproduce the results of the official demo.
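For reference, the conversation template can be selected on the command line; this is a sketch assuming the upstream `llava.serve.cli` entry point and its `--conv-mode` flag (the model path and image file below are placeholders, and flags may differ across LLaVA versions):

```shell
# Run LLaVA-1.6-34B with the chatml_direct conversation template
# instead of the default llava_v1 template.
python -m llava.serve.cli \
    --model-path liuhaotian/llava-v1.6-34b \
    --image-file ./example.jpg \
    --conv-mode chatml_direct
```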
I reproduced the same model but didn't hit the same problem. Maybe it's the transformers version? I'm on transformers==4.37.2.
And I'm not sure why there is a "1" at the end of the model path.

Tested in the official demo (LLaVA-1.6-34B)
Have you found that the int8 ONNX model is much larger than the fp16 one?
> > Have you found that the size of the onnx model for int8 is much bigger than fp16?
>
> yes, The onnx...
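One possible reason an "int8" export ends up larger than the fp16 one is that the exporter kept the weights in fp32 and only inserted quantize/dequantize node pairs, rather than storing int8 initializers. A quick file-size comparison is a useful first check; this is a stdlib-only sketch where the dummy files stand in for the real `.onnx` exports (the file names are placeholders):

```python
import os
import tempfile

def size_ratio(path_a, path_b):
    """size(path_a) / size(path_b) — a quick sanity check on two exports."""
    return os.path.getsize(path_a) / os.path.getsize(path_b)

# Demo with dummy files standing in for the real exported models
# (model_int8.onnx / model_fp16.onnx names are placeholders).
with tempfile.TemporaryDirectory() as d:
    int8_path = os.path.join(d, "model_int8.onnx")
    fp16_path = os.path.join(d, "model_fp16.onnx")
    with open(int8_path, "wb") as f:
        f.write(b"\0" * 4000)   # pretend int8 export: 4000 bytes
    with open(fp16_path, "wb") as f:
        f.write(b"\0" * 2000)   # pretend fp16 export: 2000 bytes
    print(size_ratio(int8_path, fp16_path))  # 2.0 → "int8" file twice as big
```

If the ratio is well above 1, it is worth inspecting the model's initializers to confirm whether the weights were actually stored as int8.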
> After updating to PyTorch 2.0 it works, but the speed improvement is very small.
>
> ```
> [I] Running StableDiffusionXL pipeline
> |-----------------|--------------|
> | Module          | Latency      |
> |-----------------|--------------|
> | CLIP            | 2.59 ms      |
> ...
> ```