
Poor inference results: output contains many "!" characters

Open · kejun1 opened this issue 1 year ago • 1 comment

In the Quick Start section of Inference, I used the Text Query command, and the model loaded successfully. However, the output results appear to be poor, with many exclamation marks being generated. I would like to know why this is happening.

CUDA_VISIBLE_DEVICES=0 python video_audio_demo.py --model_path VITA-1.5 --image_path asset/vita_demo.jpg --model_type qwen2p5_instruct --conv_mode qwen2p5_instruct --question "Describe this images."

/home/yy448/ProgramFiles/anaconda3/envs/vita/lib/python3.10/site-packages/timm/models/layers/__init__.py:48: FutureWarning: Importing from timm.models.layers is deprecated, please import via timm.layers
  warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.layers", FutureWarning)
Please build and install Nvidia apex package with option '--cuda_ext' according to https://github.com/NVIDIA/apex#from-source .
Please install mamba_ssm to use MambaSSM component.
/home/yy448/ProgramFiles/anaconda3/envs/vita/lib/python3.10/site-packages/torch/_jit_internal.py:726: FutureWarning: ignore(True) has been deprecated. TorchScript will now drop the function call on compilation. Use torch.jit.unused now. {}
  warnings.warn(
the number of whale encoder params: 341.3681640625M
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████| 4/4 [00:05<00:00, 1.30s/it]
the number of vision encoder params: 289.9287109375M
☜!The!!image!!!! depicts!!!!! a!!!!!!!!!
modern!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Time consume: 23.836864709854126

kejun1, Dec 24 '24 02:12

Same problem here.

Hoonly, Jan 17 '25 08:01