Kermit Griffeth

I solved the first problem with `pip install flashinfer-python==0.2.2` and `--enforce-eager`, but the performance of version 0.8.3 is still not as good as that of version 0.6.3.
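For reference, the workaround described above boils down to commands like the following; the exact serve invocation and the model placeholder are assumptions for illustration, not taken from the original report:

```shell
# Pin flashinfer to the version that resolved the first problem.
pip install flashinfer-python==0.2.2

# Serve in eager mode (CUDA graphs disabled); <model> is a placeholder.
vllm serve <model> --enforce-eager
```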
Since I only have an RTX 4090 24G device at hand, I have reproduced the previous issue (version 0.8.5 performing worse than 0.6.3) on this device, and I am...
> Is the command the same in both versions? yes
```
python3 build_visual_engine.py --model_path tmp/hf_models/${MODEL_NAME} --model_type vila --vila_path ${VILA_PATH} # for VILA
/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:128: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
[TensorRT-LLM]...
```
> > You can add a chat_template entry to the LLM's tokenizer_config.json file. The old models don't have that entry and I think the HuggingFace library was appending a default...
> > > > You can add a chat_template entry to the LLM's tokenizer_config.json file. The old models don't have that entry and I think the HuggingFace library was appending...
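A minimal sketch of the fix suggested in the quoted comments: add a `chat_template` entry to the LLM's `tokenizer_config.json`. The file contents and the template string here are illustrative assumptions, not the model's actual configuration:

```python
import json
from pathlib import Path

# Stand-in for the model's tokenizer_config.json (assumed path and contents).
config_path = Path("tokenizer_config.json")
config_path.write_text(json.dumps({"tokenizer_class": "LlamaTokenizer"}))

config = json.loads(config_path.read_text())

# Old models may lack a chat_template entry; add one only if it is missing.
# This Jinja template is a placeholder, not the model's official template.
config.setdefault(
    "chat_template",
    "{% for message in messages %}"
    "{{ message['role'] }}: {{ message['content'] }}\n"
    "{% endfor %}",
)

config_path.write_text(json.dumps(config, indent=2))
```

With the entry in place, the HuggingFace tokenizer's `apply_chat_template` should use it instead of falling back to a library default.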
[inference_model2.tar.gz](https://github.com/user-attachments/files/17499493/inference_model2.tar.gz) Sorry, the problem description above was a bit off. The model with 100% CPU usage is this uploaded YOLO model; please help check whether many of its operators are running on the CPU. The MobileNetV1 one sits at around 30%-40% CPU usage.
Is there a tool that lets you check for yourself whether an nb model's operators execute on the CPU or on the GPU? That way, for well-performing models whose operators all run on the GPU, a developer could make a preliminary judgment about whether the model can run on the GPU via OpenCL and still achieve good overall performance.
```
Got bad cuda status: out of memory at line: 27 /ai/zhiyi/w/multimodal/openbmb/Nanoflow/pipeline/src/vortexData.cu
```
I get the same error on a 4090 24G.