luhairong11 issues

Results 10 issues of luhairong11

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

File "/usr/local/lib/python3.7/site-packages/albumentations-0.4.1-py3.7.egg/albumentations/augmentations/bbox_utils.py", line 28, in ensure_data_valid
if data.get(data_name) and len(data[data_name][0]) < 5:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()...
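The traceback points at a condition of the form `data.get(data_name) and ...`. When the stored value is a multi-element NumPy array, the `and` operator asks for its truth value, which NumPy refuses to provide. The following is a minimal sketch of that behaviour and of one possible explicit check. The helper `has_short_rows` is hypothetical and is not the library's actual fix. Passing the boxes as a plain list (for example via `.tolist()`) is another workaround that avoids the ambiguous test.

```python
import numpy as np


def has_short_rows(data, data_name):
    """Hypothetical helper: True if the entry exists, is non-empty,
    and its first row has fewer than 5 values."""
    value = data.get(data_name)
    # Explicit checks replace `if value`, which raises for multi-element arrays.
    if value is None or len(value) == 0:
        return False
    return len(value[0]) < 5


bboxes = np.array([[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]])

# Reproduce the error: a truth test on a multi-element array is ambiguous.
try:
    if bboxes:
        pass
    raised = False
except ValueError:
    raised = True

print(raised)
print(has_short_rows({"bboxes": bboxes}, "bboxes"))
```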

Model decryption problem

Has anyone run into a decrypted caffemodel that Caffe cannot parse and that raises an error?

Background of the trained video jitters

Hello, I'd like to ask why the background of the video I trained jitters so badly. Training command: python main.py data/May/ --workspace trial_May/ -O --iters 200000 The training parameters are as follows: ![image](https://github.com/ashawkey/RAD-NeRF/assets/11584869/91033dcc-dcd9-4103-ad12-8b30f91dde9c) ![image](https://github.com/ashawkey/RAD-NeRF/assets/11584869/66d98cdb-97f1-4c54-8474-b970f150ec94) ![image](https://github.com/ashawkey/RAD-NeRF/assets/11584869/baf85e43-e730-498f-96a6-89a55fa23430) Below is the test result after training the model: https://github.com/ashawkey/RAD-NeRF/assets/11584869/fc3fb595-3c93-4bee-be6b-41b0721bff0f

Does streaming output not include token statistics?

Does streaming output not include token statistics? The code appears to support it, so why is it missing from the output? ![image](https://github.com/xorbitsai/inference/assets/11584869/fb2624d4-8b9d-450a-84bf-808499a46cbf) ![image](https://github.com/xorbitsai/inference/assets/11584869/c56acede-9f7e-456d-9dfd-ccb0002a09e6)
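One way to check whether a stream carries token statistics is to scan every chunk for a usage field. The sketch below assumes OpenAI-style server-sent event lines (`data: {...}`, terminated by `data: [DONE]`) and a `usage` object on one of the chunks. Whether the server in this issue emits such a field is not confirmed here, and the sample lines are made up for illustration.

```python
import json


def collect_stream(lines):
    """Join streamed text and pick up a usage block if any chunk carries one."""
    text_parts, usage = [], None
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data: "):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text_parts.append(choice.get("delta", {}).get("content") or "")
        if chunk.get("usage"):
            usage = chunk["usage"]
    return "".join(text_parts), usage


# Hypothetical stream content, not captured from a real server.
sample = [
    'data: {"choices": [{"delta": {"content": "Hi"}}]}',
    'data: {"choices": [{"delta": {"content": " there"}}]}',
    'data: {"choices": [], "usage": {"prompt_tokens": 3, "completion_tokens": 2, "total_tokens": 5}}',
    "data: [DONE]",
]
text, usage = collect_stream(sample)
print(text, usage)
```

If `usage` comes back as `None` for a real stream, the statistics are not being sent at all, as opposed to being dropped by the client.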

question

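Abnormal output when the API is called repeatedly

![image](https://github.com/THUDM/CodeGeeX2/assets/11584869/31e14a8a-3056-4a2b-b841-f543735c6fab) The model is codegeex2-6b-int4. In some requests the words "问" (question) and "答" (answer) appear in the output, which looks wrong. Aren't "问" and "答" only added when the prompt is constructed? Why do they sometimes show up in the output as well? It happens quite frequently.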
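Until the cause is understood, a client-side workaround is to cut the generated text at the first prompt marker. This is only a sketch: the function `truncate_at_markers` and the exact marker strings (with full-width and ASCII colons) are assumptions, not part of CodeGeeX2, and it does not explain why the model emits the markers.

```python
def truncate_at_markers(text, markers=("问：", "答：", "问:", "答:")):
    """Return text up to the earliest marker, or unchanged if none is present."""
    cut = len(text)
    for marker in markers:
        idx = text.find(marker)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut].rstrip()


print(truncate_at_markers("print(1)\n问：next question"))
```

Note that an output that starts with a marker is reduced to an empty string, so the caller may want to handle that case separately.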

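chatglm-cpp accelerated inference is much slower than transformer inference

With chatglm-cpp accelerated inference, generation is much slower than with transformer inference. Has anyone else run into this? The model is codegeex2-6b-int4.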

I cannot get this project to run

![image](https://github.com/ninehills/llm-inference-benchmark/assets/11584869/0a2fe3fe-4905-4f61-9b9e-8d6ce9f8e142) Command executed: locust -u 8 -r 2 --prompt-text "我" -o 100 --provider vllm -H http://127.0.0.1:7860 --tokenizer /data/pretrain_models/qwen/Qwen-1_8B-Chat-Int4 --qps 1 Is there a related discussion group?

Why is the inference FTL@1 longer after the vllm framework is quantized?

![image](https://github.com/ninehills/llm-inference-benchmark/assets/11584869/7a3a490c-b8b4-4954-b728-8dffc68f2e19) ![image](https://github.com/ninehills/llm-inference-benchmark/assets/11584869/0d2fcce5-37ff-4c98-907c-35c83a5ad803)

onnx to tensorrt

I have successfully converted to ONNX, but I'm getting an error when converting to TensorRT. What should I do? [TRT] [E] /ecd1/ecd1.0/ecd1.0.0/Conv: two inputs (data and weights) are allowed only...

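The model still returns a response when the configured maximum token count is exceeded

### Reminder
- [X] I have read the README and searched the existing issues.
### Reproduction
python src/api.py --model_name_or_path /data/models/LLM_models/qwen/Qwen-72B-Chat-Int4 --template qwen --infer_backend vllm --vllm_gpu_util 0.9 --vllm_maxlen 8000

The configuration above sets the maximum token count to 8000. When the input exceeds 8000 tokens, a streaming call to the API still returns two JSON messages with empty content, and vLLM logs a warning underneath that the maximum token count was exceeded. Could the code raise an exception instead, so that the response is easier to understand?
###...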
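The requested behaviour could look roughly like the sketch below: count the prompt tokens before generation and raise a descriptive error when the limit is exceeded. The names `PromptTooLongError` and `check_prompt_length` are hypothetical and do not come from the project or from vLLM, and the token IDs are assumed to come from whatever tokenizer the server already uses.

```python
class PromptTooLongError(ValueError):
    """Hypothetical error for prompts that exceed the configured limit."""


def check_prompt_length(token_ids, max_len):
    """Return the prompt length, or raise if it exceeds max_len."""
    n = len(token_ids)
    if n > max_len:
        raise PromptTooLongError(
            f"Prompt has {n} tokens, which exceeds the configured maximum of {max_len}."
        )
    return n


# Stand-in token IDs; a real server would tokenize the request first.
try:
    check_prompt_length(list(range(8001)), max_len=8000)
    message = None
except PromptTooLongError as exc:
    message = str(exc)

print(message)
```

An API layer could catch this error and return an explicit error response instead of empty streamed chunks.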

pending