HJT9328

Results: 6 issues for HJT9328

Changing `torch_dtype` in config.json from float32 to float16 did not take effect, and setting `fp16: True` in train.py did not help either: inspecting the checkpoint afterwards showed no noticeable change in file size. Is there a good way to compress the model size?
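A likely explanation: `torch_dtype` in config.json only controls the dtype the weights are *loaded* in, while the tensors stored in the checkpoint file stay float32 until they are cast and re-saved. A minimal sketch of that conversion (the function name and paths here are illustrative, not part of swift's API):

```python
import torch

def cast_checkpoint_to_fp16(src, dst):
    """Cast all floating-point tensors in a PyTorch checkpoint to fp16.

    src/dst can be file paths or file-like objects accepted by
    torch.load/torch.save. Roughly halves the stored size, since
    float32 tensors dominate a checkpoint.
    """
    state = torch.load(src, map_location="cpu")
    state_fp16 = {
        # cast only floating-point tensors; leave int tensors (steps, ids) alone
        k: v.half() if torch.is_floating_point(v) else v
        for k, v in state.items()
    }
    torch.save(state_fp16, dst)
```

For larger savings than fp16, weight quantization (e.g. int8/int4) is the usual next step, but that changes the inference path rather than just the file on disk.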

question
wontfix

Running the command

RAY_memory_monitor_refresh_ms=0 CUDA_VISIBLE_DEVICES=2 swift infer \
  --model_type chatglm2-6b \
  --model_id_or_path /data/LLM_checkpoint/chatglm2-6b/chatglm2-6b \
  --infer_backend vllm --tensor_parallel_size 1

fails with an error. Note that `model_id_or_path` points to a fully fine-tuned (full-parameter) model, not a LoRA checkpoint.

[INFO:swift] Due to `ckpt_dir` being `None`, `load_args_from_ckpt_dir` is set to `False`. Traceback (most...

question

Deployed the qwen1.5-72B model; in testing, streaming first-token latency is around 1.6 s. Which parameters can reduce the first-token latency? Any advice appreciated.
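Whatever serving parameters end up being tuned, it helps to measure time-to-first-token consistently. A small, backend-agnostic sketch (the helper name is ours; `stream` stands in for whatever token iterator the inference client returns):

```python
import time

def time_to_first_token(stream):
    """Return (first_token, latency_seconds) for a streaming generator.

    Starts the clock when iteration begins and stops it as soon as the
    first token arrives, which is exactly the latency the user perceives.
    """
    start = time.perf_counter()
    first = next(iter(stream))
    return first, time.perf_counter() - start
```

Running this before and after each parameter change (prompt length, batch limits, tensor parallelism) isolates which knob actually moves the first-token number.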

### Required prerequisites - [X] I have read the documentation. - [X] I have searched the [Issue Tracker](https://github.com/baichuan-inc/baichuan-7B/issues) and [Discussions](https://github.com/baichuan-inc/baichuan-7B/discussions) to make sure this hasn't already been reported. (+1 or comment...

question