ChatGLM-6B [BUG/Help] Running web_demo.py fails with a 'tensor<1x11x1xf16>' and 'tensor<1xf32>' error

Is there an existing issue for this?

[X] I have searched the existing issues

Current Behavior

Error output:
loc("varianceEps"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/9e200cfa-7d96-11ed-886f-a23c4f261b56/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":228:0)): error: input types 'tensor<1x11x1xf16>' and 'tensor<1xf32>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).
zsh: abort streamlit run web_demo2.py
/Users/myuser/opt/anaconda3/envs/chatglm2/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '

Running cli_demo.py works fine.

Expected Behavior

No response

Steps To Reproduce

python web_demo.py

streamlit run web_demo2.py

Environment

- OS: Mac 13.2.1
- Python: 3.10.0
- Transformers: 4.26.1
- PyTorch: 2.0.0
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :

Anything else?

No response

Mar 30 '23 02:03 wanggz

Are you using different precisions in cli_demo and web_demo? At the moment, half precision sometimes causes problems on MPS.
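To make the error message easier to read, here is a minimal sketch of the NumPy-style shape broadcasting rule in plain Python. It is only an illustration, not the check that MPSGraph actually performs, and `broadcast_shape` is a hypothetical helper. Under this rule the two shapes from the log, (1, 11, 1) and (1,), are compatible, so the difference that stands out is the element type (f16 versus f32). Reading the failure as a precision mismatch is an assumption, though it would be consistent with half precision being the trigger.

```python
from itertools import zip_longest


def broadcast_shape(a, b):
    """Return the broadcast shape of two shapes, or None if they are incompatible.

    Illustrative NumPy-style rule only; this is not MPSGraph's implementation.
    """
    result = []
    # Compare dimensions from the trailing end; a missing leading dim counts as 1.
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x == y or y == 1:
            result.append(x)
        elif x == 1:
            result.append(y)
        else:
            return None
    return tuple(reversed(result))


if __name__ == "__main__":
    # The two shapes reported in the error message.
    print(broadcast_shape((1, 11, 1), (1,)))
    # A pair that genuinely cannot be broadcast, for contrast.
    print(broadcast_shape((1, 11, 1), (3, 2)))
```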

Mar 30 '23 02:03 duzx16

Both cli_demo and web_demo use the following configuration: `model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().to('mps')`

Mar 30 '23 03:03 wanggz

> Are you using different precisions in cli_demo and web_demo? At the moment, half precision sometimes causes problems on MPS.

After switching to float, the page just keeps spinning and produces no output.

The console printed:
/Users/user/.cache/huggingface/modules/transformers_modules/local/modeling_chatglm.py:417: UserWarning: MPS: no support for int64 min/max ops, casting it to int32 (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/mps/operations/ReduceOps.mm:1271.)
cos, sin = self.rotary_emb(q1, seq_len=position_ids.max() + 1)

Mar 30 '23 03:03 wanggz

With float, web_demo does produce output, but it is extremely slow.

Mar 30 '23 03:03 wanggz


In the end it raised: RuntimeError: MPS backend out of memory (MPS allocated: 23.65 GB, other allocations: 12.63 GB, max allowed: 36.27 GB). Tried to allocate 16.00 KB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).

With float, memory is never released and generation is very slow: about 200 characters of output took roughly 3 hours.
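A rough back-of-the-envelope estimate helps put the out-of-memory error in context. The sketch below assumes about 6.2 billion parameters for ChatGLM-6B (the figure from the project's README, not stated in this thread) and counts only the weights, ignoring activations and caches. `weight_memory_gib` is a hypothetical helper.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}

# Assumption: parameter count taken from the ChatGLM-6B README, not from this thread.
ASSUMED_PARAMS = 6.2e9


def weight_memory_gib(num_params, dtype):
    """Approximate memory for the model weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    for dtype in ("float32", "float16"):
        print(f"{dtype}: {weight_memory_gib(ASSUMED_PARAMS, dtype):.1f} GiB")
```

Under these assumptions the float32 weights alone come to roughly 23 GiB, which is in the same range as the 23.65 GB MPS allocation in the error, and half precision would need about half of that.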

Mar 30 '23 08:03 wanggz

Have you solved this problem? I'm in the same situation: I always get this error and I don't know what is causing it.

Mar 31 '23 04:03 bigdaxin

> Have you solved this problem? I'm in the same situation: I always get this error and I don't know what is causing it.

No, I haven't solved it.

Mar 31 '23 07:03 wanggz

+1, waiting for solution.

Apr 01 '23 03:04 ronething

Using a command like the one below worked for me.

pip3 install --pre --force-reinstall torch torchvision --index-url https://download.pytorch.org/whl/nightly/cpu

Apr 01 '23 04:04 ronething

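I just ran into this too. The documentation actually says to use the PyTorch nightly build rather than the default 2.0.0 release. Installing PyTorch nightly as described in the comment above fixed it.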
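A quick way to confirm which build is installed is to look at the version string. The sketch below assumes that PyTorch nightly wheels carry a `.dev` segment in their version (for example `2.1.0.dev20230401`), while stable releases such as `2.0.0` do not. `is_nightly` is a hypothetical helper, not a PyTorch API.

```python
def is_nightly(version: str) -> bool:
    """Guess whether a PyTorch version string refers to a nightly build.

    Assumption: nightly versions contain a ".dev" segment, stable ones do not.
    """
    # Drop a local build tag such as "+cpu" before checking.
    public = version.split("+", 1)[0]
    return ".dev" in public


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        kind = "nightly" if is_nightly(torch.__version__) else "stable"
        print(f"torch {torch.__version__} looks like a {kind} build")
```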

Apr 11 '23 08:04 xiaohuanshu

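> I just ran into this too. The documentation actually says to use the PyTorch nightly build rather than the default 2.0.0 release. Installing PyTorch nightly as described in the comment above fixed it.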

Installing the PyTorch nightly build as ronething suggested does get it running, but for now normal use with 16 GB of memory still looks too difficult.

Apr 15 '23 07:04 DingBool

I installed the PyTorch nightly build as ronething suggested and still get this error. Could it be because I only have 16 GB of memory?

Apr 18 '23 02:04 nanayashiki1215