VisualGLM-6B
By the way, could MPS on Apple silicon be supported? Running on a Mac M2 currently fails:
```
❯ python web_demo.py
[2023-05-21 21:29:01,122] [INFO] DeepSpeed/CUDA is not installed, fallback to Pytorch checkpointing.
[2023-05-21 21:29:01,599] [WARNING] Failed to load cpm_kernels:Unknown platform: darwin
[2023-05-21 21:29:01,601] [INFO] building VisualGLMModel model ...
59203
[2023-05-21 21:29:01,625] [INFO] [RANK 0] > initializing model parallel with size 1
[2023-05-21 21:29:01,627] [INFO] [RANK 0] You are using model-only mode.
For torch.distributed users or loading model parallel models, set environment variables RANK, WORLD_SIZE and LOCAL_RANK.
/Users/z/git/VisualGLM-6B/.direnv/python-3.10.11/lib/python3.10/site-packages/torch/nn/init.py:405: UserWarning: Initializing zero-element tensors is a no-op
  warnings.warn("Initializing zero-element tensors is a no-op")
[2023-05-21 21:29:13,787] [INFO] [RANK 0] > number of parameters on model parallel rank 0: 7810582016
[2023-05-21 21:29:14,203] [INFO] [RANK 0] Torch not compiled with CUDA enabled
[2023-05-21 21:29:14,203] [INFO] [RANK 0] global rank 0 is loading checkpoint /Users/z/.sat_models/visualglm-6b/1/mp_rank_00_model_states.pt
[2023-05-21 21:29:28,809] [INFO] [RANK 0] > successfully loaded /Users/z/.sat_models/visualglm-6b/1/mp_rank_00_model_states.pt
Traceback (most recent call last):
  File "/Users/z/git/VisualGLM-6B/web_demo.py", line 128, in <module>
    main(args)
  File "/Users/z/git/VisualGLM-6B/web_demo.py", line 81, in main
    model, tokenizer = get_infer_setting(gpu_device=0, quant=args.quant)
  File "/Users/z/git/VisualGLM-6B/model/infer_util.py", line 27, in get_infer_setting
    model = model.cuda()
  File "/Users/z/git/VisualGLM-6B/.direnv/python-3.10.11/lib/python3.10/site-packages/torch/nn/modules/module.py", line 905, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/Users/z/git/VisualGLM-6B/.direnv/python-3.10.11/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/Users/z/git/VisualGLM-6B/.direnv/python-3.10.11/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/Users/z/git/VisualGLM-6B/.direnv/python-3.10.11/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/Users/z/git/VisualGLM-6B/.direnv/python-3.10.11/lib/python3.10/site-packages/torch/nn/modules/module.py", line 820, in _apply
    param_applied = fn(param)
  File "/Users/z/git/VisualGLM-6B/.direnv/python-3.10.11/lib/python3.10/site-packages/torch/nn/modules/module.py", line 905, in <lambda>
    return self._apply(lambda t: t.cuda(device))
  File "/Users/z/git/VisualGLM-6B/.direnv/python-3.10.11/lib/python3.10/site-packages/torch/cuda/__init__.py", line 239, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
```
I changed

```python
model = AutoModel.from_pretrained("THUDM/visualglm-6b", trust_remote_code=True).half().cuda()
```

to

```python
model = AutoModel.from_pretrained("THUDM/visualglm-6b", trust_remote_code=True).half().to("mps")
```

and got:
```
Traceback (most recent call last):
  File "/Users/longkeyy/PycharmProjects/hf_demo/llm.py", line 4, in
```
It looks like you are using quantization; quantization is currently only supported on CUDA.
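A guard along these lines could make that failure mode explicit rather than crashing. This is only a sketch, not the actual infer_util.py code; the hypothetical helper `effective_quant` mirrors the `args.quant` value visible in the traceback above:

```python
import torch

def effective_quant(quant):
    # INT4/INT8 quantization depends on cpm_kernels, which is CUDA-only
    # (hence the "Failed to load cpm_kernels: Unknown platform: darwin"
    # warning in the log above), so drop it when CUDA is unavailable.
    if quant is not None and not torch.cuda.is_available():
        print(f"quant={quant} requires CUDA; falling back to unquantized weights")
        return None
    return quant
```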
Could the official code be adjusted to run on MPS, the way stable-diffusion-webui does? I don't really know machine learning, so I'm not sure how to change it myself:
https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/master/modules/devices.py
```python
import sys
import torch

# Device-selection helpers from stable-diffusion-webui's modules/devices.py.
if sys.platform == "darwin":
    from modules import mac_specific

def has_mps() -> bool:
    if sys.platform != "darwin":
        return False
    else:
        return mac_specific.has_mps

def extract_device_id(args, name):
    for x in range(len(args)):
        if name in args[x]:
            return args[x + 1]
    return None

def get_cuda_device_string():
    from modules import shared
    if shared.cmd_opts.device_id is not None:
        return f"cuda:{shared.cmd_opts.device_id}"
    return "cuda"

def get_optimal_device_name():
    # Prefer CUDA, then Apple MPS, then fall back to CPU.
    if torch.cuda.is_available():
        return get_cuda_device_string()
    if has_mps():
        return "mps"
    return "cpu"

def get_optimal_device():
    return torch.device(get_optimal_device_name())
```
https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/master/modules/mac_specific.py
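Until something like that lands upstream, here is a minimal sketch of the analogous change in VisualGLM-6B, assuming get_infer_setting still calls model.cuda() unconditionally (as the traceback shows at model/infer_util.py line 27); `move_model` is a hypothetical helper name:

```python
import sys
import torch

def get_optimal_device() -> torch.device:
    # Same fallback order as stable-diffusion-webui: CUDA first, then
    # Apple MPS, then CPU. torch.backends.mps.is_available() has been
    # part of PyTorch since 1.12.
    if torch.cuda.is_available():
        return torch.device("cuda")
    if sys.platform == "darwin" and torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

def move_model(model: torch.nn.Module) -> torch.nn.Module:
    # Hypothetical replacement for the hard-coded `model = model.cuda()`.
    return model.to(get_optimal_device())
```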
Running `python web_demo.py` on the CPU fails with `"slow_conv2d_cpu" not implemented for 'Half'`.
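That CPU error is expected: PyTorch ships no fp16 ("Half") convolution kernels for its CPU backend, so a CPU fallback has to keep the weights in fp32. A hypothetical adaptation of the loading snippet above:

```python
from transformers import AutoModel

# On CPU, load in fp32 instead of calling .half(): slower and roughly
# twice the memory, but it avoids the missing fp16 conv kernels.
model = AutoModel.from_pretrained("THUDM/visualglm-6b",
                                  trust_remote_code=True).float()
```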
Running on MPS it fails with:
loc("varianceEps"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/97f6331a-ba75-11ed-a4bc-863efbbaf80d/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":228:0)):
error: input types 'tensor<1x257x1xf16>' and 'tensor<1xf32>' are not broadcast compatible
After updating torch to 2.1 it does run with fp16 on MPS, but there appears to be a memory leak: after a single question, memory usage climbs from 18 GB to 28 GB, and once it starts hitting swap the machine can't keep up.
```
pip list | grep torch
torch        2.1.0.dev20230606
torchaudio   2.1.0.dev20230606
torchvision  0.16.0.dev20230606
```
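For the memory growth, one mitigation worth trying is releasing MPS's cached allocations between requests. A sketch, assuming torch >= 2.0 (where torch.mps.empty_cache() is available) and the model.chat signature from the repo README; `chat_once` is a hypothetical wrapper:

```python
import torch

def chat_once(model, tokenizer, image_path, query, history=None):
    # One round of inference, then drop cached MPS allocations back to
    # the OS. This won't fix a true leak, but it bounds the allocator's
    # cached growth between questions.
    with torch.no_grad():
        response, history = model.chat(tokenizer, image_path, query,
                                       history=history or [])
    torch.mps.empty_cache()
    return response, history
```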