
Where should a manually downloaded model be placed?

Open helloxz opened this issue 1 year ago • 16 comments

I saw this line in the instructions:

If downloading the checkpoint from the Hugging Face Hub is slow, you can also download it manually from here.

I downloaded the model manually. Which local folder should I put it in?

helloxz avatar Mar 15 '23 07:03 helloxz

Anywhere you like. Just point to the local path when loading. For example:

mypath = "/home/xxxx/public/chatglm-6b"
tokenizer = AutoTokenizer.from_pretrained(mypath, trust_remote_code=True)
model = AutoModel.from_pretrained(mypath, trust_remote_code=True).half().quantize(4).cuda()  # int4 quantization happens here
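Once loaded, a minimal usage sketch (the chat call follows the ChatGLM-6B README; it continues from the model and tokenizer defined above):

model = model.eval()
# One-turn chat; history carries the conversation across calls.
response, history = model.chat(tokenizer, "你好", history=[])
print(response)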

yaleimeng avatar Mar 15 '23 11:03 yaleimeng

That doesn't work for me; it raises an error:

>>> mypath="D:/apps/ChatGLM-6B/model"
>>> from transformers import AutoTokenizer, AutoModel
>>> tokenizer = AutoTokenizer.from_pretrained(mypath, trust_remote_code=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\Program Files\Python37\lib\site-packages\transformers\models\auto\tokenization_auto.py", line 614, in from_pretrained
    pretrained_model_name_or_path, trust_remote_code=trust_remote_code, **kwargs
  File "D:\Program Files\Python37\lib\site-packages\transformers\models\auto\configuration_auto.py", line 852, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "D:\Program Files\Python37\lib\site-packages\transformers\configuration_utils.py", line 565, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "D:\Program Files\Python37\lib\site-packages\transformers\configuration_utils.py", line 632, in _get_config_dict
    _commit_hash=commit_hash,
  File "D:\Program Files\Python37\lib\site-packages\transformers\utils\hub.py", line 381, in cached_file
    f"{path_or_repo_id} does not appear to have a file named {full_filename}. Checkout "
OSError: D:/apps/ChatGLM-6B/model does not appear to have a file named config.json. Checkout 'https://huggingface.co/D:/apps/ChatGLM-6B/model/None' for available files.

Am I doing something wrong?

helloxz avatar Mar 15 '23 12:03 helloxz

I'm running into the same problem.

ykallan avatar Mar 15 '23 12:03 ykallan

You should download all the files from here, except those you have already downloaded from the Tsinghua cloud drive.

feixyz10 avatar Mar 15 '23 14:03 feixyz10

I have downloaded all files from huggingface. However, when I execute

mypath = "G:/chatGLM-6B/model"
tokenizer = AutoTokenizer.from_pretrained(mypath, trust_remote_code=True)

it returns:

OSError: [WinError 123] 文件名、目录名或卷标语法不正确。: 'C:\\Users\\<My Username>\\.cache\\huggingface\\modules\\transformers_modules\\G:'

(WinError 123 is Windows for "the filename, directory name, or volume label syntax is incorrect.") It seems something is going wrong in my cache path handling.

Ling-YangHui avatar Mar 16 '23 07:03 Ling-YangHui

Path names on Windows need special handling (just Google it). Using the pathlib package, or replacing "/" with "\\", might solve your problem.
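For example, a small sketch of both workarounds (the directory below is just the one from the error report above):

from pathlib import Path
from transformers import AutoTokenizer

# Let pathlib produce the right separators for the current OS...
mypath = str(Path("D:/apps/ChatGLM-6B/model"))
# ...or spell the backslashes out yourself (doubled, since "\" escapes):
# mypath = "D:\\apps\\ChatGLM-6B\\model"

tokenizer = AutoTokenizer.from_pretrained(mypath, trust_remote_code=True)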

feixyz10 avatar Mar 16 '23 07:03 feixyz10

From the error it looks like path parsing went wrong: the program thinks the path it was given is malformed (Windows format; on Linux, paths start with / and never contain a colon). Most of this code is written with Linux in mind, and Windows differs in both path format and default encoding (not UTF-8), so there are many pitfalls. Try placing the files at the path the error message shows; if that fails, consider dual-booting or running a Linux VM under Windows (VMs can apparently use the GPU now).

yaleimeng avatar Mar 16 '23 08:03 yaleimeng

The problem should be solved now. My current workaround: first call AutoModel with the model name, wait for the download to finish, then copy the model from the ~/.cache/xxx path into the project directory and adjust the script. That starts fine. A model downloaded directly from the cloud drive is missing some files and won't start on its own; a rough sketch of the copy step follows. @yaleimeng @feixyz10 @helloxz @Ling-YangHui
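A rough sketch of that copy step, assuming the default hub cache layout described later in this thread (the snapshot directory is named after a commit hash and differs per machine):

import shutil
from pathlib import Path

# Where the first online run leaves the model, assuming the default hub cache.
snapshots = Path.home() / ".cache/huggingface/hub/models--THUDM--chatglm-6b/snapshots"
snapshot = next(snapshots.iterdir())  # a hash-named snapshot directory

# Copy the complete snapshot into the project (dereferencing the blob symlinks),
# then load with from_pretrained("chatglm-6b-local", trust_remote_code=True).
shutil.copytree(snapshot, "chatglm-6b-local", dirs_exist_ok=True)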

ykallan avatar Mar 16 '23 09:03 ykallan

Oh... we downloaded from huggingface, so we never hit this. You'd think a single archive could simply be mirrored everywhere; odd that different sources ship different files.

yaleimeng avatar Mar 16 '23 09:03 yaleimeng

The cloud drive only has the large weight files; just fetch all the small files from huggingface as well. Tested and working.

Liu-Steve avatar Mar 16 '23 12:03 Liu-Steve

Under %USERPROFILE%\.cache\huggingface\hub\models--THUDM--chatglm-6b\snapshots there are one or more directories named like git commit hashes; put the files under the newest one.
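A small sketch that picks that newest snapshot automatically (same Windows path as above; USERPROFILE is the Windows home-directory variable):

import os
from pathlib import Path
from transformers import AutoTokenizer

snapshots = Path(os.environ["USERPROFILE"]) / ".cache/huggingface/hub/models--THUDM--chatglm-6b/snapshots"
# The hash-named directories; take the most recently modified one.
latest = max(snapshots.iterdir(), key=lambda p: p.stat().st_mtime)
tokenizer = AutoTokenizer.from_pretrained(str(latest), trust_remote_code=True)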

marszhao avatar Mar 16 '23 12:03 marszhao

> That doesn't work for me; it raises an error:
> OSError: D:/apps/ChatGLM-6B/model does not appear to have a file named config.json. Checkout 'https://huggingface.co/D:/apps/ChatGLM-6B/model/None' for available files.
> Am I doing something wrong?

Has this been resolved?

nikshe avatar Mar 17 '23 10:03 nikshe

@nikshe Could some files be missing? Besides the 8 weight files, all of the small files in the hugging face repo have to go into the model directory. From this error, config.json is missing from the model directory; a quick check is sketched below.
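A minimal sanity-check sketch, assuming the file list of the chatglm-6b repo on Hugging Face at the time (the shard names follow the standard transformers pattern):

from pathlib import Path

mypath = Path("D:/apps/ChatGLM-6B/model")
# Small files from the repo (the .py files for trust_remote_code are needed too),
# plus the index for the 8 weight shards.
required = ["config.json", "tokenizer_config.json", "ice_text.model",
            "pytorch_model.bin.index.json"]
required += [f"pytorch_model-{i:05d}-of-00008.bin" for i in range(1, 9)]

missing = [name for name in required if not (mypath / name).exists()]
print("missing:", missing or "none")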

jerrylususu avatar Mar 17 '23 16:03 jerrylususu

Do the 8 .bin files need to be concatenated with cat? I get "model file not found", but after concatenating them it fails to read:

root@2227e6c2b8b1:/work/chatglm-6b/ChatGLM-6B# python cli_demo.py
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Traceback (most recent call last):
  File "cli_demo.py", line 6, in <module>
    model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().quantize(8).cuda()
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/auto/auto_factory.py", line 459, in from_pretrained
    return model_class.from_pretrained(
  File "/usr/local/lib/python3.8/dist-packages/transformers/modeling_utils.py", line 2164, in from_pretrained
    raise EnvironmentError(
OSError: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory THUDM/chatglm-6b.
root@2227e6c2b8b1:/work/chatglm-6b/ChatGLM-6B#

ttsking avatar Mar 18 '23 00:03 ttsking

Solved mine; it turned out pytorch_model.bin.index.json was missing.

ttsking avatar Mar 18 '23 00:03 ttsking

On my side I had to use an absolute Windows path.

nikohpng avatar Mar 18 '23 02:03 nikohpng

> Solved mine; it turned out pytorch_model.bin.index.json was missing.

Could you share the code you use to run the model files locally?

luieswww avatar Mar 31 '23 16:03 luieswww

My model files were complete and the path was correct, but I still got an error. In the end I uninstalled keras and the problem disappeared. Bizarre!

DelaiahZ avatar Apr 03 '23 03:04 DelaiahZ

In WSL I had to switch both lines (tokenizer and model) to absolute paths.

bash99 avatar Apr 04 '23 08:04 bash99

> I have downloaded all files from huggingface. However, when I execute
> tokenizer = AutoTokenizer.from_pretrained(mypath, trust_remote_code=True)
> it returns:
> OSError: [WinError 123] 文件名、目录名或卷标语法不正确。: 'C:\\Users\\<My Username>\\.cache\\huggingface\\modules\\transformers_modules\\G:'

Try mypath = 'G:\\chatGLM-6B\\model', i.e. with double backslashes '\\'.

tiejiang8 avatar Apr 07 '23 15:04 tiejiang8

Here is how I run it locally.

First run, online:

  • The model downloads automatically to C:\Users\Administrator\.cache\huggingface\hub\models--THUDM--chatglm-6b-int4

After that it can be loaded offline:

mypath = r'C:\Users\Administrator\.cache\huggingface\hub\models--THUDM--chatglm-6b-int4\snapshots\9163f7e6d9b2e5b4f66d9be8d0288473a8ccd027'

tokenizer = AutoTokenizer.from_pretrained(mypath, trust_remote_code=True)
  • Replace 9163f7e6d9b2e5b4f66d9be8d0288473a8ccd027 with your own snapshot hash.

mingyue0094 avatar Apr 11 '23 12:04 mingyue0094

My error output:

root@DESKTOP-FMBI0K0:/data/ChatGLM-6b# python3 demo.py
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Traceback (most recent call last):
  File "demo.py", line 5, in <module>
    tokenizer = AutoTokenizer.from_pretrained(mypath, trust_remote_code=True)
  File "/usr/local/lib/python3.7/dist-packages/transformers/models/auto/tokenization_auto.py", line 679, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_base.py", line 1813, in from_pretrained
    **kwargs,
  File "/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_base.py", line 1958, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/tokenization_chatglm.py", line 205, in __init__
    self.sp_tokenizer = SPTokenizer(vocab_file, num_image_tokens=num_image_tokens)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/tokenization_chatglm.py", line 55, in __init__
    assert vocab_file is not None
AssertionError

My code:

from transformers import AutoModel, AutoTokenizer
import gradio as gr
import mdtex2html

mypath = "/data/chatglm-6b-int4"
tokenizer = AutoTokenizer.from_pretrained(mypath, trust_remote_code=True)
model = AutoModel.from_pretrained(mypath, trust_remote_code=True).half().cuda()
model = model.eval()

I'm on WSL; the downloaded model is under /data/chatglm-6b-int4.
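That AssertionError means the tokenizer never found its vocabulary file. A quick check, assuming the vocabulary file is named ice_text.model as in the Hugging Face repo:

from pathlib import Path

# List what is actually in the model directory; the assertion at
# tokenization_chatglm.py line 55 fires when the sentencepiece vocab is missing.
mypath = Path("/data/chatglm-6b-int4")
print(sorted(p.name for p in mypath.iterdir()))  # ice_text.model should appear here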

xuji755 avatar Apr 11 '23 13:04 xuji755

This change works in cli_demo.py:

local_path = "/home/somepath/somepath/ChatGLM-6B/huggingface_chatglm-6m"
tokenizer = AutoTokenizer.from_pretrained(local_path, trust_remote_code=True)
model = AutoModel.from_pretrained(local_path, trust_remote_code=True).quantize(4).half().cuda()

At first I downloaded the .bin files from the THU cloud drive and the other files in the chatglm-6b folder from huggingface, and an OSError showed up when running cli_demo.py. Then I used "git clone https://huggingface.co/THUDM/chatglm-6b" instead, and it works.

taraliu23 avatar Apr 12 '23 06:04 taraliu23

> My error output: ... AssertionError (assert vocab_file is not None)
> My code: mypath = "/data/chatglm-6b-int4" ...
> I'm on WSL; the downloaded model is under /data/chatglm-6b-int4.

Same here: I gave it the local path, but it still goes through the cache.

xiaoxinxin666666 avatar Apr 12 '23 06:04 xiaoxinxin666666

Please follow the instructions at https://github.com/THUDM/ChatGLM-6B#%E4%BB%8E%E6%9C%AC%E5%9C%B0%E5%8A%A0%E8%BD%BD%E6%A8%A1%E5%9E%8B

duzx16 avatar Apr 12 '23 15:04 duzx16

After changing it to

from transformers import AutoModel, AutoTokenizer
import gradio as gr
import mdtex2html

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("E:/model/LanguageModel/ChatGLM/chatglm-6b", trust_remote_code=True).half().cuda()

I get this error:

OSError: [WinError 123] 文件名、目录名或卷标语法不正确。: 'C:\\Users\\Administrator\\.cache\\huggingface\\modules\\transformers_modules\\E:'

linonetwo avatar Apr 20 '23 15:04 linonetwo

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("e:\\model\\LanguageModel\\ChatGLM\\chatglm-6b", trust_remote_code=True).half().cuda()


This works.

linonetwo avatar Apr 20 '23 15:04 linonetwo

Add/modify the following after the imports in demo.py:


import os
model_path = os.path.join(".", "models")
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(model_path, trust_remote_code=True).half().cuda()
model = model.eval()

Then put the files git cloned from huggingface into the models directory. This works on both Linux and Windows.

LeXwDeX avatar May 10 '23 06:05 LeXwDeX

I'm using my own fine-tuned model in HF format, but running web_demo gives no response and no output. How can I fix this?

12lxr avatar Jun 08 '23 09:06 12lxr

The model weights aren't read from the cache, but why are the model's own .py files still loaded from the cache? How do I fix that?

daihuangyu avatar Jun 15 '23 08:06 daihuangyu