OpenChatKit
Issue Converting Weights to Huggingface Format
I'm trying to convert the weights as per the example but running into an issue.
After running:

mkdir huggingface_models \
&& python tools/convert_to_hf_gptneox.py \
--ckpt-path model_ckpts/GPT-Neo-XT-Chat-Base-20B/checkpoint_5 --save-path /huggingface_models/GPT-NeoXT-Chat-Base-20B --n-stages 8 --n-layer-per-stage 6
I'm getting this error:
Traceback (most recent call last):
  File "/mnt/c/Users/name/OpenChatKit/tools/convert_to_hf_gptneox.py", line 102, in <module>
    assert args.save_path is not None
AssertionError
--save-path: command not found
--n-stages: command not found
--n-layer-per-stage: command not found
I'm using Windows 11 WSL Ubuntu 22.04.2 LTS
That's a typo in the README. I'll put a fix up in a moment. It should be:
mkdir huggingface_models \
&& python tools/convert_to_hf_gptneox.py \
--ckpt-path model_ckpts/GPT-Neo-XT-Chat-Base-20B/checkpoint_5 \
--save-path huggingface_models/GPT-NeoXT-Chat-Base-20B \
--n-stages 8 \
--n-layer-per-stage 6
Note all the backslashes: without them, the shell treats each flag as a separate command, which is why you saw --save-path: command not found.
Let me know if that works for you!
(edit: fixed typo in the command)
@csris thanks, I'm no longer getting the error but now I'm getting:
Traceback (most recent call last):
  File "/mnt/c/Users/[user]/OpenChatKit/tools/convert_to_hf_gptneox.py", line 105, in <module>
    os.mkdir(args.save_path)
FileNotFoundError: [Errno 2] No such file or directory: '/huggingface_models/GPT-NeoXT-Chat-Base-20B'
The command expects GPT-NeoXT-Chat-Base-20B to be in /huggingface_models/. Is this supposed to point to pretrained or another directory?
The README has another typo: --save-path should be a relative path (no leading slash), so the converted model lands in the huggingface_models directory you just created. Run this from the root of the repo:
mkdir huggingface_models \
&& python tools/convert_to_hf_gptneox.py \
--ckpt-path model_ckpts/GPT-Neo-XT-Chat-Base-20B/checkpoint_5 \
--save-path huggingface_models/GPT-NeoXT-Chat-Base-20B \
--n-stages 8 \
--n-layer-per-stage 6
Also, make sure to update the path to the checkpoint (the --ckpt-path flag) to point at your desired checkpoint.
@csris is there documentation on the different checkpoints? How do I decide which --ckpt-path to pick?
@LorrinWWW can give better advice than I can. But I'll do my best:
- The training/finetune_GPT-NeoXT-Chat-Base-20B.sh script saves checkpoints to the model_ckpts/GPT-NeoXT-Chat-Base-20B directory during training.
- The script, by default, writes a checkpoint every 100 steps.
- As the script writes checkpoints, you should see sub-directories named checkpoint_100, checkpoint_200, etc.
- If you're training on 8 A100 80GB GPUs, it takes about an hour per checkpoint.
- When in doubt, just pick the most recent checkpoint, the one with the highest number in the directory name (see the sketch below).

If you just want to make sure the toolchain is working, you can configure the script to produce a checkpoint every 5 steps, so you don't have to wait an hour. Just change the CHECKPOINT_STEPS variable on this line to 5.
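If it helps, here's a minimal sketch for picking the latest checkpoint programmatically (the directory layout assumed here is the one described above; adjust ckpt_root to your run's output):

import os, re

ckpt_root = "model_ckpts/GPT-NeoXT-Chat-Base-20B"  # your training output directory
ckpts = [d for d in os.listdir(ckpt_root) if re.fullmatch(r"checkpoint_\d+", d)]
latest = max(ckpts, key=lambda d: int(d.split("_")[1]))  # highest step number wins
print(os.path.join(ckpt_root, latest))  # pass this path to --ckpt-path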
@davismartens The training script saves a ckpt per CHECKPOINT_STEPS, so usually you can just pick the latest one :)
@LorrinWWW great thanks. Can I run the pretrained model without training too?
Sure! You can run our pretrained base model.
@davismartens, would you like to join our Discord server? Here's an invite link: https://discord.gg/9Rk6sSeWEG.
@LorrinWWW thank you. When I run python inference/bot.py --model togethercomputer/GPT-NeoXT-Chat-Base-20B
I receive the following error:
Traceback (most recent call last):
File "/mnt/c/Users/davis/dev-projects/OpenChatKit/inference/bot.py", line 7, in <module>
import retrieval.wikipedia as wp
ModuleNotFoundError: No module named 'retrieval'
Any idea why it doesn't work?
@csris joined :)
@davismartens It appears that bot.py is unable to locate the retrieval module, which should be present in the root directory of the OpenChatKit repository.

Could you try running the bot.py script again while ensuring that you cd to the correct directory (in your case, /mnt/c/Users/davis/dev-projects/OpenChatKit/)?
@LorrinWWW retrieval is present and I'm running from root.
(OpenChatKit) davismartens@LAPTOP-F6477QET:/mnt/c/Users/davis/dev-projects/OpenChatKit$ python inference/bot.py --model togethercomputer/GPT-NeoXT-Chat-Base-20B
Traceback (most recent call last):
File "/mnt/c/Users/davis/dev-projects/OpenChatKit/inference/bot.py", line 7, in <module>
import retrieval.wikipedia as wp
ModuleNotFoundError: No module named 'retrieval'
But for some reason bot.py doesn't find the module.
@davismartens Can you try this?

export PYTHONPATH=/mnt/c/Users/davis/dev-projects/OpenChatKit:$PYTHONPATH
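If you'd rather not export PYTHONPATH in every shell session, an alternative sketch (my suggestion, not something already in the repo) is to prepend the repo root near the top of inference/bot.py, before the retrieval import:

import os, sys

# bot.py sits in inference/, so the repo root is one directory up
repo_root = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
sys.path.insert(0, repo_root)

import retrieval.wikipedia as wp  # now resolvable without PYTHONPATH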
@LorrinWWW that resolved one issue but now I'm getting this error:
(OpenChatKit) davismartens@LAPTOP-F6477QET:/mnt/c/Users/davis/dev-projects/OpenChatKit$ export PYTHONPATH=/mnt/c/Users/davis/dev-projects/OpenChatKit:$PYTHONPATH
(OpenChatKit) davismartens@LAPTOP-F6477QET:/mnt/c/Users/davis/dev-projects/OpenChatKit$ python inference/bot.py --model togethercomputer/GPT-NeoXT-Chat-Base-20B
Loading togethercomputer/GPT-NeoXT-Chat-Base-20B to cuda:0...
Traceback (most recent call last):
File "/home/davismartens/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/transformers/configuration_utils.py", line 616, in _get_config_dict
resolved_config_file = cached_path(
File "/home/davismartens/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/transformers/utils/hub.py", line 284, in cached_path
output_path = get_from_cache(
File "/home/davismartens/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/transformers/utils/hub.py", line 494, in get_from_cache
raise EnvironmentError("You specified use_auth_token=True, but a huggingface token was not found.")
OSError: You specified use_auth_token=True, but a huggingface token was not found.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/mnt/c/Users/davis/dev-projects/OpenChatKit/inference/bot.py", line 184, in <module>
main()
File "/mnt/c/Users/davis/dev-projects/OpenChatKit/inference/bot.py", line 180, in main
).cmdloop()
File "/home/davismartens/miniconda3/envs/OpenChatKit/lib/python3.10/cmd.py", line 105, in cmdloop
self.preloop()
File "/mnt/c/Users/davis/dev-projects/OpenChatKit/inference/bot.py", line 63, in preloop
self._model = ChatModel(self._model_name_or_path, self._gpu_id)
File "/mnt/c/Users/davis/dev-projects/OpenChatKit/inference/bot.py", line 21, in __init__
self._model = AutoModelForCausalLM.from_pretrained(
File "/home/davismartens/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 423, in from_pretrained
config, kwargs = AutoConfig.from_pretrained(
File "/home/davismartens/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 725, in from_pretrained
config_dict, _ = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
File "/home/davismartens/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/transformers/configuration_utils.py", line 561, in get_config_dict
config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
File "/home/davismartens/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/transformers/configuration_utils.py", line 656, in _get_config_dict
raise EnvironmentError(
OSError: Can't load config for 'togethercomputer/GPT-NeoXT-Chat-Base-20B'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'togethercomputer/GPT-NeoXT-Chat-Base-20B' is the correct path to a directory containing a config.json file
Seems like I need to pass an HF token somewhere?
@davismartens That's true, we specified use_auth_token=True.
You can either login HF:
pip install --upgrade huggingface_hub
huggingface-cli login
Or, since togethercomputer/GPT-NeoXT-Chat-Base-20B is publicly available now, you can simply remove use_auth_token=True from this line and re-run the inference code.
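For reference, a minimal sketch of what the call looks like after the change (the surrounding arguments in inference/bot.py may differ):

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "togethercomputer/GPT-NeoXT-Chat-Base-20B",
    # use_auth_token=True,  # safe to remove now that the model repo is public
)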
@LorrinWWW what is the difference between the default in prepare.py and togethercomputer/GPT-NeoXT-Chat-Base-20B?
@TX-Yeager It shards the ckpt by layer, which makes pipeline parallel training more convenient. :)
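To illustrate what sharding by layer means here (an illustration of the layout only, not code from prepare.py, and the actual mapping is defined by the repo's tooling): with the --n-stages 8 and --n-layer-per-stage 6 flags used above, each stage would hold a contiguous block of 6 layers:

n_stages, n_layer_per_stage = 8, 6
for stage in range(n_stages):
    first = stage * n_layer_per_stage
    print(f"stage {stage}: layers {first}..{first + n_layer_per_stage - 1}")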