
Issue Converting Weights to Huggingface Format

Open davismartens opened this issue 1 year ago • 18 comments

I'm trying to convert the weights as per the example but running into an issue.

After running:

mkdir huggingface_models \ && python tools/convert_to_hf_gptneox.py \ --ckpt-path model_ckpts/GPT-Neo-XT-Chat-Base-20B/checkpoint_5 --save-path /huggingface_models/GPT-NeoXT-Chat-Base-20B --n-stages 8 --n-layer-per-stage 6

I'm getting this error:

Traceback (most recent call last):
  File "/mnt/c/Users/name/OpenChatKit/tools/convert_to_hf_gptneox.py", line 102, in <module>
    assert args.save_path is not None
AssertionError
--save-path: command not found
--n-stages: command not found
--n-layer-per-stage: command not found

I'm using Windows 11 WSL Ubuntu 22.04.2 LTS

davismartens avatar Mar 12 '23 20:03 davismartens

That's a typo in the README. I'll put a fix up in a moment. It should be:

mkdir huggingface_models \
  && python tools/convert_to_hf_gptneox.py \
       --ckpt-path model_ckpts/GPT-Neo-XT-Chat-Base-20B/checkpoint_5 \
       --save-path huggingface_models/GPT-NeoXT-Chat-Base-20B \
       --n-stages 8 \
       --n-layer-per-stage 6

Note all the backslashes.

Let me know if that works for you!

(edit: fixed typo in the command)

csris avatar Mar 12 '23 22:03 csris

@csris thanks, I'm no longer getting the error but now I'm getting:

Traceback (most recent call last):
  File "/mnt/c/Users/[user]/OpenChatKit/tools/convert_to_hf_gptneox.py", line 105, in <module>
    os.mkdir(args.save_path)
FileNotFoundError: [Errno 2] No such file or directory: '/huggingface_models/GPT-NeoXT-Chat-Base-20B'

The command expects GPT-NeoXT-Chat-Base-20B to be in /huggingface_models/. Is this supposed to point to pretrained or another directory?

davismartens avatar Mar 12 '23 23:03 davismartens

The README has another typo. Run this from the root of the repo:

mkdir huggingface_models \
  && python tools/convert_to_hf_gptneox.py \
       --ckpt-path model_ckpts/GPT-Neo-XT-Chat-Base-20B/checkpoint_5 \
       --save-path huggingface_models/GPT-NeoXT-Chat-Base-20B \
       --n-stages 8 \
       --n-layer-per-stage 6
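(For reference, the earlier FileNotFoundError came from the leading slash: /huggingface_models/… is an absolute path resolved from the filesystem root, while the intended directory is relative to the repo. A quick Python check illustrates the difference:)

```python
import os

# A leading slash makes a path absolute, so os.mkdir() resolves it from the
# filesystem root rather than from the current working directory (the repo root).
bad_path = "/huggingface_models/GPT-NeoXT-Chat-Base-20B"
good_path = "huggingface_models/GPT-NeoXT-Chat-Base-20B"

print(os.path.isabs(bad_path))   # → True: resolved from /
print(os.path.isabs(good_path))  # → False: resolved from the repo root
```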

csris avatar Mar 12 '23 23:03 csris

Also, make sure to update the path to the checkpoint (the --ckpt-path flag) to point at your desired checkpoint.

csris avatar Mar 12 '23 23:03 csris

@csris is there documentation on the different checkpoints? How do I decide which --ckpt-path to pick?

davismartens avatar Mar 13 '23 00:03 davismartens

@LorrinWWW can give better advice than I can. But I'll do my best:

  • The training/finetune_GPT-NeoXT-Chat-Base-20B.sh script saves checkpoints to the model_ckpts/GPT-NeoXT-Chat-Base-20B directory during training.
  • The script, by default, writes a checkpoint every 100 steps.
  • As the script writes checkpoints, you should see sub-directories named checkpoint_100, checkpoint_200, etc.
  • If you're training on 8 A100 80GB GPUs, it takes about an hour per checkpoint.
  • When in doubt, just pick the most recent checkpoint, the one with the highest number in the directory name.

If you just want to make sure the toolchain is working, you can configure the script to produce a checkpoint every 5 steps, so you don't have to wait an hour. Just change the CHECKPOINT_STEPS variable on this line to 5.
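If you want to pick the latest checkpoint programmatically rather than by eye, here is a small sketch. It assumes the checkpoint directories follow the checkpoint_<step> naming pattern described above; sorting must be numeric, since lexicographic order would put checkpoint_1000 before checkpoint_200.

```python
import re

def latest_checkpoint(ckpt_dir_names):
    """Return the checkpoint_<step> name with the highest step number."""
    pattern = re.compile(r"^checkpoint_(\d+)$")
    steps = {}
    for name in ckpt_dir_names:
        m = pattern.match(name)
        if m:
            steps[int(m.group(1))] = name
    if not steps:
        raise ValueError("no checkpoint_<step> directories found")
    return steps[max(steps)]

# e.g. os.listdir("model_ckpts/GPT-NeoXT-Chat-Base-20B") might return:
names = ["checkpoint_100", "checkpoint_200", "checkpoint_1000", "meta"]
print(latest_checkpoint(names))  # → checkpoint_1000
```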

csris avatar Mar 13 '23 01:03 csris

@davismartens The training script saves a ckpt per CHECKPOINT_STEPS, so usually you can just pick the latest one :)

LorrinWWW avatar Mar 13 '23 01:03 LorrinWWW

@LorrinWWW great thanks. Can I run the pretrained model without training too?

davismartens avatar Mar 13 '23 02:03 davismartens

> @LorrinWWW great thanks. Can I run the pretrained model without training too?

Sure! You can run our pretrained base model.

LorrinWWW avatar Mar 13 '23 02:03 LorrinWWW

@davismartens, would you like to join our Discord server? Here's an invite link: https://discord.gg/9Rk6sSeWEG.

csris avatar Mar 13 '23 02:03 csris

@LorrinWWW thank you. When I run python inference/bot.py --model togethercomputer/GPT-NeoXT-Chat-Base-20B I receive the following error:

Traceback (most recent call last):
  File "/mnt/c/Users/davis/dev-projects/OpenChatKit/inference/bot.py", line 7, in <module>
    import retrieval.wikipedia as wp
ModuleNotFoundError: No module named 'retrieval'

Any idea why it doesn't work?

@csris joined :)

davismartens avatar Mar 13 '23 10:03 davismartens

@davismartens It appears that the bot.py is unable to locate the retrieval module, which should be present in the root directory of the OpenChatKit repository.

Could you try running the bot.py script again while ensuring that you cd to the correct directory (in your case, /mnt/c/Users/davis/dev-projects/OpenChatKit/)?

LorrinWWW avatar Mar 13 '23 11:03 LorrinWWW

@LorrinWWW retrieval is present and I'm running from root.

(OpenChatKit) davismartens@LAPTOP-F6477QET:/mnt/c/Users/davis/dev-projects/OpenChatKit$ python inference/bot.py --model togethercomputer/GPT-NeoXT-Chat-Base-20B
Traceback (most recent call last):
  File "/mnt/c/Users/davis/dev-projects/OpenChatKit/inference/bot.py", line 7, in <module>
    import retrieval.wikipedia as wp
ModuleNotFoundError: No module named 'retrieval'


But for some reason bot.py doesn't find the module.

davismartens avatar Mar 13 '23 11:03 davismartens

@davismartens Can you try this? export PYTHONPATH=/mnt/c/Users/davis/dev-projects/OpenChatKit:$PYTHONPATH
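(An equivalent in-code fix, if you prefer not to touch the environment, is to put the repo root on sys.path before the retrieval import. A sketch; the path is assumed from your traceback:)

```python
import sys

REPO_ROOT = "/mnt/c/Users/davis/dev-projects/OpenChatKit"  # assumed from the traceback

# Prepend the repo root so `import retrieval.wikipedia` resolves, mirroring
# what `export PYTHONPATH=...` does for child Python processes.
if REPO_ROOT not in sys.path:
    sys.path.insert(0, REPO_ROOT)

print(sys.path[0] == REPO_ROOT)  # → True
```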

LorrinWWW avatar Mar 13 '23 11:03 LorrinWWW

@LorrinWWW that resolved one issue but now I'm getting this error:

(OpenChatKit) davismartens@LAPTOP-F6477QET:/mnt/c/Users/davis/dev-projects/OpenChatKit$ export PYTHONPATH=/mnt/c/Users/davis/dev-projects/OpenChatKit:$PYTHONPATH
(OpenChatKit) davismartens@LAPTOP-F6477QET:/mnt/c/Users/davis/dev-projects/OpenChatKit$ python inference/bot.py --model togethercomputer/GPT-NeoXT-Chat-Base-20B
Loading togethercomputer/GPT-NeoXT-Chat-Base-20B to cuda:0...
Traceback (most recent call last):
  File "/home/davismartens/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/transformers/configuration_utils.py", line 616, in _get_config_dict
    resolved_config_file = cached_path(
  File "/home/davismartens/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/transformers/utils/hub.py", line 284, in cached_path
    output_path = get_from_cache(
  File "/home/davismartens/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/transformers/utils/hub.py", line 494, in get_from_cache
    raise EnvironmentError("You specified use_auth_token=True, but a huggingface token was not found.")
OSError: You specified use_auth_token=True, but a huggingface token was not found.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/mnt/c/Users/davis/dev-projects/OpenChatKit/inference/bot.py", line 184, in <module>
    main()
  File "/mnt/c/Users/davis/dev-projects/OpenChatKit/inference/bot.py", line 180, in main
    ).cmdloop()
  File "/home/davismartens/miniconda3/envs/OpenChatKit/lib/python3.10/cmd.py", line 105, in cmdloop
    self.preloop()
  File "/mnt/c/Users/davis/dev-projects/OpenChatKit/inference/bot.py", line 63, in preloop
    self._model = ChatModel(self._model_name_or_path, self._gpu_id)
  File "/mnt/c/Users/davis/dev-projects/OpenChatKit/inference/bot.py", line 21, in __init__
    self._model = AutoModelForCausalLM.from_pretrained(
  File "/home/davismartens/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 423, in from_pretrained
    config, kwargs = AutoConfig.from_pretrained(
  File "/home/davismartens/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 725, in from_pretrained
    config_dict, _ = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/home/davismartens/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/transformers/configuration_utils.py", line 561, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/home/davismartens/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/transformers/configuration_utils.py", line 656, in _get_config_dict
    raise EnvironmentError(
OSError: Can't load config for 'togethercomputer/GPT-NeoXT-Chat-Base-20B'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'togethercomputer/GPT-NeoXT-Chat-Base-20B' is the correct path to a directory containing a config.json file

Seems like I need to pass an HF token somewhere?

davismartens avatar Mar 13 '23 11:03 davismartens

@davismartens That's true, we specified use_auth_token=True.

You can either log in to Hugging Face:

pip install --upgrade huggingface_hub
huggingface-cli login

Or, since togethercomputer/GPT-NeoXT-Chat-Base-20B is publicly available now, you can simply remove use_auth_token=True from this line and re-run the inference code.

LorrinWWW avatar Mar 13 '23 12:03 LorrinWWW

@LorrinWWW what is the difference between the default in prepare.py and togethercomputer/GPT-NeoXT-Chat-Base-20B?

TX-Yeager avatar Mar 15 '23 03:03 TX-Yeager

@TX-Yeager It shards the checkpoint by layer, which makes pipeline-parallel training more convenient. :)
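(A rough illustration of the sharding arithmetic, using the --n-stages 8 / --n-layer-per-stage 6 values from the conversion command earlier in the thread. This is a simplified even-split sketch, not the exact layer-to-file mapping used by prepare.py:)

```python
def stage_for_layer(layer_idx, n_layer_per_stage=6):
    """Which pipeline stage owns a given layer under even sharding."""
    return layer_idx // n_layer_per_stage

# With 8 stages of 6 layers each, 48 layer slots are spread across stages:
print(stage_for_layer(0))   # → 0 (first stage)
print(stage_for_layer(47))  # → 7 (last stage)
```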

LorrinWWW avatar Mar 15 '23 13:03 LorrinWWW