
Set from_tf=True (Problems installing)

Open andrewfr opened this issue 1 year ago • 9 comments

❓ The question

I am new to OLMo, and I am not a PyTorch person. I am installing OLMo from Git (I would like to try fine-tuning). I created a virtual environment and ran the code snippet to get the weights. I get:

Unable to load weights from pytorch checkpoint file for ... If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.

What causes this problem? Where do I set this value?

Thanks in advance, Andrew

andrewfr avatar Feb 02 '24 04:02 andrewfr

Hey @andrewfr could you please provide a code snippet to reproduce along with the output of pip freeze?

epwalsh avatar Feb 02 '24 16:02 epwalsh

Hi Pete:

Thanks for the help. I've attached the output of pip freeze: requirements.txt

The snippet is:

```python
import hf_olmo
from transformers import pipeline

olmo_pipe = pipeline("text-generation", model="allenai/OLMo-7B")
print(olmo_pipe("Language modeling is "))
```

I ran the code. I apologize for not noticing the full error earlier. I guess the obvious question is: how much memory do I need to run the model? The machine I'm using has an NVIDIA card, but I can't expand the memory past 16GB. Also, I'm still interested in knowing where to set this flag; I haven't found a concise explanation of what it does.

Thanks, Andrew

```
Traceback (most recent call last):
  File "/home/andrew/lab/allen/OLMo/x.py", line 3, in <module>
    olmo_pipe = pipeline("text-generation", model="allenai/OLMo-7B")
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/andrew/lab/allen/venv/lib/python3.11/site-packages/transformers/pipelines/__init__.py", line 870, in pipeline
    framework, model = infer_framework_load_model(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/andrew/lab/allen/venv/lib/python3.11/site-packages/transformers/pipelines/base.py", line 291, in infer_framework_load_model
    raise ValueError(
ValueError: Could not load model allenai/OLMo-7B with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>,). See the original errors:

while loading with AutoModelForCausalLM, an error is thrown:
Traceback (most recent call last):
  File "/home/andrew/allen/venv/lib/python3.11/site-packages/transformers/modeling_utils.py", line 531, in load_state_dict
    return torch.load(
           ^^^^^^^^^^^
  File "/home/andrew/allen/venv/lib/python3.11/site-packages/torch/serialization.py", line 1004, in load
    overall_storage = torch.UntypedStorage.from_file(f, False, size)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: unable to mmap 27552427238 bytes from file </home/andrew/.cache/huggingface/hub/models--allenai--OLMo-7B/snapshots/8f565105afc536ae8e654d18a265cfc81bb3c63d/pytorch_model.bin>: Cannot allocate memory (12)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/andrew/allen/venv/lib/python3.11/site-packages/transformers/modeling_utils.py", line 540, in load_state_dict
    if f.read(7) == "version":
       ^^^^^^^^^
  File "", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 64: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/andrew/allen/venv/lib/python3.11/site-packages/transformers/pipelines/base.py", line 278, in infer_framework_load_model
    model = model_class.from_pretrained(model, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/andrew/allen/venv/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 566, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/andrew/allen/venv/lib/python3.11/site-packages/transformers/modeling_utils.py", line 3525, in from_pretrained
    state_dict = load_state_dict(resolved_archive_file)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/andrew/allen/venv/lib/python3.11/site-packages/transformers/modeling_utils.py", line 552, in load_state_dict
    raise OSError(
OSError: Unable to load weights from pytorch checkpoint file for '/home/andrew/.cache/huggingface/hub/models--allenai--OLMo-7B/snapshots/8f565105afc536ae8e654d18a265cfc81bb3c63d/pytorch_model.bin' at '/home/andrew/.cache/huggingface/hub/models--allenai--OLMo-7B/snapshots/8f565105afc536ae8e654d18a265cfc81bb3c63d/pytorch_model.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.
```

andrewfr avatar Feb 02 '24 19:02 andrewfr

Hey @andrewfr, the 7B model requires about 27.6GB of GPU memory just to load it, and more to actually run inference. So you'd probably need a 40GB GPU. However, you should be able to run the 1B model without any issues.
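For context on the two questions above, here is my own back-of-envelope sketch (the ~6.9B parameter count and the half-precision option are my assumptions, not something stated elsewhere in this thread). The ~27.6GB figure is essentially just the fp32 weights, and `from_tf` is an ordinary `from_pretrained()` keyword argument that only matters for TensorFlow checkpoints, so it is a red herring for this PyTorch `.bin` file.

```python
# Rough memory arithmetic (assumption: OLMo-7B has ~6.9 billion parameters).
# fp32 stores 4 bytes per parameter, which lines up with the ~27.5 GB mmap above.
approx_params = 6.9e9
print(f"~{approx_params * 4 / 1e9:.1f} GB just to hold the fp32 weights")

# Where the flag lives: `from_tf` is a keyword argument to from_pretrained().
# It only applies when the checkpoint on disk is a TensorFlow one, which is not
# the case here (the file is pytorch_model.bin), so setting it will not help.
#
#   from transformers import AutoModelForCausalLM
#   model = AutoModelForCausalLM.from_pretrained("allenai/OLMo-7B", from_tf=True)  # not the fix
#
# Half precision roughly halves the footprint (still ~14 GB for the 7B model),
# which is why a smaller model is the practical option on an 8-16 GB machine:
#
#   import torch
#   model = AutoModelForCausalLM.from_pretrained("allenai/OLMo-1B", torch_dtype=torch.float16)
```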

epwalsh avatar Feb 02 '24 19:02 epwalsh

Hi Pete:

Thanks for the answer! How do I get the smaller 1B model (I'll look in the meantime)? My small machine has an NVIDIA GeForce GTX 1070 (8GB?). Really, I want to fine-tune the model for something quite specific, so this may do.

Again, thanks for the help!

Cheers, Andrew

andrewfr avatar Feb 02 '24 19:02 andrewfr

In your example, just change "allenai/OLMo-7B" to "allenai/OLMo-1B".
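For completeness, that is just the snippet from earlier in the thread with the model name swapped; nothing else needs to change:

```python
import hf_olmo  # registers the OLMo architecture with transformers
from transformers import pipeline

olmo_pipe = pipeline("text-generation", model="allenai/OLMo-1B")
print(olmo_pipe("Language modeling is "))
```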

epwalsh avatar Feb 02 '24 19:02 epwalsh

Hi Pete:

I did this and the model installed. I get the following warnings:

```
/home/andrew/allen/venv/lib/python3.11/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
tokenizer_config.json: 100%

...

/home/andrew/allen/venv/lib/python3.11/site-packages/transformers/generation/utils.py:1133: UserWarning: Using the model-agnostic default `max_length` (=20) to control the generation length. We recommend setting `max_new_tokens` to control the maximum length of the generation.
  warnings.warn(
[{'generated_text': 'Language modeling is \n- [x] \n- [x] \n- [x'}]
```
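(Aside: the second warning above can be addressed by passing an explicit generation budget to the pipeline call; a minimal sketch, where the value 50 is an arbitrary choice of mine:)

```python
# max_new_tokens bounds how much text is generated and avoids the
# "model-agnostic default max_length" warning; 50 is arbitrary here.
print(olmo_pipe("Language modeling is ", max_new_tokens=50))
```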

I'll assume I can run pytest to see if everything is working?

Again, thanks for the help!

Cheers, Andrew

andrewfr avatar Feb 02 '24 20:02 andrewfr

Hi Pete:

Perhaps this is something different. I am reading up on lzma (and _lzma):

Running pytest:

```
platform linux -- Python 3.11.1, pytest-8.0.0, pluggy-1.4.0
rootdir: /home/andrew/allen/OLMo
configfile: pyproject.toml
testpaths: tests/
plugins: sphinx-0.5.0
collected 119 items / 1 error

==================================== ERRORS ====================================
____________ ERROR collecting tests/eval/downstream_test.py ____________
ImportError while importing test module '/home/andrew/allen/OLMo/tests/eval/downstream_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/local/lib/python3.11/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/eval/downstream_test.py:4: in <module>
    from olmo.eval import build_downstream_evaluator
olmo/eval/__init__.py:11: in <module>
    from .downstream import ICLMetric, label_to_task_map
olmo/eval/downstream.py:5: in <module>
    import datasets
../venv/lib/python3.11/site-packages/datasets/__init__.py:22: in <module>
    from .arrow_dataset import Dataset
../venv/lib/python3.11/site-packages/datasets/arrow_dataset.py:66: in <module>
    from .arrow_reader import ArrowReader
../venv/lib/python3.11/site-packages/datasets/arrow_reader.py:30: in <module>
    from .download.download_config import DownloadConfig
../venv/lib/python3.11/site-packages/datasets/download/__init__.py:9: in <module>
    from .download_manager import DownloadManager, DownloadMode
../venv/lib/python3.11/site-packages/datasets/download/download_manager.py:33: in <module>
    from ..utils.file_utils import cached_path, get_from_cache, hash_url_to_filename, is_relative_path, url_or_path_join
../venv/lib/python3.11/site-packages/datasets/utils/file_utils.py:37: in <module>
    from .extract import ExtractManager
../venv/lib/python3.11/site-packages/datasets/utils/extract.py:3: in <module>
    import lzma
/usr/local/lib/python3.11/lzma.py:27: in <module>
    from _lzma import *
E   ModuleNotFoundError: No module named '_lzma'
=============================== warnings summary ===============================
../venv/lib/python3.11/site-packages/huggingface_hub/inference/_text_generation.py:121
  /home/andrew/allen/venv/lib/python3.11/site-packages/huggingface_hub/inference/_text_generation.py:121: PydanticDeprecatedSince20: Pydantic V1 style `@validator` validators are deprecated. You should migrate to Pydantic V2 style `@field_validator` validators, see the migration guide for more details. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.6/migration/
    @validator("best_of")
```

(I omitted the other entries)
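(For anyone hitting the same thing: as far as I can tell the error points at the Python build itself rather than at OLMo or datasets. A quick, generic check, sketched here under that assumption:)

```python
# If _lzma is missing, this interpreter was compiled without liblzma/xz support,
# so `import lzma` (which datasets needs) will always fail; the usual remedy is
# installing the xz development headers and rebuilding/reinstalling Python.
import importlib.util

if importlib.util.find_spec("_lzma") is None:
    print("This Python build lacks _lzma support.")
else:
    print("lzma support is available.")
```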

andrewfr avatar Feb 02 '24 20:02 andrewfr

@andrewfr it's okay to ignore those warnings. I'm not sure about that error. I'm guessing an update to datasets broke that loading script.

epwalsh avatar Feb 02 '24 20:02 epwalsh

Hi Pete:

Re-compiling Python to include lzma would be a hassle.

I ran the HuggingFace inference example to see if things are working. The program feels slow but it runs.
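(If the slowness is because the pipeline defaulted to the CPU, passing a device index moves it onto the GPU; a small sketch, assuming CUDA is available and the 1B model fits in the 8GB card:)

```python
import hf_olmo  # registers the OLMo architecture with transformers
from transformers import pipeline

# device=0 places the model on the first CUDA GPU instead of the CPU default.
olmo_pipe = pipeline("text-generation", model="allenai/OLMo-1B", device=0)
print(olmo_pipe("Language modeling is ", max_new_tokens=50))
```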

Thanks! Andrew

andrewfr avatar Feb 03 '24 14:02 andrewfr

I apologize for our delay in response. In order to help surface current, unresolved issues, we are closing tickets opened prior to February 29. Please reopen your ticket if you are continuing to experience this issue. Thank you!

dumitrac avatar Apr 30 '24 18:04 dumitrac