`get_imports` failing to respect conditionals on imports
System Info
- `transformers` version: 4.36.2
- Platform: macOS-13.5.2-arm64-arm-64bit
- Python version: 3.11.7
- Huggingface_hub version: 0.20.2
- Safetensors version: 0.4.1
- Accelerate version: not installed
- Accelerate config: not found
- PyTorch version (GPU?): 2.1.2 (False)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: no
- Using distributed or parallel set-up in script?: no
Who can help?
From git blame: @Wauplin @sgugger
From issue template (it's an LLM): @ArthurZucker @you
Information
- [ ] The official example scripts
- [X] My own modified scripts
Tasks
- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [X] My own task or dataset (give details below)
Reproduction
Running the snippet below on a MacBook without an Nvidia GPU and transformers==4.36.2 throws an ImportError telling you to pip install flash_attn. However, flash_attn isn't actually a requirement for this model, so something is off here.
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-1_5", trust_remote_code=True)
Leads to:
File "/Users/user/code/project/venv/lib/python3.11/site-packages/transformers/dynamic_module_utils.py", line 315, in get_cached_module_file
modules_needed = check_imports(resolved_module_file)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/user/code/project/venv/lib/python3.11/site-packages/transformers/dynamic_module_utils.py", line 180, in check_imports
raise ImportError(
ImportError: This modeling file requires the following packages that were not found in your environment: flash_attn. Run `pip install flash_attn`
python-BaseException
Investigating this, it seems https://github.com/huggingface/transformers/blob/v4.36.2/src/transformers/dynamic_module_utils.py#L154 is picking up flash_attn from https://github.com/huggingface/transformers/blob/v4.36.2/src/transformers/models/phi/modeling_phi.py#L50-L52. However, if you look at the file, it's within an if statement.
Therein lies the bug: transformers.dynamic_module_utils.get_imports does not respect conditionals placed before imports.
Please see https://huggingface.co/microsoft/phi-1_5/discussions/72 for more info.
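For context, the guarded import in modeling_phi.py has roughly this shape (paraphrased from the linked lines rather than copied verbatim):

```python
from transformers.utils import is_flash_attn_2_available

# flash_attn is only imported when it is actually available, so it should not
# be treated as a hard requirement of the modeling file.
if is_flash_attn_2_available():
    from flash_attn import flash_attn_func, flash_attn_varlen_func
    from flash_attn.bert_padding import index_first_axis, pad_input, unpad_input
```

The regex in get_imports still matches the from flash_attn import ... lines even though they sit inside the if block.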
Expected behavior
My goal is to find some way to avoid monkey-patching get_imports just to remove the spuriously inferred flash_attn dependency.
The most general solution is probably to move get_imports away from regex-searching the source and towards inspect (see here) or some other AST-walking approach. I am pretty sure there is a simple fix here; it just involves moving away from a regex.
For reference, this only happens when trust_remote_code=True. Thus, we switched from using if is_flash_attn_2_available(): to a try/except block when trying to import the flash_attn package.
Seems to be working!
Thanks @gugarosa for finding a workaround; it works because get_imports includes a special regex for try/except: https://github.com/huggingface/transformers/blob/v4.36.2/src/transformers/dynamic_module_utils.py#L149.
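Concretely, the change on the Hub side amounts to something like the following (a sketch based on the description above, not the exact remote-code diff; the imported names are illustrative):

```python
# Sketch: replace the is_flash_attn_2_available() guard with a try/except,
# so get_imports (which strips try/except blocks before scanning for imports)
# treats flash_attn as optional rather than required.
try:
    from flash_attn import flash_attn_func, flash_attn_varlen_func
    from flash_attn.bert_padding import index_first_axis, pad_input, unpad_input
except ImportError:
    pass
```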
For reference, adding the case below to https://github.com/huggingface/transformers/blob/v4.36.2/tests/utils/test_dynamic_module_utils.py will expose the issue:
...
TOP_LEVEL_CONDITIONAL_IMPORT = """
import os
if False:
    import pathlib
"""
...
CASES = [
    ...,
    TOP_LEVEL_CONDITIONAL_IMPORT
]
Looking at the other test cases, I now think a proper fix will involve the ast module, as shown in https://stackoverflow.com/a/42195575.
Note that a generalized import scanner should also take contextlib.suppress into account:
import contextlib
with contextlib.suppress(ImportError):
    from flash_attn import flash_attn_func
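For illustration, a minimal ast-based scanner in the spirit of that Stack Overflow answer might look like the sketch below (this is only a sketch of the idea, not the actual transformers implementation). Because it walks only the module's top-level body, imports nested under if, try/except, or contextlib.suppress are all treated as optional:

```python
import ast


def get_top_level_imports(source: str) -> list[str]:
    """Collect only unconditional, top-level imports from a module's source."""
    tree = ast.parse(source)
    found = []
    # Only the module's direct children are inspected, so anything inside
    # `if`, `try`, `with contextlib.suppress(...)`, functions, or classes
    # is skipped and therefore never reported as a hard requirement.
    for node in tree.body:
        if isinstance(node, ast.Import):
            found.extend(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            found.append(node.module.split(".")[0])
    return sorted(set(found))
```

Run over the TOP_LEVEL_CONDITIONAL_IMPORT case above, this returns only ['os'], which is the behaviour the new test case is after.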
Same problem for [DeepSeek-MoE](https://github.com/deepseek-ai/DeepSeek-MoE).
Fixed by:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig
model_name = "/root/models/deepseek-moe-16b-base"
# model_name = "/root/models/Llama-2-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
# With Python 3.11.7, transformers==4.36.2
import os
from unittest.mock import patch
from transformers import AutoModelForCausalLM
from transformers.dynamic_module_utils import get_imports
def fixed_get_imports(filename: str | os.PathLike) -> list[str]:
    """Work around for https://huggingface.co/microsoft/phi-1_5/discussions/72."""
    if not str(filename).endswith("/modeling_deepseek.py"):
        return get_imports(filename)
    imports = get_imports(filename)
    imports.remove("flash_attn")
    return imports

with patch("transformers.dynamic_module_utils.get_imports", fixed_get_imports):
    # model = AutoModelForCausalLM.from_pretrained("microsoft/phi-1_5", trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True)

model.generation_config = GenerationConfig.from_pretrained(model_name)
model.generation_config.pad_token_id = model.generation_config.eos_token_id
text = "An attention function can be described as mapping a query and a set of key-value pairs to an output, where the query, keys, values, and output are all vectors. The output is"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs.to(model.device), max_new_tokens=100)
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)
I think all custom models (which need trust_remote_code=True) trigger this problem.
(^ ping @LysandreJik about the trust_remote_code mechanism?)
Yes, the code is here:
Yep have already heard of such feedback! Would you like to open a PR for a fix?
Of course, I will make a PR for the fix.
Here is my pull request: https://github.com/huggingface/transformers/pull/28811
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
I think a fix here would be useful, @github-actions, so let's keep it open.
Looks like the issue has been fixed. I'm able to load the model without flash_attn installed.
% python3
Python 3.12.3 (main, Apr 9 2024, 08:09:14) [Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from transformers import AutoModelForCausalLM
>>> model = AutoModelForCausalLM.from_pretrained("microsoft/phi-1_5", trust_remote_code=True)
>>> import flash_attn
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'flash_attn'
Thanks for sharing @jla524! cc @Rocketknight1 as I think you were working on a related issue?
Hi guys, I encountered the same error with version 4.41.2.
I am confused about which package flash_attn refers to.
I have tried installing both packages (xformers and https://github.com/Dao-AILab/flash-attention), but the problem still exists.
Hi @congchan,
flash_attn refers to the package you linked to, i.e. the one installed when running pip install flash-attn; however, you may need to follow the specific installation instructions for your setup.
If flash attention is properly installed, you should be able to run python -c "import flash_attn; print(flash_attn.__version__)" and see the installed version. If running on CUDA, you'll need version 2.1 or above to run a lot of the modeling code.
I'm on 4.41 and still get ImportError: This modeling file requires the following packages that were not found in your environment: flash_attn. Run pip install flash_attn.
I cannot use flash_attn, but the modeling code should support both cases.
I am using the Florence-2 modeling code.
@lucasjinreal Without full environment info (run transformers-cli env in the terminal and copy-paste the output) and a reproducible code snippet, we won't be able to help you
hey @amyeroberts, I am having this same issue running Florence 2 on Mac. env info:
- `transformers` version: 4.41.2
- Platform: macOS-14.1-arm64-arm-64bit
- Python version: 3.12.4
- Huggingface_hub version: 0.23.4
- Safetensors version: 0.4.3
- Accelerate version: not installed
- Accelerate config: not found
- PyTorch version (GPU?): 2.3.1 (False)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: no
- Using distributed or parallel set-up in script?: no
script in case it helps (copy/pasted from Florence 2 tutorial):
from transformers import AutoProcessor, AutoModelForCausalLM
from PIL import Image
import requests
import copy
import matplotlib.pyplot as plt
import matplotlib.patches as patches
# Load model and processor
device = 'cpu'
model_id = 'microsoft/Florence-2-large-ft'
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True).eval().to(device)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
# Define the prediction function
def run_example(task_prompt, text_input=None):
    if text_input is None:
        prompt = task_prompt
    else:
        prompt = task_prompt + text_input
    inputs = processor(text=prompt, images=image, return_tensors="pt")
    generated_ids = model.generate(
        input_ids=inputs["input_ids"].to(device),
        pixel_values=inputs["pixel_values"].to(device),
        max_new_tokens=1024,
        early_stopping=False,
        do_sample=False,
        num_beams=3,
    )
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    parsed_answer = processor.post_process_generation(
        generated_text,
        task=task_prompt,
        image_size=(image.width, image.height)
    )
    return parsed_answer
# Initialize the image
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg?download=true"
image = Image.open(requests.get(url, stream=True).raw)
# Run pre-defined tasks without additional inputs
task_prompt = '<CAPTION>'
print(run_example(task_prompt))
task_prompt = '<DETAILED_CAPTION>'
print(run_example(task_prompt))
task_prompt = '<MORE_DETAILED_CAPTION>'
print(run_example(task_prompt))
# Object detection
task_prompt = '<OD>'
results = run_example(task_prompt)
print(results)
def plot_bbox(image, data):
    fig, ax = plt.subplots()
    ax.imshow(image)
    for bbox, label in zip(data['bboxes'], data['labels']):
        x1, y1, x2, y2 = bbox
        rect = patches.Rectangle((x1, y1), x2-x1, y2-y1, linewidth=1, edgecolor='r', facecolor='none')
        ax.add_patch(rect)
        plt.text(x1, y1, label, color='white', fontsize=8, bbox=dict(facecolor='red', alpha=0.5))
    ax.axis('off')
    plt.show()
plot_bbox(image, results['<OD>'])
Hi @derickmr, thanks for sharing this snippet and env information!
As there are a few different behaviours being reported, I just want to confirm the issue you're experiencing: is it that the snippet does not run if flash attention isn't installed in the environment (conditionals are respected when running from the Hub); or is it that even when flash attention is installed, you're still prompted to install it?