Error creating pre-trained FluxPipeline with diffusers "Cannot instantiate this tokenizer from a slow version"
### Describe the bug
I'm receiving an error when trying to use the diffusers module to run the FLUX model:

```
ValueError: Cannot instantiate this tokenizer from a slow version. If it's based on sentencepiece, make sure you have sentencepiece installed
```
https://huggingface.co/black-forest-labs/FLUX.1-schnell
### Reproduction
- Have a system with an NVIDIA GPU (e.g. GeForce RTX 2080)
- Install Docker Desktop on Windows 11
- Run a new PyTorch container:
  `docker run --rm --interactive --tty --gpus=all pytorch/pytorch`
- Install the git package:
  `apt update; apt install git --yes`
- Install Python dependencies:
  `pip install transformers accelerate git+https://github.com/huggingface/diffusers.git`
- Run the code sample from this repository:
```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # save some VRAM by offloading the model to CPU. Remove this if you have enough GPU power

prompt = "A cat holding a sign that says hello world"
image = pipe(
    prompt,
    guidance_scale=0.0,
    output_type="pil",
    num_inference_steps=4,
    max_sequence_length=256,
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
image.save("flux-schnell.png")
```
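Worth flagging for reproduction: the `pytorch/pytorch` image does not ship sentencepiece or protobuf, and the pip line above does not install them either, which the comments below suggest is the actual problem. A minimal pre-flight check (just a sketch; it only tests importability and is not diffusers-specific):

```python
# Check for the two optional backends that T5TokenizerFast needs when it
# has to be converted from the slow (SentencePiece) tokenizer.
for module, pip_name in (("sentencepiece", "sentencepiece"), ("google.protobuf", "protobuf")):
    try:
        __import__(module)
        print(f"{module}: OK")
    except ImportError:
        print(f"{module}: missing (pip install {pip_name})")
```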
### Logs
```
>>> pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16)
Loading pipeline components...: 57%|██████████████████████████████████████████████████████████████████████████▊ | 4/7 [00:00<00:00, 16.49it/s]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/diffusers/pipelines/pipeline_utils.py", line 876, in from_pretrained
loaded_sub_model = load_sub_model(
File "/opt/conda/lib/python3.10/site-packages/diffusers/pipelines/pipeline_loading_utils.py", line 700, in load_sub_model
loaded_sub_model = load_method(os.path.join(cached_folder, name), **loading_kwargs)
File "/opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2291, in from_pretrained
return cls._from_pretrained(
File "/opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2525, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
File "/opt/conda/lib/python3.10/site-packages/transformers/models/t5/tokenization_t5_fast.py", line 119, in __init__
super().__init__(
File "/opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 106, in __init__
raise ValueError(
ValueError: Cannot instantiate this tokenizer from a slow version. If it's based on sentencepiece, make sure you have sentencepiece installed.
```
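From the trace, the failure happens while the pipeline loads its T5 tokenizer, so the failing component can presumably be reproduced in isolation without downloading the rest of the pipeline (a sketch; `subfolder="tokenizer_2"` matches the repo layout used later in this thread):

```python
from transformers import T5TokenizerFast

# Expected to raise the same ValueError while sentencepiece/protobuf are
# missing, since this tokenizer must be converted from its slow version.
tokenizer_2 = T5TokenizerFast.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", subfolder="tokenizer_2"
)
```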
### System Info
- 🤗 Diffusers version: 0.30.0.dev0
- Platform: Linux-5.15.153.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
- Running on Google Colab?: No
- Python version: 3.10.13
- PyTorch version (GPU?): 2.2.1 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Huggingface_hub version: 0.24.5
- Transformers version: 4.43.3
- Accelerate version: 0.33.0
- PEFT version: not installed
- Bitsandbytes version: not installed
- Safetensors version: 0.4.3
- xFormers version: not installed
- Accelerator: NVIDIA GeForce RTX 2080, 8192 MiB
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: No
### Who can help?
_No response_
try `pip install sentencepiece`?
> try `pip install sentencepiece`?

That was the first thing I did, but it had zero effect.
I have the same issue; I tried both `pip install sentencepiece` and `pip install transformers[sentencepiece]`, and no change.
Are you able to run it this way?

```python
import torch
from diffusers import FluxPipeline
from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14", torch_dtype=torch.bfloat16)
pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", tokenizer=tokenizer, torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # save some VRAM by offloading the model to CPU. Remove this if you have enough GPU power

prompt = "A cat holding a sign that says hello world"
image = pipe(
    prompt,
    guidance_scale=0.0,
    output_type="pil",
    num_inference_steps=4,
    max_sequence_length=256,
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
image.save("flux-schnell.png")
```
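One caveat about this suggestion: the traceback comes from `tokenization_t5_fast.py`, so the component that actually fails is `tokenizer_2` (the T5 tokenizer), which the override above leaves to load as before. A hypothetical variant targeting that component is below; note it still needs sentencepiece and protobuf to construct, so it isolates the error rather than avoiding it:

```python
import torch
from diffusers import FluxPipeline
from transformers import T5TokenizerFast

# Constructing this still requires sentencepiece + protobuf; shown only to
# point at the component the traceback implicates.
tokenizer_2 = T5TokenizerFast.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", subfolder="tokenizer_2"
)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    tokenizer_2=tokenizer_2,
    torch_dtype=torch.bfloat16,
)
```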
That suggestion gives me the same error. In fact, the way I have been trying to do it is:
```python
from diffusers import FluxPipeline, AutoencoderKL
from diffusers.image_processor import VaeImageProcessor
from transformers import T5EncoderModel, T5TokenizerFast, CLIPTokenizer, CLIPTextModel
import torch

ckpt_id = "black-forest-labs/FLUX.1-schnell"

denoise_pipeline = FluxPipeline.from_pretrained(
    ckpt_id,
    text_encoder=None,
    text_encoder_2=None,
    tokenizer=None,
    tokenizer_2=None,
    vae=None,
    torch_dtype=torch.bfloat16,
).to("cpu")

prompt_pipeline = FluxPipeline.from_pretrained(
    ckpt_id,
    text_encoder=CLIPTextModel.from_pretrained(ckpt_id, subfolder="text_encoder", torch_dtype=torch.bfloat16),
    text_encoder_2=T5EncoderModel.from_pretrained(ckpt_id, subfolder="text_encoder_2", torch_dtype=torch.bfloat16),
    tokenizer=CLIPTokenizer.from_pretrained(ckpt_id, subfolder="tokenizer"),
    tokenizer_2=T5TokenizerFast.from_pretrained(ckpt_id, subfolder="tokenizer_2"),
    transformer=None,
    vae=None,
).to("cpu")

vae = AutoencoderKL.from_pretrained(ckpt_id, subfolder="vae", torch_dtype=torch.bfloat16).to("cpu")
vae_scale_factor = 2 ** (len(vae.config.block_out_channels))
image_processor = VaeImageProcessor(vae_scale_factor=vae_scale_factor)
```
and the error is apparently caused by the T5 tokenizer:
```
You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[6], line 18
1 ckpt_id = "black-forest-labs/FLUX.1-schnell"
3 denoise_pipeline = FluxPipeline.from_pretrained(
4 ckpt_id,
5 text_encoder=None,
(...)
10 torch_dtype=torch.bfloat16,
11 ).to("cpu")
13 prompt_pipeline = FluxPipeline.from_pretrained(
14 ckpt_id,
15 text_encoder=CLIPTextModel.from_pretrained(ckpt_id, subfolder="text_encoder", torch_dtype=torch.bfloat16),
16 text_encoder_2=T5EncoderModel.from_pretrained(ckpt_id, subfolder="text_encoder_2", torch_dtype=torch.bfloat16),
17 tokenizer=CLIPTokenizer.from_pretrained(ckpt_id, subfolder="tokenizer"),
---> 18 tokenizer_2=T5TokenizerFast.from_pretrained(ckpt_id, subfolder="tokenizer_2"),
19 transformer=None,
20 vae=None,
21 ).to("cpu")
23 vae = AutoencoderKL.from_pretrained(ckpt_id, subfolder="vae", torch_dtype=torch.bfloat16).to("cpu")
24 vae_scale_factor = 2 ** (len(vae.config.block_out_channels))
File ~/notebooks/.venv/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:2291, in PreTrainedTokenizerBase.from_pretrained(cls, pretrained_model_name_or_path, cache_dir, force_download, local_files_only, token, revision, trust_remote_code, *init_inputs, **kwargs)
2288 else:
2289 logger.info(f"loading file {file_path} from cache at {resolved_vocab_files[file_id]}")
-> 2291 return cls._from_pretrained(
2292 resolved_vocab_files,
2293 pretrained_model_name_or_path,
2294 init_configuration,
2295 *init_inputs,
2296 token=token,
2297 cache_dir=cache_dir,
2298 local_files_only=local_files_only,
2299 _commit_hash=commit_hash,
2300 _is_local=is_local,
2301 trust_remote_code=trust_remote_code,
2302 **kwargs,
2303 )
File ~/notebooks/.venv/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:2525, in PreTrainedTokenizerBase._from_pretrained(cls, resolved_vocab_files, pretrained_model_name_or_path, init_configuration, token, cache_dir, local_files_only, _commit_hash, _is_local, trust_remote_code, *init_inputs, **kwargs)
2523 # Instantiate the tokenizer.
2524 try:
-> 2525 tokenizer = cls(*init_inputs, **init_kwargs)
2526 except OSError:
2527 raise OSError(
2528 "Unable to load vocabulary from file. "
2529 "Please check that the provided vocabulary is accessible and not corrupted."
2530 )
File ~/notebooks/.venv/lib/python3.10/site-packages/transformers/models/t5/tokenization_t5_fast.py:119, in T5TokenizerFast.__init__(self, vocab_file, tokenizer_file, eos_token, unk_token, pad_token, extra_ids, additional_special_tokens, add_prefix_space, **kwargs)
114 logger.warning_once(
115 "You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers"
116 )
117 kwargs["from_slow"] = True
--> 119 super().__init__(
120 vocab_file,
121 tokenizer_file=tokenizer_file,
122 eos_token=eos_token,
123 unk_token=unk_token,
124 pad_token=pad_token,
125 extra_ids=extra_ids,
126 additional_special_tokens=additional_special_tokens,
127 **kwargs,
128 )
130 self.vocab_file = vocab_file
131 self._extra_ids = extra_ids
File ~/notebooks/.venv/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py:106, in PreTrainedTokenizerFast.__init__(self, *args, **kwargs)
103 added_tokens_decoder = kwargs.pop("added_tokens_decoder", {})
105 if from_slow and slow_tokenizer is None and self.slow_tokenizer_class is None:
--> 106 raise ValueError(
107 "Cannot instantiate this tokenizer from a slow version. If it's based on sentencepiece, make sure you "
108 "have sentencepiece installed."
109 )
111 if tokenizer_object is not None:
112 fast_tokenizer = copy.deepcopy(tokenizer_object)
ValueError: Cannot instantiate this tokenizer from a slow version. If it's based on sentencepiece, make sure you have sentencepiece installed.
```
Do you have sentencepiece in your environment, e.g. does this work?

```python
import sentencepiece
```
`pip install protobuf`
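Both suggestions point at the same underlying requirement. If I remember the transformers API correctly (hedged; check your installed version), it exposes helpers that show which optional backends the failing environment actually sees:

```python
from transformers.utils import is_protobuf_available, is_sentencepiece_available

# Both should be True for T5TokenizerFast to be convertible from the slow
# SentencePiece tokenizer; in the failing environment at least one of these
# presumably prints False.
print("sentencepiece:", is_sentencepiece_available())
print("protobuf:", is_protobuf_available())
```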
My issue turned out to be the way I was running jupyter in a virtualenv. Thanks for your help
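For anyone hitting the same Jupyter-in-a-virtualenv trap: a quick way to check whether the kernel is running from the environment you installed into (a generic sketch, nothing specific to this issue):

```python
import sys

# If this is not the virtualenv's interpreter, packages installed with pip
# in the virtualenv's shell never reach the kernel.
print(sys.executable)
print(sys.prefix)
```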
@pcgeek86 were you also able to resolve the issue?
> @pcgeek86 were you also able to resolve the issue?
I am getting CUDA out-of-memory errors, but I think it's working with your code sample. Still seems like maybe a bug in the original code sample I shared from the link?
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
> My issue turned out to be the way I was running jupyter in a virtualenv. Thanks for your help
@jbaron34 what tokenizer worked in the end? Trying to run FLUX.1 in a Hugging Face Space with no luck yet.
`pip install sentencepiece` worked for me.
Doesn't work for me; I have sentencepiece, transformers, and protobuf installed.
Maybe the issue is that people are trying to use the latest Python, 3.13, whereas sentencepiece, at least, is only compatible with 3.11?
> My issue turned out to be the way I was running jupyter in a virtualenv. Thanks for your help

Try restarting the kernel. It worked for me.
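Relatedly, installing from inside the notebook sidesteps the environment mismatch, since the `%pip` magic targets the kernel's own interpreter; a restart is still needed afterwards so that the already-imported transformers picks the packages up (sketch assuming an IPython kernel):

```python
# Run in a notebook cell; %pip installs into the kernel's own environment.
# Restart the kernel afterwards so the new packages are picked up.
%pip install sentencepiece protobuf
```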
> My issue turned out to be the way I was running jupyter in a virtualenv. Thanks for your help

> Try restarting the kernel. It worked for me.

I solved the problem by installing the dependencies and restarting the kernel. Thanks!
Closing this since it appears to be an environment issue rather than a Diffusers issue.