Benchmark script errors when calling the InstructBLIP processor
I tried running the benchmark script and hit what looks like a bug; my diagnosis is below.
The traceback seems to point to the type of the image parameter at line 68:
53 def bench_captions(
54 model,
55 processor,
56 prompt: str,
57 images: List[Image.Image],
58 ) -> List[str]:
59 total_duration = 0
60 total_length = 0
61 model = torch.compile(model)
62 for image in images:
63 seconds, text = duration(
64 lambda: caption(
65 model=model,
66 processor=processor,
67 prompt=prompt,
68 image=image,
69 )
70 )
71 total_duration += seconds
72 total_length += len(text)
73
74 del model
75 del processor
76 print(f"Throughput: {total_length/total_duration:.2f} tokens/s")
Click to expand traceback (captured by pytest)
scripts/bench.py:141: in <module>
bench_captions(
scripts/bench.py:63: in bench_captions
seconds, text = duration(
scripts/bench.py:48: in duration
result = callable()
scripts/bench.py:64: in <lambda>
lambda: caption(
scripts/bench.py:22: in caption
inputs = processor(prompt, image, return_tensors="pt")
/home/louis/miniconda3/envs/uform/lib/python3.11/site-packages/transformers/models/instructblip/processing_instructblip.py:89: in __call__
text_encoding = self.tokenizer(
/home/louis/miniconda3/envs/uform/lib/python3.11/site-packages/transformers/tokenization_utils_base.py:2802: in __call__
encodings = self._call_one(text=text, text_pair=text_pair, **all_kwargs)
/home/louis/miniconda3/envs/uform/lib/python3.11/site-packages/transformers/tokenization_utils_base.py:2860: in _call_one
raise ValueError(
E ValueError: text input must of type `str` (single example), `List[str]` (batch or single pretokenized example) or `List[List[str]]` (batch of pretokenized examples).
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> entering PDB >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> PDB post_mortem >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> /home/louis/miniconda3/envs/uform/lib/python3.11/site-packages/transformers/tokenization_utils_base.py(2860)_call_one()
I expanded this code out (no lambda) and it still gives the same error, but the data flow is clearer:
from functools import partial
def bench_captions(
model,
processor,
prompt: str,
images: List[Image.Image],
) -> List[str]:
total_duration = 0
total_length = 0
model = torch.compile(model)
def caption_image(image, model=model, processor=processor, prompt=prompt):
return caption(model=model, processor=processor, prompt=prompt, image=image)
for image in images:
seconds, text = duration(partial(caption_image, image=image))
total_duration += seconds
total_length += len(text)
del model
del processor
print(f"Throughput: {total_length/total_duration:.2f} tokens/s")
The traceback points to the call into the InstructBLIP model's processor.
A similar error was reported but not resolved in transformers, though I think it's unrelated: https://github.com/huggingface/transformers/issues/21366
The bug seems to be that we are passing positional arguments, and they're getting misrouted as a result:
inputs = processor(prompt, image, return_tensors="pt")
The InstructBLIP signature is __call__(self, images, text)
(Pdb) pp self.__call__.__func__.__code__.co_varnames
('self',
'images',
'text',
...
The docs say that "The InstructBlipForConditionalGeneration forward method, overrides the __call__ special method."
I think this must be what is supposed to be getting called.
Debugging in PDB confirms this is what's happening:
(Pdb) p images
'Summarize the visual content of the image.'
(Pdb) p text
<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=2787x4181 at 0x7FBE4A910090>
Does this reproduce for you?
Cause
Update: I found the cause is indeed passing positional args. If you print the processor parameter names, they are, respectively:
- texts, images, ... (Uform-gen)
- text, images, ... (Llava)
- images, text, ... (Instruct-BLIP)
I'm surprised this benchmark was working before
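To make the misrouting concrete, here's a minimal sketch with stand-in classes that only mimic the parameter orders listed above (hypothetical, not the real processors): the same positional call lands the prompt in the right slot for one and the wrong slot for the other.

```python
# Hypothetical stand-ins mimicking the parameter orders above,
# not the real processor classes.
class FakeUformProcessor:
    def __call__(self, texts, images, return_tensors=None):
        return {"prompt_slot": texts, "image_slot": images}

class FakeInstructBlipProcessor:
    def __call__(self, images, text, return_tensors=None):
        return {"prompt_slot": text, "image_slot": images}

prompt = "Summarize the visual content of the image."
image = "<PIL.Image stand-in>"

# The benchmark's positional call shape: processor(prompt, image, ...)
ok = FakeUformProcessor()(prompt, image, return_tensors="pt")
bad = FakeInstructBlipProcessor()(prompt, image, return_tensors="pt")

print(ok["prompt_slot"])   # the prompt, as intended
print(bad["prompt_slot"])  # the image: prompt and image got swapped
```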
Solution
Since the parameter order varies you can't use positional args, but the parameter names differ too (text vs. texts), so a single keyword won't work either.
In fact the odd one out here (texts) comes from uform itself, so if that changes, passing text= by keyword everywhere will work.
You can't just pass images=image while leaving the prompt positional: InstructBlipProcessor would then get multiple values for the argument images.
Nor can it be solved by passing text=text to Uform-Gen's VLMProcessor; that leads to a later error at the model.generate step.
It looks like switching the order of these arguments in VLMProcessor is the best solution.
With the following patch everything works (but that's not to say don't fix the VLMProcessor argument order!):
def caption(model, processor, prompt: str, image: Image.Image) -> str:
    # Find the processor's prompt parameter, whatever it's called (text/texts)
    var_names = processor.__call__.__func__.__code__.co_varnames
    prompt_kwarg = next(kw for kw in var_names if kw.startswith("text"))
    # Pass everything by keyword so parameter order no longer matters
    processor_kwargs = {prompt_kwarg: prompt, "images": image, "return_tensors": "pt"}
    inputs = processor(**processor_kwargs)
    ...
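As a sanity check, here is the same keyword-selection trick exercised against stand-in classes with the matching parameter orders (hypothetical, not the real processors). This sketch uses inspect.signature rather than co_varnames, since it yields only the parameters (not locals) and handles bound methods cleanly:

```python
import inspect

# Hypothetical stand-ins mimicking the differing parameter orders.
class FakeUformProcessor:
    def __call__(self, texts, images, return_tensors=None):
        return {"prompt_slot": texts, "image_slot": images}

class FakeInstructBlipProcessor:
    def __call__(self, images, text, return_tensors=None):
        return {"prompt_slot": text, "image_slot": images}

def processor_kwargs(processor, prompt, image):
    # Choose whichever parameter name starts with "text" (text or texts),
    # then pass everything by keyword so parameter order is irrelevant.
    params = inspect.signature(processor.__call__).parameters
    prompt_kwarg = next(name for name in params if name.startswith("text"))
    return {prompt_kwarg: prompt, "images": image, "return_tensors": "pt"}

prompt = "Summarize the visual content of the image."
image = "<PIL.Image stand-in>"

for cls in (FakeUformProcessor, FakeInstructBlipProcessor):
    inputs = cls()(**processor_kwargs(cls(), prompt, image))
    print(inputs["prompt_slot"])  # the prompt, in both cases
```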
Environment details
- OS: Linux
- Environment: conda
- Python: 3.11.5
- Transformers: 4.36.2
Click to show full pip list
(uform) louis 🌟 ~/lab/uform/uform $ pip list
Package Version Editable project location
------------------ ---------- ---------------------------
Brotli 1.0.9
certifi 2023.11.17
cffi 1.16.0
charset-normalizer 2.0.4
cryptography 41.0.7
filelock 3.13.1
fsspec 2023.12.2
gmpy2 2.1.2
huggingface-hub 0.20.1
idna 3.4
iniconfig 2.0.0
Jinja2 3.1.2
MarkupSafe 2.1.1
mkl-fft 1.3.8
mkl-random 1.2.4
mkl-service 2.4.0
mpmath 1.3.0
networkx 3.1
numpy 1.26.2
packaging 23.2
Pillow 10.0.1
pip 23.3.1
pluggy 1.3.0
pycparser 2.21
pyOpenSSL 23.2.0
PySocks 1.7.1
pytest 7.4.4
PyYAML 6.0.1
regex 2023.12.25
requests 2.31.0
safetensors 0.4.1
setuptools 68.2.2
sympy 1.12
tokenizers 0.15.0
torch 2.1.2
torchaudio 2.1.2
torchvision 0.16.2
tqdm 4.66.1
transformers 4.36.2
triton 2.1.0
typing_extensions 4.7.1
uform 1.0.3 /home/louis/lab/uform/uform
urllib3 1.26.18
wheel 0.41.2
Click to show full conda list
# packages in environment at /home/louis/miniconda3/envs/uform:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
blas 1.0 mkl
brotli-python 1.0.9 py311h6a678d5_7
bzip2 1.0.8 h7b6447c_0
ca-certificates 2023.12.12 h06a4308_0
certifi 2023.11.17 py311h06a4308_0
cffi 1.16.0 py311h5eee18b_0
charset-normalizer 2.0.4 pyhd3eb1b0_0
cryptography 41.0.7 py311hdda0065_0
cuda-cudart 11.8.89 0 nvidia
cuda-cupti 11.8.87 0 nvidia
cuda-libraries 11.8.0 0 nvidia
cuda-nvrtc 11.8.89 0 nvidia
cuda-nvtx 11.8.86 0 nvidia
cuda-runtime 11.8.0 0 nvidia
ffmpeg 4.3 hf484d3e_0 pytorch
filelock 3.13.1 py311h06a4308_0
freetype 2.12.1 h4a9f257_0
fsspec 2023.12.2 pypi_0 pypi
giflib 5.2.1 h5eee18b_3
gmp 6.2.1 h295c915_3
gmpy2 2.1.2 py311hc9b5ff0_0
gnutls 3.6.15 he1e5248_0
huggingface-hub 0.20.1 pypi_0 pypi
idna 3.4 py311h06a4308_0
iniconfig 2.0.0 pypi_0 pypi
intel-openmp 2023.1.0 hdb19cb5_46306
jinja2 3.1.2 py311h06a4308_0
jpeg 9e h5eee18b_1
lame 3.100 h7b6447c_0
lcms2 2.12 h3be6417_0
ld_impl_linux-64 2.38 h1181459_1
lerc 3.0 h295c915_0
libcublas 11.11.3.6 0 nvidia
libcufft 10.9.0.58 0 nvidia
libcufile 1.8.1.2 0 nvidia
libcurand 10.3.4.101 0 nvidia
libcusolver 11.4.1.48 0 nvidia
libcusparse 11.7.5.86 0 nvidia
libdeflate 1.17 h5eee18b_1
libffi 3.4.4 h6a678d5_0
libgcc-ng 11.2.0 h1234567_1
libgomp 11.2.0 h1234567_1
libiconv 1.16 h7f8727e_2
libidn2 2.3.4 h5eee18b_0
libjpeg-turbo 2.0.0 h9bf148f_0 pytorch
libnpp 11.8.0.86 0 nvidia
libnvjpeg 11.9.0.86 0 nvidia
libpng 1.6.39 h5eee18b_0
libstdcxx-ng 11.2.0 h1234567_1
libtasn1 4.19.0 h5eee18b_0
libtiff 4.5.1 h6a678d5_0
libunistring 0.9.10 h27cfd23_0
libuuid 1.41.5 h5eee18b_0
libwebp 1.3.2 h11a3e52_0
libwebp-base 1.3.2 h5eee18b_0
llvm-openmp 14.0.6 h9e868ea_0
lz4-c 1.9.4 h6a678d5_0
markupsafe 2.1.1 py311h5eee18b_0
mkl 2023.1.0 h213fc3f_46344
mkl-service 2.4.0 py311h5eee18b_1
mkl_fft 1.3.8 py311h5eee18b_0
mkl_random 1.2.4 py311hdb19cb5_0
mpc 1.1.0 h10f8cd9_1
mpfr 4.0.2 hb69a4c5_1
mpmath 1.3.0 py311h06a4308_0
ncurses 6.4 h6a678d5_0
nettle 3.7.3 hbbd107a_1
networkx 3.1 py311h06a4308_0
numpy 1.26.2 py311h08b1b3b_0
numpy-base 1.26.2 py311hf175353_0
openh264 2.1.1 h4ff587b_0
openjpeg 2.4.0 h3ad879b_0
openssl 3.0.12 h7f8727e_0
packaging 23.2 pypi_0 pypi
pillow 10.0.1 py311ha6cbd5a_0
pip 23.3.1 py311h06a4308_0
pluggy 1.3.0 pypi_0 pypi
pycparser 2.21 pyhd3eb1b0_0
pyopenssl 23.2.0 py311h06a4308_0
pysocks 1.7.1 py311h06a4308_0
pytest 7.4.4 pypi_0 pypi
python 3.11.5 h955ad1f_0
pytorch 2.1.2 py3.11_cuda11.8_cudnn8.7.0_0 pytorch
pytorch-cuda 11.8 h7e8668a_5 pytorch
pytorch-mutex 1.0 cuda pytorch
pyyaml 6.0.1 py311h5eee18b_0
readline 8.2 h5eee18b_0
regex 2023.12.25 pypi_0 pypi
requests 2.31.0 py311h06a4308_0
safetensors 0.4.1 pypi_0 pypi
setuptools 68.2.2 py311h06a4308_0
sqlite 3.41.2 h5eee18b_0
sympy 1.12 py311h06a4308_0
tbb 2021.8.0 hdb19cb5_0
tk 8.6.12 h1ccaba5_0
tokenizers 0.15.0 pypi_0 pypi
torchaudio 2.1.2 py311_cu118 pytorch
torchtriton 2.1.0 py311 pytorch
torchvision 0.16.2 py311_cu118 pytorch
tqdm 4.66.1 pypi_0 pypi
transformers 4.36.2 pypi_0 pypi
typing_extensions 4.7.1 py311h06a4308_0
tzdata 2023c h04d1e81_0
uform 1.0.3 pypi_0 pypi
urllib3 1.26.18 py311h06a4308_0
wheel 0.41.2 py311h06a4308_0
xz 5.4.5 h5eee18b_0
yaml 0.2.5 h7b6447c_0
zlib 1.2.13 h5eee18b_0
zstd 1.5.5 hc292b87_0
These are the results I get on a 3090; I'm not sure whether they're meant to correspond to the table in the README or whether something's changed.
UForm-Gen
Throughput: 193.65 tokens/s (run 1)
Throughput: 198.49 tokens/s (run 2)
LLaVA
Throughput: 164.27 tokens/s (run 1)
Throughput: 166.39 tokens/s (run 2)
InstructBLIP
Throughput: 167.85 tokens/s (run 1)
Throughput: 165.90 tokens/s (run 2)
UForm-English
Throughput: 10.68 images/s (run 1)
Throughput: 12.66 images/s (run 2)
Throughput: 202.97 queries/s (run 1)
Throughput: 203.07 queries/s (run 2)
UForm-Multilingual
Throughput: 11.95 images/s (run 1)
Throughput: 12.49 images/s (run 2)
Throughput: 235.77 queries/s (run 1)
Throughput: 240.95 queries/s (run 2)