Benchmark script errors when calling the InstructBLIP processor
I tried running the benchmark script and hit what looks like a bug; my diagnosis is below.
The traceback seems to point to the type of the image parameter at line 68:
53 def bench_captions(
54 model,
55 processor,
56 prompt: str,
57 images: List[Image.Image],
58 ) -> List[str]:
59 total_duration = 0
60 total_length = 0
61 model = torch.compile(model)
62 for image in images:
63 seconds, text = duration(
64 lambda: caption(
65 model=model,
66 processor=processor,
67 prompt=prompt,
68 image=image,
69 )
70 )
71 total_duration += seconds
72 total_length += len(text)
73
74 del model
75 del processor
76 print(f"Throughput: {total_length/total_duration:.2f} tokens/s")
Click to expand traceback (captured by pytest)
scripts/bench.py:141: in <module>
bench_captions(
scripts/bench.py:63: in bench_captions
seconds, text = duration(
scripts/bench.py:48: in duration
result = callable()
scripts/bench.py:64: in <lambda>
lambda: caption(
scripts/bench.py:22: in caption
inputs = processor(prompt, image, return_tensors="pt")
/home/louis/miniconda3/envs/uform/lib/python3.11/site-packages/transformers/models/instructblip/processing_instructblip.py:89: in __call__
text_encoding = self.tokenizer(
/home/louis/miniconda3/envs/uform/lib/python3.11/site-packages/transformers/tokenization_utils_base.py:2802: in __call__
encodings = self._call_one(text=text, text_pair=text_pair, **all_kwargs)
/home/louis/miniconda3/envs/uform/lib/python3.11/site-packages/transformers/tokenization_utils_base.py:2860: in _call_one
raise ValueError(
E ValueError: text input must of type `str` (single example), `List[str]` (batch or single pretokenized example) or `List[List[str]]` (batch of pretokenized examples).
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> entering PDB >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> PDB post_mortem >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> /home/louis/miniconda3/envs/uform/lib/python3.11/site-packages/transformers/tokenization_utils_base.py(2860)_call_one()
I expanded this code out (no lambda) and it still gives the same error, but the data flow is clearer:
from functools import partial
def bench_captions(
model,
processor,
prompt: str,
images: List[Image.Image],
) -> List[str]:
total_duration = 0
total_length = 0
model = torch.compile(model)
def caption_image(image, model=model, processor=processor, prompt=prompt):
return caption(model=model, processor=processor, prompt=prompt, image=image)
for image in images:
seconds, text = duration(partial(caption_image, image=image))
total_duration += seconds
total_length += len(text)
del model
del processor
print(f"Throughput: {total_length/total_duration:.2f} tokens/s")
The traceback points to the call into the InstructBLIP model's processor.
A similar error was reported but not resolved in transformers, though I think it's unrelated: https://github.com/huggingface/transformers/issues/21366
The bug seems to be that we are passing positional arguments, and they're getting misrouted as a result:
inputs = processor(prompt, image, return_tensors="pt")
The InstructBLIP signature is __call__(self, images, text)
(Pdb) pp self.__call__.__func__.__code__.co_varnames
('self',
'images',
'text',
...
The docs say that "The InstructBlipForConditionalGeneration forward method, overrides the __call__ special method."
I think this must be what is supposed to be getting called.
Debugging in PDB confirms this is what's happening:
(Pdb) p images
'Summarize the visual content of the image.'
(Pdb) p text
<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=2787x4181 at 0x7FBE4A910090>
Does this reproduce for you?
Cause
Update: I found the cause is indeed passing positional args. If you print the processor parameter names, they are, respectively:
- texts, images, ... (Uform-gen)
- text, images, ... (Llava)
- images, text, ... (Instruct-BLIP)
I'm surprised this benchmark was working before
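To make the misrouting concrete, here's a minimal sketch with stand-in classes that only mimic the parameter orders listed above (hypothetical, not the real processors): the same positional call lands the prompt in the right slot for one and the wrong slot for the other.

```python
# Hypothetical stand-ins mimicking the parameter orders above,
# not the real processor classes.
class FakeUformProcessor:
    def __call__(self, texts, images, return_tensors=None):
        return {"prompt_slot": texts, "image_slot": images}

class FakeInstructBlipProcessor:
    def __call__(self, images, text, return_tensors=None):
        return {"prompt_slot": text, "image_slot": images}

prompt = "Summarize the visual content of the image."
image = "<PIL.Image stand-in>"

# The benchmark's positional call shape: processor(prompt, image, ...)
ok = FakeUformProcessor()(prompt, image, return_tensors="pt")
bad = FakeInstructBlipProcessor()(prompt, image, return_tensors="pt")

print(ok["prompt_slot"])   # the prompt, as intended
print(bad["prompt_slot"])  # the image: prompt and image got swapped
```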
Solution
Since the parameter order varies you can't use positional args, but the parameter names differ too (text vs. texts), so a single keyword won't work either.
In fact the odd one out here (texts) comes from uform itself, so if that changes, passing text= by keyword everywhere will work.
You can't just pass images=image while leaving the prompt positional: InstructBlipProcessor would then get multiple values for the argument images.
Nor can it be solved by passing text=text to Uform-Gen's VLMProcessor; that leads to a later error at the model.generate step.
It looks like switching the order of these arguments in VLMProcessor is the best solution.
With the following patch everything works (but that's not to say don't fix the VLMProcessor argument order!):
def caption(model, processor, prompt: str, image: Image.Image) -> str:
    # Find the processor's prompt parameter, whatever it's called (text/texts)
    var_names = processor.__call__.__func__.__code__.co_varnames
    prompt_kwarg = next(kw for kw in var_names if kw.startswith("text"))
    # Pass everything by keyword so parameter order no longer matters
    processor_kwargs = {prompt_kwarg: prompt, "images": image, "return_tensors": "pt"}
    inputs = processor(**processor_kwargs)
    ...
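As a sanity check, here is the same keyword-selection trick exercised against stand-in classes with the matching parameter orders (hypothetical, not the real processors). This sketch uses inspect.signature rather than co_varnames, since it yields only the parameters (not locals) and handles bound methods cleanly:

```python
import inspect

# Hypothetical stand-ins mimicking the differing parameter orders.
class FakeUformProcessor:
    def __call__(self, texts, images, return_tensors=None):
        return {"prompt_slot": texts, "image_slot": images}

class FakeInstructBlipProcessor:
    def __call__(self, images, text, return_tensors=None):
        return {"prompt_slot": text, "image_slot": images}

def processor_kwargs(processor, prompt, image):
    # Choose whichever parameter name starts with "text" (text or texts),
    # then pass everything by keyword so parameter order is irrelevant.
    params = inspect.signature(processor.__call__).parameters
    prompt_kwarg = next(name for name in params if name.startswith("text"))
    return {prompt_kwarg: prompt, "images": image, "return_tensors": "pt"}

prompt = "Summarize the visual content of the image."
image = "<PIL.Image stand-in>"

for cls in (FakeUformProcessor, FakeInstructBlipProcessor):
    inputs = cls()(**processor_kwargs(cls(), prompt, image))
    print(inputs["prompt_slot"])  # the prompt, in both cases
```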
Environment details
- OS: Linux
- Environment: conda
- Python: 3.11.5
- Transformers: 4.36.2
Click to show full pip list
(uform) louis 🌟 ~/lab/uform/uform $ pip list
Package Version Editable project location
------------------ ---------- ---------------------------
Brotli 1.0.9
certifi 2023.11.17
cffi 1.16.0
charset-normalizer 2.0.4
cryptography 41.0.7
filelock 3.13.1
fsspec 2023.12.2
gmpy2 2.1.2
huggingface-hub 0.20.1
idna 3.4
iniconfig 2.0.0
Jinja2 3.1.2
MarkupSafe 2.1.1
mkl-fft 1.3.8
mkl-random 1.2.4
mkl-service 2.4.0
mpmath 1.3.0
networkx 3.1
numpy 1.26.2
packaging 23.2
Pillow 10.0.1
pip 23.3.1
pluggy 1.3.0
pycparser 2.21
pyOpenSSL 23.2.0
PySocks 1.7.1
pytest 7.4.4
PyYAML 6.0.1
regex 2023.12.25
requests 2.31.0
safetensors 0.4.1
setuptools 68.2.2
sympy 1.12
tokenizers 0.15.0
torch 2.1.2
torchaudio 2.1.2
torchvision 0.16.2
tqdm 4.66.1
transformers 4.36.2
triton 2.1.0
typing_extensions 4.7.1
uform 1.0.3 /home/louis/lab/uform/uform
urllib3 1.26.18
wheel 0.41.2
Click to show full conda list
# packages in environment at /home/louis/miniconda3/envs/uform:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
blas 1.0 mkl
brotli-python 1.0.9 py311h6a678d5_7
bzip2 1.0.8 h7b6447c_0
ca-certificates 2023.12.12 h06a4308_0
certifi 2023.11.17 py311h06a4308_0
cffi 1.16.0 py311h5eee18b_0
charset-normalizer 2.0.4 pyhd3eb1b0_0
cryptography 41.0.7 py311hdda0065_0
cuda-cudart 11.8.89 0 nvidia
cuda-cupti 11.8.87 0 nvidia
cuda-libraries 11.8.0 0 nvidia
cuda-nvrtc 11.8.89 0 nvidia
cuda-nvtx 11.8.86 0 nvidia
cuda-runtime 11.8.0 0 nvidia
ffmpeg 4.3 hf484d3e_0 pytorch
filelock 3.13.1 py311h06a4308_0
freetype 2.12.1 h4a9f257_0
fsspec 2023.12.2 pypi_0 pypi
giflib 5.2.1 h5eee18b_3
gmp 6.2.1 h295c915_3
gmpy2 2.1.2 py311hc9b5ff0_0
gnutls 3.6.15 he1e5248_0
huggingface-hub 0.20.1 pypi_0 pypi
idna 3.4 py311h06a4308_0
iniconfig 2.0.0 pypi_0 pypi
intel-openmp 2023.1.0 hdb19cb5_46306
jinja2 3.1.2 py311h06a4308_0
jpeg 9e h5eee18b_1
lame 3.100 h7b6447c_0
lcms2 2.12 h3be6417_0
ld_impl_linux-64 2.38 h1181459_1
lerc 3.0 h295c915_0
libcublas 11.11.3.6 0 nvidia
libcufft 10.9.0.58 0 nvidia
libcufile 1.8.1.2 0 nvidia
libcurand 10.3.4.101 0 nvidia
libcusolver 11.4.1.48 0 nvidia
libcusparse 11.7.5.86 0 nvidia
libdeflate 1.17 h5eee18b_1
libffi 3.4.4 h6a678d5_0
libgcc-ng 11.2.0 h1234567_1
libgomp 11.2.0 h1234567_1
libiconv 1.16 h7f8727e_2
libidn2 2.3.4 h5eee18b_0
libjpeg-turbo 2.0.0 h9bf148f_0 pytorch
libnpp 11.8.0.86 0 nvidia
libnvjpeg 11.9.0.86 0 nvidia
libpng 1.6.39 h5eee18b_0
libstdcxx-ng 11.2.0 h1234567_1
libtasn1 4.19.0 h5eee18b_0
libtiff 4.5.1 h6a678d5_0
libunistring 0.9.10 h27cfd23_0
libuuid 1.41.5 h5eee18b_0
libwebp 1.3.2 h11a3e52_0
libwebp-base 1.3.2 h5eee18b_0
llvm-openmp 14.0.6 h9e868ea_0
lz4-c 1.9.4 h6a678d5_0
markupsafe 2.1.1 py311h5eee18b_0
mkl 2023.1.0 h213fc3f_46344
mkl-service 2.4.0 py311h5eee18b_1
mkl_fft 1.3.8 py311h5eee18b_0
mkl_random 1.2.4 py311hdb19cb5_0
mpc 1.1.0 h10f8cd9_1
mpfr 4.0.2 hb69a4c5_1
mpmath 1.3.0 py311h06a4308_0
ncurses 6.4 h6a678d5_0
nettle 3.7.3 hbbd107a_1
networkx 3.1 py311h06a4308_0
numpy 1.26.2 py311h08b1b3b_0
numpy-base 1.26.2 py311hf175353_0
openh264 2.1.1 h4ff587b_0
openjpeg 2.4.0 h3ad879b_0
openssl 3.0.12 h7f8727e_0
packaging 23.2 pypi_0 pypi
pillow 10.0.1 py311ha6cbd5a_0
pip 23.3.1 py311h06a4308_0
pluggy 1.3.0 pypi_0 pypi
pycparser 2.21 pyhd3eb1b0_0
pyopenssl 23.2.0 py311h06a4308_0
pysocks 1.7.1 py311h06a4308_0
pytest 7.4.4 pypi_0 pypi
python 3.11.5 h955ad1f_0
pytorch 2.1.2 py3.11_cuda11.8_cudnn8.7.0_0 pytorch
pytorch-cuda 11.8 h7e8668a_5 pytorch
pytorch-mutex 1.0 cuda pytorch
pyyaml 6.0.1 py311h5eee18b_0
readline 8.2 h5eee18b_0
regex 2023.12.25 pypi_0 pypi
requests 2.31.0 py311h06a4308_0
safetensors 0.4.1 pypi_0 pypi
setuptools 68.2.2 py311h06a4308_0
sqlite 3.41.2 h5eee18b_0
sympy 1.12 py311h06a4308_0
tbb 2021.8.0 hdb19cb5_0
tk 8.6.12 h1ccaba5_0
tokenizers 0.15.0 pypi_0 pypi
torchaudio 2.1.2 py311_cu118 pytorch
torchtriton 2.1.0 py311 pytorch
torchvision 0.16.2 py311_cu118 pytorch
tqdm 4.66.1 pypi_0 pypi
transformers 4.36.2 pypi_0 pypi
typing_extensions 4.7.1 py311h06a4308_0
tzdata 2023c h04d1e81_0
uform 1.0.3 pypi_0 pypi
urllib3 1.26.18 py311h06a4308_0
wheel 0.41.2 py311h06a4308_0
xz 5.4.5 h5eee18b_0
yaml 0.2.5 h7b6447c_0
zlib 1.2.13 h5eee18b_0
zstd 1.5.5 hc292b87_0
These are the results I get on a 3090; I'm not sure whether they're meant to correspond to the table in the README or whether something's changed.
UForm-Gen
Throughput: 193.65 tokens/s (run 1)
Throughput: 198.49 tokens/s (run 2)
LLaVA
Throughput: 164.27 tokens/s (run 1)
Throughput: 166.39 tokens/s (run 2)
InstructBLIP
Throughput: 167.85 tokens/s (run 1)
Throughput: 165.90 tokens/s (run 2)
UForm-English
Throughput: 10.68 images/s (run 1)
Throughput: 12.66 images/s (run 2)
Throughput: 202.97 queries/s (run 1)
Throughput: 203.07 queries/s (run 2)
UForm-Multilingual
Throughput: 11.95 images/s (run 1)
Throughput: 12.49 images/s (run 2)
Throughput: 235.77 queries/s (run 1)
Throughput: 240.95 queries/s (run 2)