Could not find image processor class in the image processor config or the model config - 'NoneType' object is not callable

himat opened this issue 1 year ago · 4 comments

This is my entire script

import faiss 
from gpt_index import GPTFaissIndex, SimpleDirectoryReader

emb_dim_size = 1536
faiss_index = faiss.IndexFlatL2(emb_dim_size)

documents = SimpleDirectoryReader("/Users/hima/Documents/X/Z").load_data()

And it fails on the last line with

Could not find image processor class in the image processor config or the model config. Loading based on pattern matching with the model's feature extractor configuration.
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[7], line 1
----> 1 documents = SimpleDirectoryReader("/Users/hima/Documents/X/Z").load_data()

File ~/.pyenv/versions/3.10.4/lib/python3.10/site-packages/gpt_index/readers/file.py:219, in SimpleDirectoryReader.load_data(self, concatenate)
    217 for input_file in self.input_files:
    218     if input_file.suffix in self.file_extractor:
--> 219         data = self.file_extractor[input_file.suffix](input_file, self.errors)
    220     else:
    221         # do standard read
    222         with open(input_file, "r", errors=self.errors) as f:

File ~/.pyenv/versions/3.10.4/lib/python3.10/site-packages/gpt_index/readers/file.py:65, in _image_parser(input_file, errors)
     62 except ImportError:
     63     raise ValueError("PIL is required to read image files.")
---> 65 processor = DonutProcessor.from_pretrained(
     66     "naver-clova-ix/donut-base-finetuned-cord-v2"
     67 )
     68 model = VisionEncoderDecoderModel.from_pretrained(
     69     "naver-clova-ix/donut-base-finetuned-cord-v2"
     70 )
     72 device = "cuda" if torch.cuda.is_available() else "cpu"

File ~/.pyenv/versions/3.10.4/lib/python3.10/site-packages/transformers/processing_utils.py:183, in ProcessorMixin.from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
    153 @classmethod
    154 def from_pretrained(cls, pretrained_model_name_or_path, **kwargs):
    155     r"""
    156     Instantiate a processor associated with a pretrained model.
    157 
   (...)
    181             [`~tokenization_utils_base.PreTrainedTokenizer.from_pretrained`].
    182     """
--> 183     args = cls._get_arguments_from_pretrained(pretrained_model_name_or_path, **kwargs)
    184     return cls(*args)

File ~/.pyenv/versions/3.10.4/lib/python3.10/site-packages/transformers/processing_utils.py:227, in ProcessorMixin._get_arguments_from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
    224     else:
    225         attribute_class = getattr(transformers_module, class_name)
--> 227     args.append(attribute_class.from_pretrained(pretrained_model_name_or_path, **kwargs))
    228 return args

File ~/.pyenv/versions/3.10.4/lib/python3.10/site-packages/transformers/models/auto/image_processing_auto.py:352, in AutoImageProcessor.from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
    349     else:
    350         image_processor_class = image_processor_class_from_name(image_processor_class)
--> 352     return image_processor_class.from_dict(config_dict, **kwargs)
    353 # Last try: we use the IMAGE_PROCESSOR_MAPPING.
    354 elif type(config) in IMAGE_PROCESSOR_MAPPING:

TypeError: 'NoneType' object is not callable

Why is this happening, and how should I fix this error?

Separately, one strange thing I don't understand about this repo is that you use libraries like sentencepiece, yet sentencepiece isn't listed anywhere in the repo as a dependency. When I tried to run the code above, I also hit a couple of missing-import errors, so I had to install sentencepiece, torch, and a few others. Could my error above be related to needing to install these?

himat · Jan 18 '23 01:01

Hey @himat, thanks for surfacing this. Do you have images in your directory? It looks like GPT Index is trying to use its image processor.
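The traceback shows why an image in the folder matters: load_data looks up each file's suffix in a file_extractor mapping and falls back to a plain-text read only when there is no match. Below is a minimal sketch of that dispatch, modeled on the lines of gpt_index/readers/file.py visible in the traceback (lines 217-222). The function names and the stub parser are hypothetical, and the exact set of suffixes gpt_index maps to its image parser is an assumption here.

```python
from pathlib import Path
from typing import Callable, Dict, List


# Hypothetical stand-in for gpt_index's _image_parser. The real parser loads
# the Donut model via transformers, which is the step that fails above.
def fake_image_parser(input_file: Path, errors: str) -> str:
    return f"<image parser called for {input_file.name}>"


# Assumed suffix -> parser mapping; the library's real file_extractor may differ.
FILE_EXTRACTOR: Dict[str, Callable[[Path, str], str]] = {
    ".png": fake_image_parser,
    ".jpg": fake_image_parser,
    ".jpeg": fake_image_parser,
}


def load_data(input_files: List[Path], errors: str = "ignore") -> List[str]:
    """Dispatch each file on its suffix, mirroring the traceback's loop."""
    results = []
    for input_file in input_files:
        if input_file.suffix in FILE_EXTRACTOR:
            # Registered suffix: hand the file to the special-purpose parser.
            data = FILE_EXTRACTOR[input_file.suffix](input_file, errors)
        else:
            # Any other suffix: do a standard text read.
            with open(input_file, "r", errors=errors) as f:
                data = f.read()
        results.append(data)
    return results
```

With a folder containing notes.txt and photo.png, the .txt file is read as text, but the .png is routed to the image parser. In the real library, that parser is what calls DonutProcessor.from_pretrained and raises the error.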

We lazily load libraries like sentencepiece to keep the overall package small. They aren't required to get up and running with GPT Index and are only used for processing image files. I could add them to a list of "optional dependencies" that can be installed separately, if that would help. In the meantime, if you do want to use our image processing, you can pip install these dependencies manually.
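The lazy-loading approach described above can be sketched as follows. Heavy packages are imported only inside the code path that needs them, so a text-only user never pays for them, and a missing package produces an actionable message. This mirrors the try/except ImportError pattern visible in the traceback's _image_parser. The helper names and the import-name to pip-name mapping below are hypothetical, chosen only for illustration. The sketch does not establish whether a missing package caused the NoneType error in this issue.

```python
import importlib
import importlib.util
from types import ModuleType
from typing import Dict, List

# Hypothetical mapping: import name -> pip package name (they differ for Pillow).
OPTIONAL_IMAGE_DEPS: Dict[str, str] = {
    "torch": "torch",
    "transformers": "transformers",
    "sentencepiece": "sentencepiece",
    "PIL": "Pillow",
}


def require(module_name: str, purpose: str) -> ModuleType:
    """Import a module only when a feature needs it; fail with an install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        package = OPTIONAL_IMAGE_DEPS.get(module_name, module_name)
        raise ImportError(
            f"{module_name} is required for {purpose}. "
            f"Install it with: pip install {package}"
        ) from exc


def missing_optional_deps(deps: Dict[str, str]) -> List[str]:
    """Return pip names of packages whose top-level module cannot be found."""
    # find_spec locates a top-level module without importing it.
    return [pkg for mod, pkg in deps.items() if importlib.util.find_spec(mod) is None]
```

For example, running print(missing_optional_deps(OPTIONAL_IMAGE_DEPS)) before loading a folder that contains images lists which packages still need a pip install.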

jerryjliu · Jan 18 '23 02:01

Ah yes, I do have images in it.

himat · Jan 18 '23 02:01

If I do have images in the directory, though, is there something else I have to do? Do I need to install a different package, or how else do I fix the error?

himat · Jan 18 '23 02:01

@himat if you don't want SimpleDirectoryReader to process these images, you can specify "required_exts" when initializing it, e.g. SimpleDirectoryReader('folder', required_exts=[".txt"])
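Conceptually, required_exts acts as an allow-list that is applied while the reader collects files, so an image never reaches the image parser. Below is a minimal standalone sketch of that filtering step using pathlib. collect_input_files is a hypothetical helper, not gpt_index's actual implementation. It assumes a non-recursive scan, which may not match the real reader's behavior.

```python
from pathlib import Path
from typing import List, Optional


def collect_input_files(
    input_dir: str, required_exts: Optional[List[str]] = None
) -> List[Path]:
    """Hypothetical sketch: list files in input_dir, keeping only allowed suffixes."""
    files = []
    for path in sorted(Path(input_dir).iterdir()):
        if not path.is_file():
            continue  # this sketch does not recurse into subdirectories
        if required_exts is not None and path.suffix not in required_exts:
            continue  # e.g. photo.png is skipped when required_exts=[".txt"]
        files.append(path)
    return files
```

Applied to the original script, the equivalent call is SimpleDirectoryReader("/Users/hima/Documents/X/Z", required_exts=[".txt"]).load_data(), which skips the images in that folder.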

jerryjliu · Jan 18 '23 18:01

@himat going to close the issue for now. If you have more questions please join the discord! https://discord.gg/dGcwcsnxhU

jerryjliu · Feb 20 '23 03:02