VLM: add more modularity
What does this PR do?
As mentioned in https://github.com/huggingface/transformers/issues/33948, this PR refactors the code a bit to make it more modular. Specifically, we now have dedicated public methods for obtaining image/video features, which users can easily override if they want to modify the process. Either way, this means less code in forward and a more standardized API.
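To illustrate the idea, here is a minimal sketch of the pattern in plain Python. It is not the actual transformers code: `ToyVLM`, `MeanPooledVLM`, the method name `get_image_features`, and the list-based "tensors" are all assumptions made for the example. The point is only that forward delegates to a public method, so a subclass can change how image features are produced without touching forward.

```python
from typing import List

Vector = List[float]


class ToyVLM:
    """Hypothetical stand-in for a vision-language model (not the transformers API)."""

    def get_image_features(self, pixel_values: List[Vector]) -> List[Vector]:
        # Stand-in for "vision tower + projector": scale every value by 0.5.
        return [[0.5 * v for v in patch] for patch in pixel_values]

    def forward(self, text_embeds: List[Vector], pixel_values: List[Vector]) -> List[Vector]:
        # forward only calls the public hook; it holds no feature-extraction logic.
        image_features = self.get_image_features(pixel_values)
        # Stand-in for merging: place image features before the text embeddings.
        return image_features + text_embeds


class MeanPooledVLM(ToyVLM):
    """Hypothetical user subclass that customizes only the feature step."""

    def get_image_features(self, pixel_values: List[Vector]) -> List[Vector]:
        features = super().get_image_features(pixel_values)
        dim = len(features[0])
        # Average all patch features into a single vector.
        pooled = [sum(f[i] for f in features) / len(features) for i in range(dim)]
        return [pooled]


if __name__ == "__main__":
    text = [[9.0, 9.0]]
    image = [[2.0, 4.0], [6.0, 8.0]]
    print(ToyVLM().forward(text, image))
    print(MeanPooledVLM().forward(text, image))
```

In this sketch the subclass overrides one method and inherits forward unchanged, which is the kind of customization the dedicated feature methods are meant to allow.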