VLM: add more modularity

Open · zucchini-nlp opened this issue 4 months ago • 1 comment

What does this PR do?

As mentioned in https://github.com/huggingface/transformers/issues/33948, this PR refactors the code to make it more modular. Specifically, we now have dedicated public methods for obtaining image/video features, which users can easily override if they want to modify the process. In any case, this leaves less code in forward and makes the API more standardized.
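To illustrate the idea, here is a minimal sketch of the pattern in plain Python. It is not the actual transformers code: the class names, the `get_image_features` signature, and the toy averaging logic are all assumptions made for illustration. The point is only that forward delegates to a public method, so a subclass can change feature extraction without touching forward.

```python
class ToyVisionLanguageModel:
    """Hypothetical stand-in for a VLM; not the transformers API."""

    def get_image_features(self, pixel_values):
        # Toy replacement for "vision tower + projector":
        # average the pixel values of each image.
        return [sum(image) / len(image) for image in pixel_values]

    def forward(self, input_ids, pixel_values=None):
        # forward only calls the public hook; it contains no
        # feature-extraction logic of its own.
        if pixel_values is None:
            features = []
        else:
            features = self.get_image_features(pixel_values)
        return {"input_ids": list(input_ids), "image_features": features}


class ScaledFeaturesModel(ToyVisionLanguageModel):
    """Hypothetical user subclass that customizes only the feature step."""

    def get_image_features(self, pixel_values):
        # Reuse the default extraction, then post-process the result.
        base = super().get_image_features(pixel_values)
        return [2.0 * value for value in base]


model = ScaledFeaturesModel()
outputs = model.forward(input_ids=[1, 2, 3], pixel_values=[[0.0, 1.0], [2.0, 4.0]])
```

With this layout, a user who wants different image features overrides one method and inherits forward unchanged. A video counterpart would presumably follow the same shape.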

Oct 15 '24 14:10 zucchini-nlp

