Add TF port of BLIP
Work in progress right now, will update this when it's closer to being ready!
The TF port is mostly complete now and tests are passing locally - I just need to go around updating docs and auto classes and so on. The main code should be ready for review!
Looks like there are many comments to address for now. Please ping me again when it's ready for a second review!
Got through a lot of the comments today, but I have a couple of other things to do - will try to finish them tomorrow!
The last remaining big issue is that some of the pt-tf equivalence tests fail when weights don't match up between models. This is caused by the cross-attention weights not being built, presumably because those layers aren't being called in the forward pass. I'm working on figuring out why and resolving that!
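For reference, this is standard Keras lazy building rather than anything BLIP-specific; a minimal example:

```python
import tensorflow as tf

layer = tf.keras.layers.Dense(4)
print(len(layer.weights))  # 0 - no weights exist before the first call

_ = layer(tf.zeros((1, 8)))
print(len(layer.weights))  # 2 - kernel and bias are only created on first use
```

So any layer the forward pass never calls simply has no weights for the equivalence test to match against.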
The issue seems to be that in all of our other models, cross-attention layers are only added when `config.add_cross_attention` is True, but in the case of BLIP it only checks `config.is_decoder`. As a result, the PyTorch models often initialize cross-attention layers that aren't used, which causes weight mismatch issues for us in cross-loading tests, because TF only creates weights on first use.
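In other words, something like this simplified sketch (stand-in layers and config, not the actual modeling code):

```python
from types import SimpleNamespace

import torch.nn as nn


class BertStyleLayer(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.attention = nn.MultiheadAttention(config.hidden_size, config.num_heads)
        # The usual pattern in the library: gated on an explicit flag
        if config.add_cross_attention:
            self.crossattention = nn.MultiheadAttention(config.hidden_size, config.num_heads)


class BlipStyleLayer(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.attention = nn.MultiheadAttention(config.hidden_size, config.num_heads)
        # BLIP's pattern: created for any decoder, so PyTorch initializes
        # these weights even when the forward pass never touches them
        if config.is_decoder:
            self.crossattention = nn.MultiheadAttention(config.hidden_size, config.num_heads)


config = SimpleNamespace(hidden_size=32, num_heads=4, is_decoder=True, add_cross_attention=False)
print(hasattr(BertStyleLayer(config), "crossattention"))  # False
print(hasattr(BlipStyleLayer(config), "crossattention"))  # True - unused weights exist in PT
```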
It's coming, don't worry! This cross-attention behaviour is just very odd and I'm trying to track it down first.
Hi all! I've addressed all comments and local tests look good. The remaining issues are:
- Converting checkpoints so the tests don't need `from_pt` (see the sketch below)
- Maybe adding more auto classes
I'm not sure about the auto classes, though - they're missing in the original PT version of the model as well, so this didn't seem like the right PR to add them.
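On the `from_pt` point: until converted TF checkpoints are on the Hub, the tests have to convert the PyTorch weights on the fly, roughly like this (checkpoint name just for illustration):

```python
from transformers import TFBlipForConditionalGeneration

# No native TF weights on the Hub yet, so convert from the PyTorch checkpoint at load time
model = TFBlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base", from_pt=True
)

# Once converted checkpoints are uploaded, the flag can be dropped:
# model = TFBlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
```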
cc @sgugger - I think this is ready for a final review at last!
Got it, I'll figure out some way to re-enable those tests, or override them with versions that do work!
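If overriding turns out to be the cleaner option, it would look something like this sketch (assuming the failing check is the common `test_pt_tf_model_equivalence` from the TF test mixin):

```python
import unittest

from ...test_modeling_tf_common import TFModelTesterMixin  # as in the other TF model tests


class TFBlipModelTest(TFModelTesterMixin, unittest.TestCase):
    # Replace the common check with a skip (or a custom version that builds
    # the cross-attention weights before comparing them)
    @unittest.skip(reason="Cross-attention weights are only built on first use in TF")
    def test_pt_tf_model_equivalence(self):
        pass
```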
@sgugger this should be ready for review with all comments addressed! The failing test is in an unrelated model.
@sgugger Sorry for the confusion - that equivalence test is present in both the `test_modeling_tf_blip` and `test_modeling_blip` files. Do we want to keep it in both?
Yes we do.
Going to leave the `pt-to-tf` changes in this PR rather than making a separate one, since they're needed for proper BLIP conversion!