Add TF port of BLIP

Open Rocketknight1 opened this issue 2 years ago

Work in progress right now, will update this when it's closer to being ready!

Rocketknight1 avatar Mar 10 '23 17:03 Rocketknight1

The TF port is mostly complete now and tests are passing locally - I just need to update the docs, auto classes and so on. The main code should be ready for review!

Rocketknight1 avatar Mar 24 '23 14:03 Rocketknight1

Looks like there are still many comments to address. Please ping me again when it's ready for a second review!

sgugger avatar Mar 27 '23 13:03 sgugger

Got through a lot of the comments today, but I have a couple of other things to do - will try to finish them tomorrow!

Rocketknight1 avatar Mar 27 '23 16:03 Rocketknight1

The last remaining big issue is that some of the pt-tf equivalence tests fail when weights don't match up between models. This is caused by the cross-attention weights not being built, presumably because those layers aren't being called in the forward pass. I'm working on figuring out why and resolving that!
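
For anyone unfamiliar with the TF side, Keras only creates a layer's weights the first time it's called, so a layer that's never called has nothing for the cross-loading code to match. A minimal illustration (plain Keras, not the BLIP code):

```python
import tensorflow as tf

layer = tf.keras.layers.Dense(4)
print(len(layer.weights))  # 0 - no variables exist yet

layer(tf.ones((1, 8)))     # first call runs build() with the input shape
print(len(layer.weights))  # 2 - kernel and bias now exist
```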

Rocketknight1 avatar Mar 28 '23 18:03 Rocketknight1

The issue seems to be that in all of our other models, cross-attention layers are only added when config.add_cross_attention is True, but in the case of BLIP it only checks config.is_decoder. As a result, the PyTorch models often initialize cross-attention layers that are never used, which causes weight mismatch issues for us in cross-loading tests, because TF only creates weights on first use.
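
Here's a rough sketch of the mismatch, using a simplified hypothetical layer rather than the actual BLIP modeling code:

```python
import tensorflow as tf

class Block(tf.keras.layers.Layer):
    """Simplified stand-in for a BLIP text layer (hypothetical)."""

    def __init__(self, hidden_size, is_decoder, **kwargs):
        super().__init__(**kwargs)
        self.self_attn = tf.keras.layers.Dense(hidden_size)
        # BLIP keys the layer's existence off is_decoder rather than an
        # explicit add_cross_attention flag, so PyTorch instantiates these
        # weights even when no forward pass ever uses them.
        self.cross_attn = tf.keras.layers.Dense(hidden_size) if is_decoder else None

    def call(self, hidden_states, encoder_hidden_states=None):
        hidden_states = self.self_attn(hidden_states)
        if encoder_hidden_states is not None:
            # Only this call would build the cross-attention weights in TF.
            hidden_states += self.cross_attn(encoder_hidden_states)
        return hidden_states

block = Block(hidden_size=4, is_decoder=True)
block(tf.ones((1, 4)))                  # cross_attn is never called...
print([w.name for w in block.weights])  # ...so TF has only self_attn weights,
                                        # while PyTorch would list cross_attn too
```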

Rocketknight1 avatar Mar 28 '23 18:03 Rocketknight1

It's coming, don't worry! This cross-attention behaviour is just very odd and I'm trying to track it down first.

Rocketknight1 avatar Mar 29 '23 13:03 Rocketknight1

Hi all! I've addressed all comments and local tests look good. The remaining issues are:

  • Converting checkpoints so the tests don't need from_pt (see the sketch below)
  • Maybe adding more auto classes

I'm not sure about the auto classes, though - they're missing in the original PT version of the model as well, so this didn't seem like the right PR to add them.
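
For context on the first point, from_pt is what the tests have to pass while the Hub checkpoints are PyTorch-only. Roughly, using the base captioning checkpoint as an illustrative example:

```python
from transformers import TFBlipForConditionalGeneration

# While the Hub checkpoints hold only PyTorch weights, the TF tests
# have to cross-load them (which also drags in a torch dependency):
model = TFBlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base", from_pt=True
)

# Once native TF weights are uploaded, the flag can simply be dropped:
model = TFBlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base"
)
```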

Rocketknight1 avatar Mar 30 '23 15:03 Rocketknight1

cc @sgugger - I think this is ready for a final review at last!

Rocketknight1 avatar Mar 30 '23 15:03 Rocketknight1

Got it, I'll figure out some way to re-enable those tests, or override them with versions that do work!
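
For reference, "overriding" here would look something like the sketch below - a hypothetical test class, not the actual code from this PR:

```python
import unittest

class TFBlipModelTest(unittest.TestCase):
    # Hypothetical override of the inherited common test: either skip it
    # with a documented reason, or reimplement a BLIP-specific version.
    @unittest.skip(reason="cross-attention weights are built lazily in TF; "
                          "covered by a model-specific equivalence test")
    def test_pt_tf_model_equivalence(self):
        pass
```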

Rocketknight1 avatar Mar 31 '23 14:03 Rocketknight1

@sgugger this should be ready for review with all comments addressed! The failing test is in an unrelated model.

Rocketknight1 avatar Apr 03 '23 19:04 Rocketknight1

@sgugger Sorry for the confusion - that equivalence test is present in both the test_modeling_tf_blip and test_modeling_blip files. Do we want to keep it in both?

Rocketknight1 avatar Apr 03 '23 22:04 Rocketknight1

Yes we do.

sgugger avatar Apr 04 '23 01:04 sgugger

Going to leave the pt-to-tf changes in this PR rather than making a separate one, since they're needed for proper BLIP conversion!
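
For anyone following along, the conversion itself boils down to cross-loading the PyTorch weights and re-saving them natively - a rough sketch under that assumption, separate from the pt-to-tf tooling this PR touches:

```python
from transformers import TFBlipForConditionalGeneration

# Cross-load the PyTorch checkpoint into the TF architecture, then write
# native TF weights (tf_model.h5) that can be uploaded to the Hub:
model = TFBlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base", from_pt=True
)
model.save_pretrained("converted-blip-tf")  # output directory is illustrative
```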

Rocketknight1 avatar Apr 04 '23 14:04 Rocketknight1