
Results 24 comments of Zhuosheng Zhang

The hang may also be expected: the main process could still be processing data after loading the model (there is no signal indicating that model loading has completed).

Please try the latest version. It should have fixed the problem.

Not sure about that. However, we did see that with a T5-style encoder-decoder backbone, a larger model achieves better performance. Due to resource limits, we did not scale...

That is odd. The code is adapted from https://github.com/salesforce/LAVIS/tree/main/projects/instructblip. You can check the instructions there.

This issue may be caused by an update to the transformers library. The solution above appears to be effective.

It is the initial T5 model. I did not observe obvious performance gains from using the fine-tuned first-stage T5 model.

Hi, thanks for your interest! An efficient way would be to train your framework in just two steps, as in MM-CoT: (i) rationale generation; (ii) answer inference, regardless of the backbone modules...
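To illustrate the wiring of that two-step setup, here is a minimal sketch. The function names and the placeholder "generation" logic are hypothetical stand-ins, not the actual MM-CoT code: in practice each stage would be a fine-tuned seq2seq model, and stage (ii) consumes the rationale produced by stage (i) as part of its input.

```python
# Hypothetical sketch of a two-stage MM-CoT-style pipeline.
# generate_rationale / infer_answer stand in for two fine-tuned models;
# any backbone could implement them.

def generate_rationale(question: str, context: str) -> str:
    """Stage (i): produce a free-text rationale for the question."""
    # Placeholder for a model's decoded output.
    return f"Rationale for: {question}"

def infer_answer(question: str, context: str, rationale: str) -> str:
    """Stage (ii): append the rationale to the input and infer the answer."""
    prompt = (
        f"{context}\n"
        f"Question: {question}\n"
        f"Rationale: {rationale}\n"
        f"Answer:"
    )
    # A real model would decode an answer conditioned on this prompt;
    # here we just return the assembled prompt to show the data flow.
    return prompt

def two_stage_pipeline(question: str, context: str) -> str:
    rationale = generate_rationale(question, context)  # step (i)
    return infer_answer(question, context, rationale)  # step (ii)
```

The key design point is that the two stages are trained and run separately, so the rationale is an explicit intermediate text rather than a hidden state, and either stage's backbone can be swapped independently.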

Please try the latest version. It should work well.

Thanks for the revision! That's cool.