Results 4 issues of Wentao Zhu

The dataset processing is unclear. The README only says, "Additionally, pre-process the training dataset in the same way as done by the ViViT project [here](https://github.com/google-research/scenic/tree/main/scenic/projects/vivit/data/data.md)." And ViViT in turn refers the pre-processing...

When I use map_coordinates with the generated warp deformation field to get the transformed image, I find that the transformed image differs from the one produced by NiftyReg. Could you please provide...
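
For reference, here is a minimal sketch of how a dense field is commonly applied with `scipy.ndimage.map_coordinates`. It is not the repository's or NiftyReg's implementation. The helper name `warp_with_displacement` is hypothetical, and the sketch assumes the field holds per-voxel displacements in voxel units, stored with shape `(ndim, *image.shape)` in the same axis order as the image. A mismatch in any of these conventions (displacements vs. absolute positions, voxel vs. physical units, axis order) is one possible source of differing results, though the issue does not confirm the cause.

```python
import numpy as np
from scipy.ndimage import map_coordinates


def warp_with_displacement(image, displacement, order=1):
    """Resample `image` at (identity grid + displacement).

    image:        array of shape (D0, ..., Dn-1)
    displacement: array of shape (n, D0, ..., Dn-1); assumed to be in
                  voxel units and ordered along the same axes as `image`.
    order:        spline interpolation order (1 = linear).
    """
    image = np.asarray(image, dtype=float)
    displacement = np.asarray(displacement, dtype=float)
    if displacement.shape != (image.ndim, *image.shape):
        raise ValueError("displacement must have shape (ndim, *image.shape)")

    # Identity sampling grid: grid[k] holds the index along axis k.
    grid = np.indices(image.shape).astype(float)

    # Each output voxel reads from its own position plus the displacement.
    coords = grid + displacement

    # Coordinates outside the image are clamped to the nearest edge value.
    return map_coordinates(image, coords, order=order, mode="nearest")


# Usage: a zero field leaves the image unchanged.
image = np.arange(12.0).reshape(3, 4)
zero_field = np.zeros((2, 3, 4))
unchanged = warp_with_displacement(image, zero_field)

# Usage: a constant +1 displacement along axis 1 reads from the next column.
shift_field = np.zeros((2, 3, 4))
shift_field[1] = 1.0
shifted = warp_with_displacement(image, shift_field)
```

If the field instead stores absolute sampling positions, the identity grid should not be added; if it is in physical units, it would first need converting to voxel indices using the image's affine.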

Thank you so much for the code! It is very useful! Could you please also open-source the retrieval training based on BLIP2? Any help is greatly appreciated.

I ran the code and found that it cannot reach the 0.78+ performance. Could you please give me some guidance? Thank you!