Alaa El-Nouby

Results 13 comments of Alaa El-Nouby

I am sorry, I didn't compute the IS for this implementation. If you have computed it using this implementation, please send a pull request with the IS you got. Thanks...

Since we are using the pre-computed text embeddings, this code does not support text as input. Currently you can only use the examples in the given datasets.

Check this repo: https://github.com/reedscot/icml2016. I am not actually sure if everything you need is there, but it seems the caption embeddings and the image data links are there as well.

Try changing `W, H = cfg.multi_scale_out_size[size_index]` to `W, H = cfg.out_size`.

Thanks for your question. ImageBind learns a shared embedding space across modalities, and therefore allows retrieval across modalities. If by conversion you mean generation, ImageBind features can be fed to...
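To illustrate what retrieval in a shared embedding space means, here is a minimal sketch in plain Python. It does not call ImageBind: the vectors are made-up toy values standing in for embeddings, and `retrieve`, `audio_query` and `image_gallery` are hypothetical names chosen for this example only.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def retrieve(query, gallery):
    """Return gallery keys sorted by descending similarity to the query."""
    return sorted(gallery, key=lambda k: cosine(query, gallery[k]), reverse=True)


# Toy vectors (assumption): pretend the query comes from one modality
# and the gallery from another, both living in the same space.
audio_query = [0.9, 0.1, 0.0]
image_gallery = {
    "dog.jpg": [1.0, 0.2, 0.0],
    "car.jpg": [0.0, 1.0, 0.1],
    "bird.jpg": [0.1, 0.0, 1.0],
}

ranking = retrieve(audio_query, image_gallery)
print(ranking)
```

Because both modalities map into the same space, ranking by similarity is all that cross-modal retrieval needs; no modality-specific matching step is involved in this sketch.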

@lahfir Do you happen to be using a Mac and Python 3.9? I think decord is built and published for Mac only up to Python 3.8, as detailed...

Thanks for your contribution. Could you please set the default ``cache dir=".checkpoints/imagebind_huge.pth"`` so that the current behaviour does not change?

Thanks for your question. Unlike other modalities, Vision logits are not scaled by a temperature: https://github.com/facebookresearch/ImageBind/blob/0f8620b6678fd24c35f172721ea6046ab5780890/models/imagebind_model.py#L432. If we look at the cosine similarity for Vision x Vision (so dropping the...
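As a rough illustration of why the presence or absence of a temperature matters, the sketch below applies a softmax to the same cosine similarities with and without a scale factor. The similarity values and `scale = 20.0` are arbitrary assumptions for this example, not values taken from ImageBind.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Cosine similarities always lie in [-1, 1] (toy values, assumption).
sims = [0.9, 0.5, 0.1]

# Arbitrary scale (inverse temperature), chosen only for illustration.
scale = 20.0

unscaled = softmax(sims)
scaled = softmax([scale * s for s in sims])

print("unscaled:", [round(p, 3) for p in unscaled])
print("scaled:  ", [round(p, 3) for p in scaled])
```

Since unscaled similarities span at most a range of 2, their softmax stays fairly flat, while the scaled version concentrates almost all probability on the best match.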

We will work on releasing smaller checkpoints in the coming couple of weeks.

Thanks for your question. Third-party dependencies refers to other Python packages that need to be installed to run the code (e.g. pytorchvideo, torchaudio, einops). The list of all required...
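One quick way to see which of these packages are missing from an environment is sketched below, using only the standard library. `missing_packages` is a hypothetical helper written for this example, and it assumes each package's import name matches the name given above.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be found for import."""
    return [n for n in names if importlib.util.find_spec(n) is None]


# The three packages named in the comment; the full list is not reproduced here.
required = ["pytorchvideo", "torchaudio", "einops"]

print("Missing:", missing_packages(required))
```

An empty list means every listed package can be located; it does not check that the installed versions are the ones the code expects.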