scenic
Scenic: A Jax Library for Computer Vision Research and Beyond
Updates the checkpoint loading function for ViViT
[bit_dataset] Allow passing the image interpolation method to the image-resizing pre-processing functions.
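For context, such a hook presumably just forwards a `tf.image.ResizeMethod` to the resize call; a minimal sketch (the function name `resize_image` and the bilinear default are assumptions, not Scenic's actual signature):

```python
import tensorflow as tf

def resize_image(image, target_size, method=tf.image.ResizeMethod.BILINEAR):
  """Resizes `image` to `target_size` with the given interpolation method."""
  # `method` can be e.g. BILINEAR, BICUBIC, NEAREST_NEIGHBOR, or AREA.
  return tf.image.resize(image, target_size, method=method)

# Example: request bicubic interpolation instead of the bilinear default.
image = tf.zeros([256, 256, 3], dtype=tf.float32)
resized = resize_image(image, (224, 224), method=tf.image.ResizeMethod.BICUBIC)
```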
Hey, I downloaded the pretrained ImageNet-21k ViT B_16 model from the URLs mentioned in the configuration files and replaced the path in the config file, but for both scenic and...
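For reference, Scenic configs are `ml_collections.ConfigDict`s, so pointing one at a locally downloaded checkpoint comes down to editing a single field; a sketch, assuming a field named `init_from.checkpoint_path` (the real field name may differ per project and config):

```python
import ml_collections

def get_config():
  config = ml_collections.ConfigDict()
  # ... model / dataset settings elided ...
  config.init_from = ml_collections.ConfigDict()
  # Assumed field name: point it at the locally downloaded ImageNet-21k ViT-B/16 checkpoint.
  config.init_from.checkpoint_path = '/path/to/imagenet21k_ViT-B_16.npz'
  return config
```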
Internal
The dataset processing is unclear. The README only shows "Additionally, pre-process the training dataset in the same way as done by the ViViT project [here](https://github.com/google-research/scenic/tree/main/scenic/projects/vivit/data/data.md)." And ViViT refers to the pre-processing...
Hi, I've implemented OWL-ViT as a fork of [🤗 HuggingFace Transformers](https://github.com/huggingface/transformers.git), and we are planning to add it to the library soon (see https://github.com/huggingface/transformers/pull/17938). Here's a notebook that illustrates inference...
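Assuming the PR lands with the interface that 🤗 Transformers later shipped, inference would roughly look like the sketch below (the checkpoint id `google/owlvit-base-patch32` and the post-processing helper name are assumptions here, not confirmed by the issue):

```python
import requests
import torch
from PIL import Image
from transformers import OwlViTProcessor, OwlViTForObjectDetection

processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch32")
model = OwlViTForObjectDetection.from_pretrained("google/owlvit-base-patch32")

image = Image.open(requests.get(
    "http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)
texts = [["a photo of a cat", "a photo of a remote control"]]

# The processor batches the free-text queries together with the image.
inputs = processor(text=texts, images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)  # per-query logits and predicted boxes

# Convert raw logits/boxes to (score, label, box) triples above a threshold;
# the helper name may differ across Transformers versions.
target_sizes = torch.tensor([image.size[::-1]])
results = processor.post_process_object_detection(
    outputs, threshold=0.1, target_sizes=target_sizes)
```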
Thanks for your amazing work. Would it be possible to provide an estimate of when the training code will be released?
I tried to load the _vivit_base_fe_ model using _flax.training_ and found that the numbers of layers of the SpatialTransformer and TemporalTransformer are **both 12**. However, when I check [vivit_base_factorised_encoder](https://github.com/google-research/scenic/blob/7d1a639c969a7ba03d70af4ee571e65084fe1a2b/scenic/projects/vivit/configs/kinetics400/vivit_base_factorised_encoder.py), I find **config.model.temporal_transformer.num_layers =...
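One hedged way to check this directly from a restored checkpoint is to inspect the raw parameter pytree; a sketch under the assumption that the ViViT parameter tree uses module names like `SpatialTransformer`/`TemporalTransformer` with `encoderblock_*` sub-modules (those names are not verified here):

```python
from flax.training import checkpoints

# Restore the raw pytree (target=None) so we can simply inspect its structure.
state = checkpoints.restore_checkpoint('/path/to/vivit_base_fe_checkpoint', target=None)
# The exact layout varies: older flax checkpoints keep params under
# optimizer/target, newer ones under a top-level 'params' key.
params = state.get('params', state.get('optimizer', {}).get('target', state))

def count_encoder_blocks(module_params):
  # Counts sub-modules named like 'encoderblock_0', 'encoderblock_1', ... (assumed naming).
  return sum(1 for name in module_params if str(name).startswith('encoderblock'))

for name in ('SpatialTransformer', 'TemporalTransformer'):  # assumed module names
  if name in params:
    print(name, count_encoder_blocks(params[name]))
```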
Internal
Internal