scenic
Scenic: A Jax Library for Computer Vision Research and Beyond
Updates the checkpoint loading function for ViViT
[bit_dataset] Allow passing the image interpolation method to the image-resizing pre-processing functions.
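For context, such a hook presumably just forwards a `tf.image.ResizeMethod` to the resize call; a minimal sketch (the function name `resize_image` and the bilinear default are assumptions, not Scenic's actual signature):

```python
import tensorflow as tf

def resize_image(image, target_size, method=tf.image.ResizeMethod.BILINEAR):
  """Resizes `image` to `target_size` with the given interpolation method."""
  # `method` can be e.g. BILINEAR, BICUBIC, NEAREST_NEIGHBOR, or AREA.
  return tf.image.resize(image, target_size, method=method)

# Example: request bicubic interpolation instead of the bilinear default.
image = tf.zeros([256, 256, 3], dtype=tf.float32)
resized = resize_image(image, (224, 224), method=tf.image.ResizeMethod.BICUBIC)
```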
Hey, I downloaded the pretrained ImageNet-21k ViT B_16 model from the URLs mentioned in the configuration files and replaced the path in the config file, but for both scenic and...
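For reference, Scenic configs are `ml_collections.ConfigDict`s, so pointing one at a locally downloaded checkpoint comes down to editing a single field; a sketch, assuming a field named `init_from.checkpoint_path` (the real field name may differ per project and config):

```python
import ml_collections

def get_config():
  config = ml_collections.ConfigDict()
  # ... model / dataset settings elided ...
  config.init_from = ml_collections.ConfigDict()
  # Assumed field name: point it at the locally downloaded ImageNet-21k ViT-B/16 checkpoint.
  config.init_from.checkpoint_path = '/path/to/imagenet21k_ViT-B_16.npz'
  return config
```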
Internal
The dataset processing is unclear. The README only shows "Additionally, pre-process the training dataset in the same way as done by the ViViT project [here](https://github.com/google-research/scenic/tree/main/scenic/projects/vivit/data/data.md)." And ViViT refers to the pre-processing...
Hi, I've implemented OWL-ViT as a fork of [🤗 HuggingFace Transformers](https://github.com/huggingface/transformers.git), and we are planning to add it to the library soon (see https://github.com/huggingface/transformers/pull/17938). Here's a notebook that illustrates inference...
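Assuming the PR lands with the interface that 🤗 Transformers later shipped, inference would roughly look like the sketch below (the checkpoint id `google/owlvit-base-patch32` and the post-processing helper name are assumptions here, not confirmed by the issue):

```python
import requests
import torch
from PIL import Image
from transformers import OwlViTProcessor, OwlViTForObjectDetection

processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch32")
model = OwlViTForObjectDetection.from_pretrained("google/owlvit-base-patch32")

image = Image.open(requests.get(
    "http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)
texts = [["a photo of a cat", "a photo of a remote control"]]

# The processor batches the free-text queries together with the image.
inputs = processor(text=texts, images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)  # per-query logits and predicted boxes

# Convert raw logits/boxes to (score, label, box) triples above a threshold;
# the helper name may differ across Transformers versions.
target_sizes = torch.tensor([image.size[::-1]])
results = processor.post_process_object_detection(
    outputs, threshold=0.1, target_sizes=target_sizes)
```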
Thanks for your amazing work. Would it be possible to provide an estimate of when the training code will be released?
I tried to load the _vivit_base_fe_ model using _flax.training_ and found that the numbers of layers of the SpatialTransformer and TemporalTransformer are **both 12**. However, when I check [vivit_base_factorised_encoder](https://github.com/google-research/scenic/blob/7d1a639c969a7ba03d70af4ee571e65084fe1a2b/scenic/projects/vivit/configs/kinetics400/vivit_base_factorised_encoder.py), I find **config.model.temporal_transformer.num_layers =...
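One hedged way to check this directly from a restored checkpoint is to inspect the raw parameter pytree; a sketch under the assumption that the ViViT parameter tree uses module names like `SpatialTransformer`/`TemporalTransformer` with `encoderblock_*` sub-modules (those names are not verified here):

```python
from flax.training import checkpoints

# Restore the raw pytree (target=None) so we can simply inspect its structure.
state = checkpoints.restore_checkpoint('/path/to/vivit_base_fe_checkpoint', target=None)
# The exact layout varies: older flax checkpoints keep params under
# optimizer/target, newer ones under a top-level 'params' key.
params = state.get('params', state.get('optimizer', {}).get('target', state))

def count_encoder_blocks(module_params):
  # Counts sub-modules named like 'encoderblock_0', 'encoderblock_1', ... (assumed naming).
  return sum(1 for name in module_params if str(name).startswith('encoderblock'))

for name in ('SpatialTransformer', 'TemporalTransformer'):  # assumed module names
  if name in params:
    print(name, count_encoder_blocks(params[name]))
```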
Internal
Internal