big_vision
Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.
Hi @lucasb-eyer, thanks for your review and comments. I reformatted the files and squashed the commits into a new PR (sorry, I messed up the old PR and could not squash...
Hi. An `AttributeError` is raised when running the `big_vision/blob/main/big_vision/configs/proj/image_text/lit.ipynb` notebook in Colab. P.S.: it is raised here: `config.pp_img`. P.P.S.: an `AttributeError` is also raised here: `config.pp_txt`
This would run pylint with the official Google style configuration on every PR automatically, saving us quite a bit of time. We can [already see results of it in this...
The `-m` was missing.
Hello, `big_vision` team! Thanks for your work on the repository. I trained FlexiViT-B on a fine-grained dataset, CUB-200-2011, using pretrained in21k weights at a fixed resolution, say 480, but...
For a single entry, the tuple is not created. Using a list, we can run the demo on a single image/text, e.g. ``` images = [PIL.Image.open(fname) for fname in (...
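For illustration, a minimal pure-Python sketch of the pitfall behind this report (the filename is a placeholder and `PIL` is not assumed): a parenthesised single expression is not a one-element tuple, whereas a list literal behaves the same for one entry as for many.

```python
fnames = ["img0.png"]  # placeholder filename, not a real file

single = ("img0.png")      # parentheses only: this is just a str
single_ok = ("img0.png",)  # trailing comma is what makes a 1-tuple
as_list = [f for f in fnames]  # a list works for 1 or N entries alike

print(type(single).__name__, type(single_ok).__name__, as_list)
```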
I found an issue here: https://github.com/google-research/big_vision/blob/main/big_vision/pp/ops_text.py#L165 When lowercasing UTF-8 non-Latin text, `encoding='utf-8'` should be used, as mentioned here: https://www.tensorflow.org/api_docs/python/tf/strings/lower This can at least affect i18n models. But...
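As a pure-Python analogue of the byte-wise default (no TensorFlow assumed here): `bytes.lower()` only maps ASCII letters, so non-Latin characters pass through unchanged unless the text is treated as Unicode, which is why an `encoding='utf-8'` argument matters for lowercasing.

```python
text = "ПРИВЕТ Mir"        # mixed Cyrillic/Latin uppercase
raw = text.encode("utf-8")

# Unicode-aware lowering handles the Cyrillic letters too.
print(text.lower())                 # привет mir

# Byte-level lowering only touches ASCII: the Cyrillic stays uppercase.
print(raw.lower().decode("utf-8"))  # ПРИВЕТ mir
```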
Could you please clarify whether canonicalization was used during SigLIP **training**? This demo https://github.com/google-research/big_vision/blob/main/big_vision/configs/proj/image_text/SigLIP_demo.ipynb does not use canonicalization, but canonicalization is used in this script https://github.com/google-research/big_vision/blob/main/big_vision/evaluators/proj/image_text/prompt_engineering.py
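For readers unfamiliar with the term, here is a hypothetical sketch of what prompt canonicalization typically does (an illustration only, not the actual `prompt_engineering.py` implementation): lowercase the prompt, strip punctuation, and collapse whitespace so that variant spellings of the same prompt map to one string.

```python
import re

def canonicalize_text(text):
    """Hypothetical canonicalization sketch: lowercase, drop punctuation,
    collapse runs of whitespace. Not the big_vision implementation."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", "", text)       # strip punctuation
    return re.sub(r"\s+", " ", text).strip()  # collapse whitespace

print(canonicalize_text("A  photo, of a Cat!"))  # a photo of a cat
```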
Hello, `big_vision` team! Thanks for your work on the repository. I found two small typos in the FlexiViT code: line 194 `restored_params = utils.load_params(None, init_file)` ==> `restored_params = utils.load_params(init_file)` line...
Thank you for releasing the code for these inspiring works! I tried to use bfloat16 for the model parameters, and manually converted the images and labels from float32 to bfloat16 before feeding them...