Jason Chou issues

Results: 13 issues of Jason Chou

"added by us" placement

It's more common and easier to follow to put the participial phrase after the noun, I think.

`hub_utils.py` assumes a different sharding convention

## 🐛 Bug `hub_utils.py` of baa4e6d840042404e51a60efbe9d65ad62c80fca (current main) assumes a different sharding convention ### To Reproduce 1. Set up everything as before: ```bash $ ls /home/jason_chou/redspot_home/66b/ dict.txt gpt2-vocab.json reshard-model_part-1.pt reshard-model_part-3.pt...

bug

`convert_to_singleton` seems to hang for OPT-66B

#### What is your question? With the directory prepared ``` $ ls 66b/ dict.txt reshard-model_part-0-shard0.pt reshard-model_part-3-shard0.pt reshard-model_part-6-shard0.pt gpt2-merges.txt reshard-model_part-1-shard0.pt reshard-model_part-4-shard0.pt reshard-model_part-7-shard0.pt gpt2-vocab.json reshard-model_part-2-shard0.pt reshard-model_part-5-shard0.pt ``` I had to hack `checkpoint_utils.py`...

question
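The two excerpts above quote checkpoint files named in two different ways: `reshard-model_part-1.pt` and `reshard-model_part-0-shard0.pt`. As a minimal sketch of how a loader could tell them apart, the snippet below parses both patterns. The helper names `parse_shard_name` and `describe_convention` are hypothetical and are not part of metaseq; nothing here claims to show which pattern `hub_utils.py` or `checkpoint_utils.py` actually expects.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical helper, not metaseq code: it only recognizes the two
# filename patterns quoted in the issue excerpts.
_PART_RE = re.compile(r"^reshard-model_part-(\d+)(?:-shard(\d+))?\.pt$")


def parse_shard_name(filename: str) -> Optional[Tuple[int, Optional[int]]]:
    """Return (model_part, shard), or None if the name matches neither pattern.

    shard is None when the filename has no "-shardN" suffix.
    """
    match = _PART_RE.match(filename)
    if match is None:
        return None
    part = int(match.group(1))
    shard = int(match.group(2)) if match.group(2) is not None else None
    return part, shard


def describe_convention(filenames: Iterable[str]) -> Dict[str, List[str]]:
    """Group checkpoint files by whether they carry a "-shardN" suffix."""
    with_suffix: List[str] = []
    without_suffix: List[str] = []
    for name in filenames:
        parsed = parse_shard_name(name)
        if parsed is None:
            continue  # not a checkpoint part, e.g. dict.txt or gpt2-vocab.json
        if parsed[1] is not None:
            with_suffix.append(name)
        else:
            without_suffix.append(name)
    return {"with_shard_suffix": with_suffix, "without_shard_suffix": without_suffix}
```

Running `describe_convention` over a directory listing before loading makes a naming mismatch visible up front, which may be easier to diagnose than a hang or a missing-file error later on.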

No longer able to load provided OPT checkpoint after recent changes

## 🐛 Bug No longer able to load provided OPT checkpoint after recent changes ### To Reproduce Edit `metaseq/service/constants.py` as before, in my case: ```python MAX_SEQ_LEN = 2048 BATCH_SIZE =...

bug

Python version requirement more strict than stated

## 🐛 Bug According to https://github.com/facebookresearch/metaseq/blob/c4b33ba6e2cd9b33539bbb5a35d831096bde3282/setup.py#L12-L13 Python >= 3.6 should be compatible. However, since `TypedDict` was introduced by https://github.com/facebookresearch/metaseq/pull/352, metaseq now [effectively requires Python 3.8](https://docs.python.org/3/library/typing.html#typing.TypedDict). Furthermore, since [`torch==1.10.1+cu113` doesn't support...

bug
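Since `typing.TypedDict` only exists in the standard library from Python 3.8 onward, the mismatch described above could be surfaced with an explicit version check. This is a sketch under assumptions: `supports_typeddict` and `check_python` are hypothetical names, and the real fix in the repository may simply be a change to the declared version requirement in `setup.py`.

```python
import sys

# typing.TypedDict was added to the standard library in Python 3.8.
MIN_PYTHON_FOR_TYPEDDICT = (3, 8)


def supports_typeddict(version_info=sys.version_info) -> bool:
    """True if the given interpreter version ships typing.TypedDict."""
    return tuple(version_info[:2]) >= MIN_PYTHON_FOR_TYPEDDICT


def check_python(version_info=sys.version_info) -> None:
    """Raise a clear error instead of failing later on an import."""
    if not supports_typeddict(version_info):
        found = ".".join(str(part) for part in version_info[:2])
        raise RuntimeError(f"Python >= 3.8 is required, found {found}")
```

Failing early with a readable message tends to be friendlier than an `ImportError` raised from deep inside a module that uses `TypedDict`.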

use MPS and explicitly disable autocast & GradScaler for non-CUDA

`device = "mps"` makes both training and inference roughly an order of magnitude faster on newer Macs with `torch==2.2.0.dev20231002`: Training: [MPS vs. CPU](https://api.wandb.ai/links/eify/y7r3yzwb), RN50 model (username redacted) ``` python3 -m training.main...

deprecate LayerNormFp32

[Modern pytorch (1.10+) always performs LN in fp32](https://huggingface.co/docs/transformers/v4.13.0/en/performance#fp16-inference): > For example, LayerNorm has to be done in fp32 and recent pytorch (1.10+) has been fixed to do that regardless of...

fix contrast()

# Description The mean pixel value should be the weighted average of the histogram. [google-research/big_vision](https://github.com/google-research/big_vision) and [tensorflow/tpu](https://github.com/tensorflow/tpu) have the same bug, so ideally they should be fixed in the same way. ##...

models:official
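To make "weighted average of the histogram" concrete, here is a small pure-Python sketch. It assumes `hist[v]` holds the number of pixels with value `v`; the function names are illustrative and are not taken from any of the repositories mentioned.

```python
def histogram(pixels, bins=256):
    """Count how many pixels take each integer value in [0, bins)."""
    hist = [0] * bins
    for value in pixels:
        hist[value] += 1
    return hist


def mean_from_histogram(hist):
    """Mean pixel value: each bin value weighted by its pixel count."""
    total = sum(hist)
    if total == 0:
        raise ValueError("empty histogram")
    return sum(value * count for value, count in enumerate(hist)) / total


pixels = [0, 0, 0, 255]
hist = histogram(pixels)
weighted_mean = mean_from_histogram(hist)  # 255 / 4 = 63.75
```

Summing the bin counts alone only recovers the number of pixels, so any mean computed without multiplying each count by its bin value no longer depends on the image content.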

What's a "`rezied` method"?

https://github.com/google-research/big_vision/blob/b8dab6e4de3436849415f37c591399c93b1eaf39/big_vision/pp/ops_image.py#L172 https://github.com/google-research/big_vision/blob/b8dab6e4de3436849415f37c591399c93b1eaf39/big_vision/pp/ops_image.py#L214 Is it just a typo of `resize`?

Behavior of `solarize()` depends on integer overflow

I am not 100% sure about the intention but I do want to raise the alarm. The `solarize()` transform here https://github.com/google-research/big_vision/blob/01edb81a4716f93a48be43b3a4af14e29cdb3a7f/big_vision/pp/autoaugment.py#L180-L184 inverts the pixel when its value is greater or...
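The excerpt is cut off, so the exact condition and the place where the overflow occurs are not visible here. As a general illustration of how a threshold comparison can change under 8-bit wraparound, the sketch below emulates an unchecked cast to an unsigned 8-bit integer in plain Python. The inversion rule, the threshold value 256, and both function names are assumptions made for the example, not the big_vision implementation.

```python
def to_uint8(value: int) -> int:
    """Emulate an unchecked cast to unsigned 8-bit: wraps modulo 256."""
    return value & 0xFF


def solarize(pixels, threshold):
    """Assumed rule: invert every pixel at or above the threshold."""
    return [p if p < threshold else 255 - p for p in pixels]


pixels = [0, 100, 200, 255]

# With a plain integer threshold of 256, no 8-bit pixel reaches it.
unchanged = solarize(pixels, 256)

# If the same threshold is wrapped to 8 bits it becomes 0,
# so every pixel is at or above it and gets inverted.
inverted = solarize(pixels, to_uint8(256))
```

The two calls differ only in the integer type of the threshold, yet one leaves the image untouched and the other inverts it completely, which is the kind of silent dependence the issue title warns about.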