[WIP] Add small, large, 3b, 11b pre-trained weights for t5
Description
Add t5 bundlers for the small, large, 3b, and 11b configurations.
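For context, a minimal sketch of how the new bundlers would be used; the bundler name `T5_SMALL` is an assumption following the existing `T5_BASE` naming convention and may differ:

```python
from torchtext.prototype.models import T5_SMALL  # hypothetical bundler name

# get_model() builds the architecture from the bundler's T5Conf and loads
# the pre-trained weights from the checkpoint its _path attribute points to.
model = T5_SMALL.get_model()
model.eval()
```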
Process
- Upload pre-trained weights and create bundler objects for each configuration. The `_path` attributes point to the corresponding checkpoints uploaded to our bucket.
- The base model projects the query, keys, and values to dimension `embed_dim / num_heads`. The 3b and 11b models break from this convention and specify the dimensions these tensors get projected to. Therefore, we introduce a new parameter `qkv_dim` in `T5Conf` and in all of our t5 modules so that this projection dimension can be taken into account. To do this, we also introduce a new method, `T5MultiheadAttention._t5_in_projection` (see the sketch after this list). It is a modified version of `torch.nn.functional._in_projection`. This change was necessary because `torch.nn.functional._in_projection` expects the query, keys, and values to all get projected to the dimension `embed_dim / num_heads`, an assumption that no longer holds for the 3b and 11b configurations.
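A minimal sketch of the generalized in-projection, assuming the simplified signature below (the actual method on `T5MultiheadAttention` may take additional arguments); the 3b shapes in the comment are illustrative:

```python
import torch.nn.functional as F
from torch import Tensor
from typing import Optional, Tuple

def _t5_in_projection(
    q: Tensor,
    k: Tensor,
    v: Tensor,
    w_q: Tensor,  # (qkv_dim, embed_dim)
    w_k: Tensor,  # (qkv_dim, embed_dim)
    w_v: Tensor,  # (qkv_dim, embed_dim)
    b_q: Optional[Tensor] = None,
    b_k: Optional[Tensor] = None,
    b_v: Optional[Tensor] = None,
) -> Tuple[Tensor, Tensor, Tensor]:
    # torch.nn.functional._in_projection asserts the projection weights are
    # square, i.e. that q/k/v are projected back to embed_dim. Here the
    # weights may instead map embed_dim -> qkv_dim, e.g. 1024 -> 4096 for 3b,
    # where num_heads * head_dim no longer equals embed_dim.
    return F.linear(q, w_q, b_q), F.linear(k, w_k, b_k), F.linear(v, w_v, b_v)
```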
Testing
Add integration tests for small and large. The 3b and 11b checkpoints are large (12GB and 47GB) and each takes a long time to load (see timings below). To keep the CI tests time-efficient, we have not checked in their unit tests, though we have run them locally to ensure they perform as expected.
`pytest test/prototype/integration_tests/test_models.py`

- small testing takes 24.67s
- base testing takes 74.72s (0:01:14)
- large testing takes 251.09s (0:04:11)
- 3b testing takes 1034.32s (0:17:14)
- 11b testing takes 0s (0:00:00) (TBD)
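Although the 3b and 11b tests are not checked in, a pattern like the following could keep them runnable locally without slowing CI; the environment flag and `T5_3B` bundler name are assumptions for illustration, not the actual test code:

```python
import os
import pytest

RUN_LARGE = os.environ.get("RUN_LARGE_T5_TESTS") == "1"  # hypothetical flag

@pytest.mark.skipif(not RUN_LARGE, reason="3b/11b checkpoints (12GB/47GB) are too heavy for CI")
def test_t5_3b_model():
    from torchtext.prototype.models import T5_3B  # hypothetical bundler name

    model = T5_3B.get_model()
    model.eval()
    # ...compare model outputs against expected values, as in the checked-in
    # small/large integration tests.
```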