30b Checkpoint pickle is published with half precision, no bias tensors and no final layers
The 30b model pickles seem to have no biases.
```python
from tqdm import tqdm
import torch
from pathlib import Path
import pickle

blob_path = Path.home() / Path('.cache/huggingface/hub/models--facebook--galactica-30b/blobs')
keys2blob = {}
errors = {}
blobs = [blob for blob in blob_path.glob('./*') if blob.is_file()]
for blob in tqdm(blobs):
    try:
        # Map every state-dict key to the blob it was loaded from
        keys2blob.update({k: blob for k in torch.load(blob).keys()})
    except pickle.UnpicklingError as e:
        # Non-pickle blobs (configs, tokenizer files, etc.) end up here
        errors[blob] = e
print(f"Num_weights: {len([i for i in keys2blob.keys() if 'weight' in i])}")
print(f"Num_biases: {len([i for i in keys2blob.keys() if 'bias' in i])}")
```
```
100%|██████████| 12/12 [00:50<00:00, 4.19s/it]
Num_weights: 290
Num_biases: 0
```
This is in contrast to the 6.7b model, which contains many bias tensors.
```python
from tqdm import tqdm
import torch
from pathlib import Path
import pickle

blob_path = Path.home() / Path('.cache/huggingface/hub/models--facebook--galactica-6.7b/blobs')
keys2blob = {}
errors = {}
blobs = [blob for blob in blob_path.glob('./*') if blob.is_file()]
for blob in tqdm(blobs):
    try:
        keys2blob.update({k: blob for k in torch.load(blob).keys()})
    except pickle.UnpicklingError as e:
        errors[blob] = e
print(f"Num_weights: {len([i for i in keys2blob.keys() if 'weight' in i])}")
print(f"Num_biases: {len([i for i in keys2blob.keys() if 'bias' in i])}")
```
```
50%|█████ | 4/8 [00:14<00:14, 3.57s/it]
Num_weights: 260
Num_biases: 257
```
I do not believe I am missing any pickles, because the disk usage of the cloned repository tallies with what is displayed on the Hugging Face site (note that `du` reports gibibytes, which likely accounts for the slight discrepancy in the raw numbers).
```
❯ du -csh ./models--facebook--galactica-30b/blobs/*
785M ./models--facebook--galactica-30b/blobs/0379c39b5a0cb59453b14738ef1d4924e93599aba4e57f2599036e76f36532f6
9.2G ./models--facebook--galactica-30b/blobs/05db345d4fcca580bed2c6e9d0fe8feead207c2c2fa8384c27c94cbd4ed0e0bf
4.0K ./models--facebook--galactica-30b/blobs/0967ef424bce6791893e9a57bb952f80fd536e93
9.2G ./models--facebook--galactica-30b/blobs/0d6ce164b560f4601d48f61c2a8d598106faa9f4b89c39334a712429649b75c8
4.0K ./models--facebook--galactica-30b/blobs/28e11da7e191492f3f23d2aa35e9b60f8e9becf6
9.2G ./models--facebook--galactica-30b/blobs/30a274571d49a30bb4d6872e69b96ad191fa22c92427d160c74ce225a566bd71
24K ./models--facebook--galactica-30b/blobs/98d10d1a52ab2b70f1deff472512cbaa6065e317
9.2G ./models--facebook--galactica-30b/blobs/aa79446f17da0f3b9f8815a3628c2b1935936ec819f09a5865ce4e3c4ee51aa7
9.2G ./models--facebook--galactica-30b/blobs/b919005245e2b77d57bf3a73ac18415083aa32b6e2e4e89c96b8d988453a0e7f
4.0K ./models--facebook--galactica-30b/blobs/bc97f8a9458a1fe096bec5d8ec938a02647bc4bb
9.2G ./models--facebook--galactica-30b/blobs/c1cad10954e544c44aabd29f31e67292d1bc819d2e7b9842f14fdcef88d58f93
2.1M ./models--facebook--galactica-30b/blobs/e18054f92dc016b43c940dd1c4a1c5da884539c0
56G total
```
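For what it's worth, the GiB-vs-GB mismatch is easy to reproduce programmatically. A minimal sketch (assuming the same local cache path as in the scripts above) that totals the blob sizes and prints both unit conventions:

```python
from pathlib import Path

# Same cache directory as in the inspection script above
blob_path = Path.home() / '.cache/huggingface/hub/models--facebook--galactica-30b/blobs'

total_bytes = sum(f.stat().st_size for f in blob_path.glob('*') if f.is_file())

# du -h uses binary units (GiB, 1024**3 bytes); if the Hugging Face site uses decimal
# units (GB, 1000**3 bytes), as the discrepancy noted above suggests, the same byte
# count yields two slightly different headline figures.
print(f"{total_bytes / 1024**3:.1f} GiB vs {total_bytes / 1000**3:.1f} GB")
```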

Yes, the models use no biases in general and no element-wise affine transformations in layer norms by design. Can you check if the biases are present in the 6.7B checkpoints that we published or if the biases in your checkpoints are non-zero?
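(For context, "no element-wise affine transformations in layer norms" corresponds to constructing the layer norm without its learnable scale and shift. A minimal PyTorch sketch of the difference; the hidden size below is an arbitrary example value, not a Galactica hyperparameter:)

```python
import torch.nn as nn

hidden = 768  # arbitrary example size, not taken from the Galactica configs

# Default layer norm: learnable per-element scale (weight) and shift (bias)
ln_affine = nn.LayerNorm(hidden, elementwise_affine=True)

# Layer norm without the element-wise affine transformation: it only normalizes
# and contributes no weight or bias tensors to the state dict
ln_plain = nn.LayerNorm(hidden, elementwise_affine=False)

print([n for n, _ in ln_affine.named_parameters()])  # ['weight', 'bias']
print([n for n, _ in ln_plain.named_parameters()])   # []
```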
All the biases in the 6.7B checkpoint are 0.
I've checked the other model checkpoints. All of them have bias tensors that are all zero (except 30b, which has no bias tensors).
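A sketch of one way to run that check (not necessarily the exact script used above), reusing the `blobs` list and the `pickle` import from the inspection loop and counting bias tensors that contain any non-zero element:

```python
import torch

nonzero_biases = []
for blob in blobs:
    try:
        state_dict = torch.load(blob)
    except pickle.UnpicklingError:
        continue  # skip configs, tokenizer files, etc.
    for name, tensor in state_dict.items():
        if 'bias' in name and torch.count_nonzero(tensor).item() != 0:
            nonzero_biases.append(name)

print(f"Bias tensors with a non-zero element: {len(nonzero_biases)}")
```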
This is the info I have about the checkpoints so far, along with some notable differences in the 30b checkpoint:
- 30b is the only model saved in half precision
- It does not follow the trend in number of tensors
| Size | Parameters | Disk Usage | Bytes / Parameter ratio | Sum(layer.numels) | Data type of tensors |
|---|---|---|---|---|---|
| mini | 125 M | 480M | 4.0265 | 163,430,400 | {torch.float32: 197} |
| base | 1.3 B | 5.0G | 4.1298 | 1,417,601,024 | {torch.float32: 389} |
| standard | 6.7 B | 26G | 4.1667 | 6,862,159,872 | {torch.float32: 517} |
| large | 30 B | 56G | 2.0043 | 29,968,103,424 | {torch.float16: 290} |
| huge | 120 B | 453G | 4.0534 | 121,853,747,200 | {torch.float32: 1541} |
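The ratio column is just disk usage divided by the summed element count. A quick sanity check of the two extremes using the rounded `du` figures above (the table's more precise ratios were presumably computed from exact byte counts, so these only come out approximately right):

```python
# Rounded GiB figures from the du output above, so the ratios are approximate
checkpoints = {
    "standard (6.7 B, float32)": (26 * 1024**3, 6_862_159_872),
    "large (30 B, float16)":     (56 * 1024**3, 29_968_103_424),
}

for name, (size_bytes, numel) in checkpoints.items():
    # float32 tensors should land near 4 bytes/parameter, float16 near 2
    print(f"{name}: {size_bytes / numel:.2f} bytes/parameter")
```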
The checkpoint is now fixed as part of https://huggingface.co/facebook/galactica-30b/discussions/6. All the checkpoints are now fully compatible with the OPT architecture and use float16 weights, with layer norm weights set to ones and all biases set to zero.
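For anyone wanting to verify this locally, here is a hedged sketch that assumes the parameter naming of the transformers OPT implementation; it uses facebook/galactica-125m as a lightweight stand-in, but the same checks should apply to the 30b checkpoint:

```python
import torch
from transformers import AutoModelForCausalLM

# galactica-125m as a small stand-in; swap in facebook/galactica-30b if you have
# the disk space and memory for it
model = AutoModelForCausalLM.from_pretrained(
    "facebook/galactica-125m", torch_dtype=torch.float16
)

biases_all_zero = all(
    torch.all(p == 0) for n, p in model.named_parameters() if n.endswith(".bias")
)
# OPT-style names include e.g. self_attn_layer_norm.weight and final_layer_norm.weight
ln_weights_all_ones = all(
    torch.all(p == 1)
    for n, p in model.named_parameters()
    if "layer_norm" in n and n.endswith(".weight")
)
print(f"all biases zero: {biases_all_zero}, all layer norm weights one: {ln_weights_all_ones}")
```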