
Load incorrect element values in bfloat16 tensor on big-endian

Open · kiszk opened this issue 1 year ago • 1 comment

System Info

>>> import safetensors
>>> safetensors.__version__
'0.4.2'

Information

  • [ ] The official example scripts
  • [X] My own modified scripts

Reproduction

When I ran transformer models in bfloat16 from HF, I got incorrect results on s390x. I realized that the weight values differ between x86 and s390x. The following is a small reproduction.

Execute the following program on x86:

import torch
from safetensors import safe_open
from safetensors.torch import save_file

tensors = {
    "weight1": torch.ones((8, 8), dtype=torch.bfloat16),
}
save_file(tensors, "bf16.safetensors")

# Read the file back on the same machine to confirm the saved values.
read_tensors = {}
with safe_open("bf16.safetensors", framework="pt", device="cpu") as f:
    for key in f.keys():
        read_tensors[key] = f.get_tensor(key)
print(read_tensors)

Copy bf16.safetensors to the s390x machine, then execute the following program:

import torch
from safetensors import safe_open

read_tensors = {}
with safe_open("bf16.safetensors", framework="pt", device="cpu") as f:
    for key in f.keys():
        read_tensors[key] = f.get_tensor(key)
print(read_tensors)

The result is as follows:

{'weight1': tensor([[7.6294e-06, 7.6294e-06, 7.6294e-06, 7.6294e-06, 7.6294e-06, 7.6294e-06,
         7.6294e-06, 7.6294e-06],
        [7.6294e-06, 7.6294e-06, 7.6294e-06, 7.6294e-06, 7.6294e-06, 7.6294e-06,
         7.6294e-06, 7.6294e-06],
        [7.6294e-06, 7.6294e-06, 7.6294e-06, 7.6294e-06, 7.6294e-06, 7.6294e-06,
         7.6294e-06, 7.6294e-06],
        [7.6294e-06, 7.6294e-06, 7.6294e-06, 7.6294e-06, 7.6294e-06, 7.6294e-06,
         7.6294e-06, 7.6294e-06],
        [7.6294e-06, 7.6294e-06, 7.6294e-06, 7.6294e-06, 7.6294e-06, 7.6294e-06,
         7.6294e-06, 7.6294e-06],
        [7.6294e-06, 7.6294e-06, 7.6294e-06, 7.6294e-06, 7.6294e-06, 7.6294e-06,
         7.6294e-06, 7.6294e-06],
        [7.6294e-06, 7.6294e-06, 7.6294e-06, 7.6294e-06, 7.6294e-06, 7.6294e-06,
         7.6294e-06, 7.6294e-06],
        [7.6294e-06, 7.6294e-06, 7.6294e-06, 7.6294e-06, 7.6294e-06, 7.6294e-06,
         7.6294e-06, 7.6294e-06]], dtype=torch.bfloat16)}
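
To make the corruption concrete, one can print the bit pattern of what was actually loaded (a small diagnostic continuing the script above, not part of the original report). bfloat16 1.0 is 0x3f80, so any other pattern means the bytes were misinterpreted:

# Reinterpret the loaded bf16 elements as 16-bit integers and show the first
# element's bit pattern; on a correct load this prints 0x3f80 (bf16 1.0).
bits = read_tensors["weight1"].view(torch.int16).flatten()[0].item() & 0xFFFF
print(hex(bits))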

Expected behavior

The result on s390x should be as follows:

{'weight1': tensor([[1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1.]], dtype=torch.bfloat16)}
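
For what it's worth, the file itself can be ruled out by dumping the first few payload bytes directly. The safetensors format always stores tensor data little-endian, and bfloat16 1.0 is 0x3F80, so each element should appear on disk as the byte pair 80 3f on both machines. A minimal sketch against the documented layout (8-byte little-endian header length, JSON header, then the data section):

import json
import struct

with open("bf16.safetensors", "rb") as fh:
    (header_len,) = struct.unpack("<Q", fh.read(8))  # header size, little-endian u64
    header = json.loads(fh.read(header_len))         # JSON header with dtype/shape/offsets
    start, end = header["weight1"]["data_offsets"]   # offsets relative to the data section
    fh.seek(8 + header_len + start)
    payload = fh.read(end - start)

# Expected on both x86 and s390x: "80 3f 80 3f ..." (bf16 1.0 stored little-endian)
print(payload[:8].hex(" "))

If both machines show the same bytes, the file is fine and the divergence happens when the data is mapped into the tensor.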

My colleague is curious whether this code works well.

kiszk · Mar 05 '24 09:03

Since PyTorch 2.1, a byteswap() function has been implemented. Would this function help with big-endian support?

kiszk · Mar 05 '24 14:03
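
Along those lines, a possible stop-gap until the loader handles this itself might be to swap each element's bytes after loading. This is only a sketch and assumes the loaded bfloat16 tensor simply holds the little-endian payload unswapped; the actual misinterpretation in 0.4.2 may differ, so verify the values after swapping:

import sys

import torch
from safetensors import safe_open

def load_bf16(path):
    """Load bf16 tensors, byte-swapping on big-endian hosts (workaround sketch)."""
    tensors = {}
    with safe_open(path, framework="pt", device="cpu") as f:
        for key in f.keys():
            t = f.get_tensor(key)
            if sys.byteorder == "big" and t.dtype == torch.bfloat16:
                # Reinterpret as 16-bit integers, swap each element's two bytes
                # with numpy, then view the buffer as bfloat16 again.
                swapped = t.view(torch.int16).numpy().byteswap()
                t = torch.from_numpy(swapped).view(torch.bfloat16)
            tensors[key] = t
    return tensors

print(load_bf16("bf16.safetensors"))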

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] · Apr 05 '24 01:04

Fixed in https://github.com/huggingface/safetensors/pull/507

Narsil · Jul 31 '24 08:07