
Fix incorrect serialization given only metadata

Open tommyip opened this issue 10 months ago • 0 comments

What does this PR do?

When saving a safetensors file with some metadata but no tensors, the JSON header is malformed.

from safetensors import safe_open
from safetensors.torch import save_file

save_file({}, 'example.safetensors', {'key': 'value'})

safe_open('example.safetensors', framework='pt')

# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
# safetensors_rust.SafetensorError: Error while deserializing header: InvalidHeaderDeserialization

The serialized data:

$ xxd example.safetensors 
00000000: 2800 0000 0000 0000 7b7d 2c22 5f5f 6d65  (.......{},"__me
00000010: 7461 6461 7461 5f5f 223a 7b22 6b65 7922  tadata__":{"key"
00000020: 3a22 7661 6c75 6522 7d7d 2020 2020 2020  :"value"}}     
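To make the dump easier to read, here is a minimal sketch that splits those same 48 bytes into the 8-byte little-endian length prefix and the JSON header, then tries to parse the header. The helper name `split_header` is hypothetical and is not part of the safetensors API; the bytes are copied from the xxd output above.

```python
import json
import struct

# Bytes copied from the xxd dump above (48 bytes in total).
raw = bytes.fromhex(
    "28000000000000007b7d2c225f5f6d65"
    "7461646174615f5f223a7b226b657922"
    "3a2276616c7565227d7d202020202020"
)


def split_header(data):
    """Hypothetical helper: return (declared length, header text)."""
    # The first 8 bytes hold the header length as a little-endian u64.
    (header_len,) = struct.unpack("<Q", data[:8])
    header_text = data[8:8 + header_len].decode("utf-8")
    return header_len, header_text


header_len, header_text = split_header(raw)

try:
    json.loads(header_text)
    header_is_valid = True
except json.JSONDecodeError:
    # The object is closed by "{}" before the "__metadata__" entry is written.
    header_is_valid = False

print(header_len)
print(repr(header_text))
print(header_is_valid)
```

The header text begins with `{},"__metadata__"`, so the JSON object is already closed before the metadata entry appears, which is why deserialization fails.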

The issue comes from calling serde's serialize_map with an incorrect number of expected map entries: the count omits the __metadata__ entry, so with metadata but no tensors the declared count is zero. Strangely, the header serializes correctly when the file contains both tensors and metadata.
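One plausible explanation for the difference, sketched as a toy model in Python: assume the JSON map writer closes the object immediately when the length hint is exactly zero and otherwise ignores the hint. This is an assumption about the serializer's behavior, not its actual code; `ToyMapSerializer`, `serialize`, and the `weight` entry are hypothetical names used only for illustration.

```python
class ToyMapSerializer:
    """Toy model (assumption): a JSON map writer that trusts a zero-length hint."""

    def __init__(self, len_hint):
        self.out = ["{"]
        if len_hint == 0:
            # The hint says "no entries", so the object is closed right away.
            self.out.append("}")
            self.state = "empty"
        else:
            # Any non-zero hint is otherwise ignored.
            self.state = "first"

    def entry(self, key, value_json):
        # Only the very first entry of an open object skips the comma.
        if self.state != "first":
            self.out.append(",")
        self.out.append(f'"{key}":{value_json}')
        self.state = "rest"

    def end(self):
        # An object that was closed early and stayed empty needs no brace.
        if self.state != "empty":
            self.out.append("}")
        return "".join(self.out)


def serialize(entries, len_hint):
    ser = ToyMapSerializer(len_hint)
    for key, value_json in entries:
        ser.entry(key, value_json)
    return ser.end()


metadata = [("__metadata__", '{"key":"value"}')]
# "weight" stands in for a tensor entry; its value is a placeholder.
with_tensor = metadata + [("weight", "{}")]

broken = serialize(metadata, len_hint=0)            # hint omits __metadata__
fixed = serialize(metadata, len_hint=1)             # hint counts __metadata__
undercounted = serialize(with_tensor, len_hint=1)   # hint still one short

print(broken)
print(fixed)
print(undercounted)
```

Under this assumption an undercounted hint only matters when it is zero, which would be consistent with the header being well formed whenever at least one tensor is present.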

Fixes #466

tommyip, Apr 14 '24 16:04