RuntimeError: Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again

Open Quang-elec44 opened this issue 7 months ago • 2 comments

Hi, I am trying to merge two models following this post. Here is my config:

yaml_config = """
slices:
  - sources:
    - model: vilm/vinallama-2.7b-chat
      layer_range: [0, 32]
  - sources:
    - model: vilm/vinallama-2.7b-chat
      layer_range: [0, 32]
merge_method: passthrough
dtype: bfloat16
"""

Here is the error log:

RuntimeError: 
            Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again: [{'model.layers.32.input_layernorm.weight', 'model.layers.0.input_layernorm.weight'}, {'model.layers.0.mlp.down_proj.weight' ...
A potential way to correctly save your model is to use `save_model`.
More information at https://huggingface.co/docs/safetensors/torch_shared_tensors

The author of this model successfully merged the 7B model using the same method (the model is here). Am I doing anything wrong?
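For readers trying to see why the error names `model.layers.32...` together with `model.layers.0...`, here is a minimal sketch of how a passthrough merge stacks the slices in the config above. It assumes `layer_range` is half-open (`[start, end)`), and `passthrough_layer_map` is a hypothetical helper written for illustration, not part of mergekit.

```python
def passthrough_layer_map(slices):
    """Map each output layer index to the (model, source layer) it is copied from."""
    mapping = {}
    out_idx = 0
    for model, (start, end) in slices:
        # Assumption: layer_range is half-open, so (0, 32) yields layers 0..31.
        for src in range(start, end):
            mapping[out_idx] = (model, src)
            out_idx += 1
    return mapping


# The two slices from the config: the same model and range, stacked twice.
slices = [
    ("vilm/vinallama-2.7b-chat", (0, 32)),
    ("vilm/vinallama-2.7b-chat", (0, 32)),
]

layer_map = passthrough_layer_map(slices)
print(len(layer_map))  # 64
print(layer_map[32])   # ('vilm/vinallama-2.7b-chat', 0)
```

Output layer 32 is a copy of source layer 0, so unless the weights are duplicated in memory, both names can end up pointing at the same tensor, which is the pairing reported in the error log.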

Quang-elec44 avatar Jan 08 '24 08:01 Quang-elec44

@cg123 I don't know if this is the right approach (I'm on Colab), but I deleted the following piece of code from /usr/local/lib/python3.10/dist-packages/safetensors/torch.py

    if failing:
        raise RuntimeError(
            f"""
            Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again: {failing}.
            A potential way to correctly save your model is to use `save_model`.
            More information at https://huggingface.co/docs/safetensors/torch_shared_tensors
            """
        )

Seems to work

Ar57m avatar Jan 08 '24 14:01 Ar57m

You can solve this error by using the --clone-tensors argument to the script. It's off by default because it uses a little bit more memory, but it's here for this use case exactly. Hope this helps!
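As a rough illustration of why cloning helps, the sketch below groups tensor names by the memory they point to and then breaks the sharing by copying each tensor. It is a simplified stand-in: `FakeTensor`, `shared_groups`, and `clone_all` are hypothetical names, the real check in safetensors is more involved, and it is only an assumption that the clone option does something comparable before saving. With PyTorch, real `torch.Tensor` objects expose the same `data_ptr()` and `clone()` methods used here.

```python
from collections import defaultdict


class FakeTensor:
    """Stand-in for torch.Tensor that exposes only data_ptr() and clone()."""

    def __init__(self, values):
        self._buf = list(values)  # private copy acts as this tensor's storage

    def data_ptr(self):
        return id(self._buf)

    def clone(self):
        return FakeTensor(self._buf)  # new object backed by new storage


def shared_groups(state_dict):
    """Return sets of names whose tensors point at the same memory."""
    by_ptr = defaultdict(set)
    for name, tensor in state_dict.items():
        by_ptr[tensor.data_ptr()].add(name)
    return [names for names in by_ptr.values() if len(names) > 1]


def clone_all(state_dict):
    """Give every entry its own storage, at the cost of extra memory."""
    return {name: tensor.clone() for name, tensor in state_dict.items()}


weight = FakeTensor([1.0, 2.0])
state = {
    "model.layers.0.input_layernorm.weight": weight,
    "model.layers.32.input_layernorm.weight": weight,  # same object, shared memory
}

print(shared_groups(state))             # one group containing both names
print(shared_groups(clone_all(state)))  # []
```

The extra copies are the "little bit more memory" trade-off: every duplicated layer gets its own buffer instead of aliasing the original.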

cg123 avatar Jan 09 '24 06:01 cg123

@cg123 I set clone_tensors=True in MergeOptions class and still got the same error

image

Quang-elec44 avatar Jan 10 '24 03:01 Quang-elec44

It seems that you wrote clone_tensor; it should be clone_tensors.

Ar57m avatar Jan 10 '24 03:01 Ar57m

@Ar57m Oh, my bad. Thanks for pointing that out.

Quang-elec44 avatar Jan 10 '24 03:01 Quang-elec44

@cg123 @Ar57m It worked. Thanks for your help !!!

Quang-elec44 avatar Jan 10 '24 03:01 Quang-elec44