unsloth-internLM 2.5

Open rezzie-rich opened this issue 1 year ago • 19 comments

Can we please get official support for InternLM 2.5?

I have seen a closed issue regarding this (#734); however, the model mentioned there might be broken, as it fails to load, for instance.

It would be great to get an official version from you guys since the model has a lot of potential due to its size and context window.

Additional question: does llamafying a model impose any licensing restrictions from Llama? If so, it would be hugely appreciated if the supported InternLM were not restricted by any Llama licensing agreement.

rezzie-rich avatar Jul 14 '24 01:07 rezzie-rich

Llamafying it won't cause license issues since it's just a re-arrangement of modules. I'll try, but for now it's best to llama-fy it.
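
For anyone wondering what "llama-fying" involves: it only rewrites the checkpoint so it loads as a plain LlamaForCausalLM; the math is unchanged. Roughly the following renames are needed (module names recalled from the InternLM2 modeling code, so treat them as an assumption; the fused wqkv tensor additionally has to be split into separate q/k/v projections):

```python
# Rough sketch of the InternLM2 -> Llama module renames behind "llama-fying".
# Names are recalled from modeling_internlm2.py and may need double-checking.
INTERNLM2_TO_LLAMA = {
    "model.tok_embeddings": "model.embed_tokens",
    "attention.wqkv":       "self_attn.{q,k,v}_proj",  # fused; must be split per head group
    "attention.wo":         "self_attn.o_proj",
    "feed_forward.w1":      "mlp.gate_proj",
    "feed_forward.w3":      "mlp.up_proj",
    "feed_forward.w2":      "mlp.down_proj",
    "attention_norm":       "input_layernorm",
    "ffn_norm":             "post_attention_layernorm",
    "output":               "lm_head",
}
```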

danielhanchen avatar Jul 18 '24 07:07 danielhanchen

Thank you, looking forward to it. If possible, the 1M context version :D; if not, 200k will work too.

Their benchmarks show it performs best up to a 200k context window before losing some quality.

rezzie-rich avatar Jul 18 '24 07:07 rezzie-rich

Would it be too much to ask you to request a commercial usage license for InternLM from the creators for the Unsloth version? They offer it for free upon request. If you obtain it, it becomes easier for anyone using the Unsloth version of InternLM, without needing to request it again.

rezzie-rich avatar Jul 22 '24 22:07 rezzie-rich

Apologies for the delay - hmm, I think it's the engineers themselves (i.e. yourself) who have to request it - we can request it for our own use, but I'm unsure about distributing it ourselves.

danielhanchen avatar Jul 26 '24 06:07 danielhanchen

> we can request it for our own use, but I'm unsure about distributing it ourselves

Maybe that can be confirmed during the request since llamafying it will make it a different model, architecturally.

rezzie-rich avatar Jul 26 '24 07:07 rezzie-rich

I have llamafied InternLM2.5-7B and tried to load it in Unsloth.

I get:

    /usr/local/lib/python3.10/dist-packages/unsloth/models/llama.py in LlamaAttention__init__(self, config, layer_idx)
    ValueError: Unknown RoPE scaling type dynamic
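
For reference, roughly what I ran (the path is a placeholder for my local llama-fied checkpoint; the loading call is the standard Unsloth one):

```python
# Minimal repro sketch. "./llamafied-internlm2_5-7b" is a placeholder for the
# locally converted checkpoint, not a published model.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name     = "./llamafied-internlm2_5-7b",  # placeholder path
    max_seq_length = 4096,
    dtype          = None,       # auto-detect
    load_in_4bit   = True,
)
# Raises: ValueError: Unknown RoPE scaling type dynamic
```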


ethanc8 avatar Jul 28 '24 05:07 ethanc8

This model has

  "rope_scaling": {
    "factor": 2.0,
    "type": "dynamic"
  },

in its config.json

ethanc8 avatar Jul 28 '24 05:07 ethanc8

Open LLM Leaderboard also seems to be having trouble with its dynamic rope_scaling: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard/discussions/862

ethanc8 avatar Jul 28 '24 05:07 ethanc8

In the other closed issue, you mentioned that RoPE scaling can be disabled in order to finetune the model. I will try that.
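
Concretely, I'm planning to just drop the rope_scaling entry from the converted checkpoint's config.json before loading (a minimal sketch; the path is a placeholder, and note this also gives up the context extension that the dynamic scaling provided):

```python
# Sketch: strip "rope_scaling" from the converted checkpoint's config.json so
# the Llama code path never sees the unsupported "dynamic" type.
import json
import pathlib

cfg_path = pathlib.Path("./llamafied-internlm2_5-7b/config.json")  # placeholder
cfg = json.loads(cfg_path.read_text())
cfg.pop("rope_scaling", None)   # remove the {"factor": 2.0, "type": "dynamic"} block
cfg_path.write_text(json.dumps(cfg, indent=2))
```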

ethanc8 avatar Jul 28 '24 14:07 ethanc8

Wait, what's "dynamic" RoPE scaling? The only accepted ones are linear RoPE scaling, NTK, YaRN, the Llama-3 type, etc.

danielhanchen avatar Jul 31 '24 03:07 danielhanchen

I actually have no idea; I'll probably need to read the InternLM remote code: https://huggingface.co/internlm/internlm2_5-7b/blob/main/modeling_internlm2.py
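
If it is the same thing Transformers calls "dynamic" (dynamic NTK-aware scaling), the idea is that the RoPE base is recomputed once the sequence length exceeds the original max_position_embeddings, instead of the frequencies being stretched by a fixed factor as in linear scaling. A rough sketch of that adjustment, written from memory of the Transformers implementation (the dim / base / max-length defaults below are just illustrative values for a 7B model):

```python
# Sketch of dynamic NTK-aware RoPE scaling: enlarge the rotary base as the
# sequence grows, then recompute the inverse frequencies on the fly.
import torch

def dynamic_ntk_inv_freq(seq_len, dim=128, base=1_000_000.0,
                         max_position_embeddings=32768, factor=2.0):
    if seq_len > max_position_embeddings:
        base = base * (
            (factor * seq_len / max_position_embeddings) - (factor - 1)
        ) ** (dim / (dim - 2))
    return 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
```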

ethanc8 avatar Jul 31 '24 03:07 ethanc8

Good news: if Unsloth makes an InternLM version (both bf16 and int4), it can be released under the Apache-2.0 license, since the original model comes under the Apache-2.0 license for research purposes. Anyone using the Unsloth version will then only be bound by that model's license, since they aren't using the original model.

It would be highly appreciated if Unsloth released a 200k context window version of the model. This model has the best needle-in-a-haystack benchmark score compared to any other.

rezzie-rich avatar Aug 03 '24 21:08 rezzie-rich

The original model is not Apache-2.0, even for research purposes; only the inference code is. However, models are probably not copyrightable in the US. The best way to get it licensed under Apache-2.0 is to ask for a license.

ethanc8 avatar Aug 03 '24 21:08 ethanc8

> The original model is not Apache-2.0, even for research purposes; only the inference code is. However, models are probably not copyrightable in the US. The best way to get it licensed under Apache-2.0 is to ask for a license.

It's from the model card: "Open Source License: The code is licensed under Apache-2.0, while model weights are fully open for academic research and also allow free commercial usage. To apply for a commercial license, please fill in the application form (English) / 申请表 (Chinese). For other questions or collaborations, please contact [email protected]."

rezzie-rich avatar Aug 03 '24 23:08 rezzie-rich

> The original model is not Apache-2.0, even for research purposes; only the inference code is. However, models are probably not copyrightable in the US. The best way to get it licensed under Apache-2.0 is to ask for a license.

> It's from the model card: "Open Source License: The code is licensed under Apache-2.0, while model weights are fully open for academic research and also allow free commercial usage. To apply for a commercial license, please fill in the application form (English) / 申请表 (Chinese). For other questions or collaborations, please contact [email protected]."

That explicitly says that only the code is under Apache-2.0. The model weights are available under an unspecified license which prohibits commercial use, and you can get a commercial license by applying with the application form.

ethanc8 avatar Aug 04 '24 00:08 ethanc8

It clearly states that the license requirement is only for commercial use. Otherwise, it's open under Apache-2.0.

rezzie-rich avatar Aug 04 '24 01:08 rezzie-rich

It clearly says that only the code is licensed under Apache-2.0. Anyway, it would be best to contact them, as they have not revealed the details of their public license.

ethanc8 avatar Aug 04 '24 01:08 ethanc8

What is the state of this? This issue seems to be duplicated:

https://github.com/unslothai/unsloth/issues/1007

brando90 avatar Sep 18 '24 19:09 brando90

https://github.com/unslothai/unsloth/issues/734

brando90 avatar Sep 18 '24 19:09 brando90