Adding Phi-3 model

Open monk1337 opened this issue 1 year ago • 2 comments

Adding Phi-3 model

fsdp_transformer_layer_cls_to_wrap: Phi3DecoderLayer

monk1337 avatar Apr 30 '24 18:04 monk1337

Will you also add support for Phi-3 chat template for fine-tuning? As a reference: https://github.com/unslothai/unsloth/blob/4211cc01409e3ced4f7abebaf68e244193b46e2c/unsloth/chat_templates.py#L269C3-L269C8

maziyarpanahi avatar May 09 '24 11:05 maziyarpanahi

will merge this after #1582

winglian avatar May 14 '24 13:05 winglian

I was wondering when this will be merged.

vinamrabenara avatar May 24 '24 21:05 vinamrabenara

I was wondering when this will be merged.

Too busy to work on this :P

monk1337 avatar May 24 '24 21:05 monk1337

@winglian I am having some issues with phi-3-small and phi-3-medium models.

  • phi-3-medium with 4k instruct constantly fails with RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
  • phi-3-small with 8k instruct fails with:
ValueError: For now, we do not support unknown special tokens
In the future, if there is a need for this, we can add special tokens to the tokenizer
starting from rank 100261 - 100263 and then 100266 - 100275.
And finally, we can re-construct the enc object back

Is it correct to assume this PR won't solve these?

maziyarpanahi avatar May 28 '24 15:05 maziyarpanahi
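
For readers hitting the first error above: the message asks for the 'spawn' start method, and the following is a minimal sketch of what that looks like with Python's standard multiprocessing module. It is not axolotl's code, and whether axolotl's preprocessing can be switched this way is not confirmed anywhere in this thread. The names `square` and `run_with_spawn` are hypothetical, and `square` only stands in for work that would touch CUDA in a real worker.

```python
import multiprocessing as mp


def square(x: int) -> int:
    # Hypothetical stand-in for per-worker work; a real worker would
    # initialize and use CUDA here instead of doing arithmetic.
    return x * x


def run_with_spawn(values, processes: int = 2):
    # A "spawn" context starts each worker as a fresh interpreter, so no
    # CUDA state is inherited from the parent. With "fork", a child that
    # inherits an already-initialized CUDA context raises the
    # "Cannot re-initialize CUDA in forked subprocess" error.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=processes) as pool:
        return pool.map(square, values)
```

With 'spawn', each worker re-imports the main module, so a call such as `run_with_spawn([1, 2, 3])` should sit under an `if __name__ == "__main__":` guard in a script file. Using `get_context` rather than `set_start_method` keeps the choice local to this pool and leaves the global default untouched.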

@maziyarpanahi yeah, I don't think this PR will resolve that issue.

winglian avatar Jun 04 '24 20:06 winglian

thanks @monk1337 !

winglian avatar Jun 04 '24 20:06 winglian

Thank you @winglian and @monk1337

maziyarpanahi avatar Jun 05 '24 08:06 maziyarpanahi