ccdv-ai

Results: 22 comments by ccdv-ai

@danielhanchen Almost!

```python
from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(
    tokenizer,
    chat_template = "chatml", # Supports zephyr, chatml, mistral, llama, alpaca, vicuna, vicuna_old, unsloth
    mapping = {"role" : "from", "content"...
```
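(For reference, a complete call might look like the sketch below; the `"value"`, `"human"`, and `"gpt"` mapping entries are assumptions for a ShareGPT-style dataset, not taken from the truncated snippet above.)

```python
from unsloth.chat_templates import get_chat_template

# Minimal sketch, assuming a ShareGPT-style dataset whose turns look like
# {"from": "human", "value": "..."} — the extra mapping keys are assumptions.
tokenizer = get_chat_template(
    tokenizer,
    chat_template = "chatml",
    mapping = {"role": "from", "content": "value", "user": "human", "assistant": "gpt"},
)
```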

@danielhanchen Looks like this issue also happens with llama-3. Conversational notebooks currently cannot be run if the tokenizer is BPE/sentencepiece:

```
--> 222 tokenizer_file.ParseFromString(open(f"{temporary_location}/tokenizer.model", "rb").read())
```

@danielhanchen Yes, looks like it's fixed! Thank you.

Hi @puppetm4st3r This should be fixed in the latest release: `pip install lsg-converter --upgrade`

Hi @duyvuleo Currently, converting DeBERTa to Long DeBERTa is not possible because this model uses a specific attention mechanism called "disentangled attention", which relies on different inputs + relative...
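(For context, a rough sketch of why this mechanism resists conversion: each DeBERTa attention score mixes content projections with relative-position projections, so the score matrix cannot simply be windowed the way LSG handles vanilla self-attention. The tensor names below are illustrative, not DeBERTa's actual module layout.)

```python
import torch

def disentangled_scores(Hq, Hk, Pq, Pk, rel_idx):
    """Illustrative DeBERTa-style disentangled attention (scaling omitted).

    Hq, Hk  : (seq, d)     content projections of queries / keys
    Pq, Pk  : (2k, d)      relative-position projections
    rel_idx : (seq, seq)   bucketed relative distance delta(i, j), values in [0, 2k)
    """
    c2c = Hq @ Hk.T                               # content-to-content
    c2p = torch.gather(Hq @ Pk.T, 1, rel_idx)     # content-to-position
    p2c = torch.gather(Hk @ Pq.T, 1, rel_idx).T   # position-to-content
    return c2c + c2p + p2c                        # summed before the softmax
```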

Hi @jakebonk The HF team added the Llama model a few days ago. From what I see in this [implementation](https://github.com/huggingface/transformers/blob/28f26c107b4a1c5c7e32ed4d9575622da0627a40/src/transformers/models/llama/modeling_llama.py#L94), it is likely possible to add the LSG attention to a Llama...

Hi @dafraile T5 support is planned somehow, but there are some caveats:

* T5 relies on a relative positional embedding. It is added to the attention score matrix directly, so you...
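(To illustrate that caveat, a minimal sketch of how T5 folds the learned relative-position bias into the raw scores before the softmax; this is a simplification, not T5's actual module code.)

```python
import torch
import torch.nn.functional as F

def t5_style_attention(q, k, v, position_bias):
    """q, k, v: (heads, seq, d); position_bias: (heads, seq, seq).

    The learned relative-position bias is summed with the raw scores,
    so a sparse/windowed (LSG-style) attention would also have to
    slice this bias consistently with its attention pattern.
    """
    scores = q @ k.transpose(-1, -2)  # T5 leaves the scores unscaled
    scores = scores + position_bias   # bias added directly to the score matrix
    return F.softmax(scores, dim=-1) @ v
```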

@winglian @brianfitzgerald [FastChat](https://github.com/lm-sys/FastChat/blob/main/fastchat/conversation.py) doesn't have a `phi_3` template, so it fails with sharegpt. I didn't find a way to train phi3 with a conversation dataset. We should rely on the...
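(A possible stopgap is to register a `phi_3` template with FastChat manually. The sketch below uses FastChat's `register_conv_template` API; the separator style and role tags are assumptions based on Phi-3's published chat format, not an official FastChat entry.)

```python
from fastchat.conversation import Conversation, SeparatorStyle, register_conv_template

# Hypothetical "phi_3" template — the role tags follow Phi-3's chat format
# (<|user|> ... <|end|> <|assistant|> ... <|end|>); approximate only.
register_conv_template(
    Conversation(
        name="phi_3",
        system_template="<|system|>\n{system_message}<|end|>\n",
        roles=("<|user|>\n", "<|assistant|>\n"),
        sep_style=SeparatorStyle.NO_COLON_SINGLE,
        sep="<|end|>\n",
        stop_str="<|end|>",
    )
)
```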

@winglian Got the same problem with a DeepSpeed stage-1 config. Unfreezing an entire layer doesn't work (no gradient). No problem with FSDP.

> Can you recommend a way to reproduce this? What files should I be putting into the directory? Thanks!

Here's a way to reproduce it, @winglian: `git clone https://huggingface.co/datasets/mhenrichsen/alpaca_2k_test` Modify...