ccdv-ai

Results: 22 comments by ccdv-ai

@danielhanchen Almost!

```python
from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(
    tokenizer,
    chat_template = "chatml", # Supports zephyr, chatml, mistral, llama, alpaca, vicuna, vicuna_old, unsloth
    mapping = {"role" : "from", "content"...
```
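(For reference, a complete call might look like the sketch below; the `"value"`, `"human"`, and `"gpt"` mapping entries are assumptions for a ShareGPT-style dataset, not taken from the truncated snippet above.)

```python
from unsloth.chat_templates import get_chat_template

# Minimal sketch, assuming a ShareGPT-style dataset whose turns look like
# {"from": "human", "value": "..."} — the extra mapping keys are assumptions.
tokenizer = get_chat_template(
    tokenizer,
    chat_template = "chatml",
    mapping = {"role": "from", "content": "value", "user": "human", "assistant": "gpt"},
)
```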

@danielhanchen Looks like this issue also happens with llama-3. Conversational notebooks currently cannot be run if the tokenizer is BPE/sentencepiece:

```
--> 222 tokenizer_file.ParseFromString(open(f"{temporary_location}/tokenizer.model", "rb").read())
```

@danielhanchen Yes, looks like it's fixed! Thank you.

Hi @puppetm4st3r This should be fixed in the latest release: `pip install lsg-converter --upgrade`

Hi @duyvuleo Currently, converting DeBERTa to Long DeBERTa is not possible because this model uses a specific attention mechanism called "disentangled attention", which relies on different inputs + relative...
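(For context, a rough sketch of why this mechanism resists conversion: each DeBERTa attention score mixes content projections with relative-position projections, so the score matrix cannot simply be windowed the way LSG handles vanilla self-attention. The tensor names below are illustrative, not DeBERTa's actual module layout.)

```python
import torch

def disentangled_scores(Hq, Hk, Pq, Pk, rel_idx):
    """Illustrative DeBERTa-style disentangled attention (scaling omitted).

    Hq, Hk  : (seq, d)     content projections of queries / keys
    Pq, Pk  : (2k, d)      relative-position projections
    rel_idx : (seq, seq)   bucketed relative distance delta(i, j), values in [0, 2k)
    """
    c2c = Hq @ Hk.T                               # content-to-content
    c2p = torch.gather(Hq @ Pk.T, 1, rel_idx)     # content-to-position
    p2c = torch.gather(Hk @ Pq.T, 1, rel_idx).T   # position-to-content
    return c2c + c2p + p2c                        # summed before the softmax
```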

Hi @jakebonk The HF team added the Llama model a few days ago. From what I see in this [implementation](https://github.com/huggingface/transformers/blob/28f26c107b4a1c5c7e32ed4d9575622da0627a40/src/transformers/models/llama/modeling_llama.py#L94), it is likely possible to add the LSG attention to a Llama...

Hi @dafraile T5 support is planned somehow, but there are some caveats:

* T5 relies on a relative positional embedding. It is added to the attention score matrix directly, so you...
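(To illustrate that caveat, a minimal sketch of how T5 folds the learned relative-position bias into the raw scores before the softmax; this is a simplification, not T5's actual module code.)

```python
import torch
import torch.nn.functional as F

def t5_style_attention(q, k, v, position_bias):
    """q, k, v: (heads, seq, d); position_bias: (heads, seq, seq).

    The learned relative-position bias is summed with the raw scores,
    so a sparse/windowed (LSG-style) attention would also have to
    slice this bias consistently with its attention pattern.
    """
    scores = q @ k.transpose(-1, -2)  # T5 leaves the scores unscaled
    scores = scores + position_bias   # bias added directly to the score matrix
    return F.softmax(scores, dim=-1) @ v
```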

@winglian @brianfitzgerald [FastChat](https://github.com/lm-sys/FastChat/blob/main/fastchat/conversation.py) doesn't have a `phi_3` template, so it fails with sharegpt. I didn't find a way to train phi3 with a conversation dataset. We should rely on the...
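(A possible stopgap is to register a `phi_3` template with FastChat manually. The sketch below uses FastChat's `register_conv_template` API; the separator style and role tags are assumptions based on Phi-3's published chat format, not an official FastChat entry.)

```python
from fastchat.conversation import Conversation, SeparatorStyle, register_conv_template

# Hypothetical "phi_3" template — the role tags follow Phi-3's chat format
# (<|user|> ... <|end|> <|assistant|> ... <|end|>); approximate only.
register_conv_template(
    Conversation(
        name="phi_3",
        system_template="<|system|>\n{system_message}<|end|>\n",
        roles=("<|user|>\n", "<|assistant|>\n"),
        sep_style=SeparatorStyle.NO_COLON_SINGLE,
        sep="<|end|>\n",
        stop_str="<|end|>",
    )
)
```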

@winglian Got the same problem with a DeepSpeed stage-1 config. Unfreezing an entire layer doesn't work (no gradient). No problem with FSDP.

> Can you recommend a way to reproduce this? What files should I be putting into the directory? Thanks!

Here's a way to reproduce it, @winglian: `git clone https://huggingface.co/datasets/mhenrichsen/alpaca_2k_test` Modify...