
add llama3's prompt template to conversation.py

Open KazutoshiShinoda opened this issue 10 months ago • 4 comments

KazutoshiShinoda avatar Apr 22 '24 15:04 KazutoshiShinoda

#1426


@KazutoshiShinoda could you add the preprocess_llama_3 function code? I will test it during the preparation stage.

awzhgw avatar Apr 23 '24 07:04 awzhgw

What about preprocessing during LazySupervisedDataset?

Jayantverma2 avatar Apr 23 '24 13:04 Jayantverma2

Hi @KazutoshiShinoda, @awzhgw, @Jayantverma2,

I hope you are doing well. We have just released our project LLaVA++: Extending Visual Capabilities with LLaMA-3 and Phi-3, which features LLaMA-3 and Phi-3-Mini based LLaVA models. Please have a look at LLaVA++.

  • We have released the code required to support both the LLaMA-3 and Phi-3-Mini models in the LLaVA framework. The chat formats and corresponding preprocess methods are available in our GitHub repo.
  • We have released all the checkpoints on Hugging Face.
  • On our GitHub repository we have provided the .py files that need to be replaced/added in the official LLaVA repository to train and run inference with LLaMA-3 and Phi-3-Mini based models.

I hope this is helpful. Please let me know if you have any questions. Thanks!

mmaaz60 avatar Apr 26 '24 18:04 mmaaz60

@mmaaz60 In your implementation, I can see the following logic for preprocessing, but I don't quite understand why round_len -= 1 when i > 0. Could you explain that a little bit?

    for conversation, target in zip(conversations, targets):
        # Total number of non-padding tokens in the target.
        total_len = int(target.ne(tokenizer.pad_token_id).sum())

        # Split the conversation into rounds; the first chunk keeps the
        # system prompt together with the first user/assistant exchange.
        rounds = conversation.split(conv.sep)
        re_rounds = [conv.sep.join(rounds[:3])]
        for conv_idx in range(3, len(rounds), 2):
            re_rounds.append(conv.sep.join(rounds[conv_idx:conv_idx + 2]))
        cur_len = 0
        target[:cur_len] = IGNORE_INDEX
        for i, rou in enumerate(re_rounds):
            if rou == "":
                break

            # Split each round into the instruction (user turn) and the response.
            parts = rou.split(sep)
            if len(parts) != 2:
                break
            parts[0] += sep

            if has_image:
                round_len = len(tokenizer_image_token(rou, tokenizer)) + 1
                instruction_len = len(tokenizer_image_token(parts[0], tokenizer))
            else:
                round_len = len(tokenizer(rou).input_ids) + 1
                instruction_len = len(tokenizer(parts[0]).input_ids)

            if i > 0:
                round_len -= 1
                instruction_len -= 1

            # Mask the instruction tokens so the loss is only computed on
            # the assistant's response.
            target[cur_len: cur_len + instruction_len] = IGNORE_INDEX

            cur_len += round_len
pluswcm avatar Jun 05 '24 09:06 pluswcm
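
For anyone else puzzling over the -1: a plausible explanation (an assumption, not confirmed by the authors in this thread) is that the LLaMA-3 tokenizer prepends a BOS token to every string it tokenizes. When each round is tokenized independently, every round gets its own BOS, but in the full concatenated conversation only the first one actually exists, so every round after the first (i > 0) is over-counted by one token. A self-contained toy sketch of that counting argument (toy_tokenize is a hypothetical stand-in, not the real tokenizer):

```python
# Toy illustration of BOS double-counting (assumption: this is why the
# real code does `round_len -= 1` for i > 0; names here are hypothetical).

BOS = 0  # stand-in for the tokenizer's BOS token id

def toy_tokenize(text):
    # Mimics a tokenizer that prepends BOS: one token per word, plus BOS.
    return [BOS] + [hash(w) % 1000 + 1 for w in text.split()]

rounds = ["user: hi assistant: hello", "user: bye assistant: goodbye"]
conversation = " ".join(rounds)

full_len = len(toy_tokenize(conversation))          # one BOS in total
per_round = [len(toy_tokenize(r)) for r in rounds]  # one BOS *each*

# Summing per-round lengths over-counts by (len(rounds) - 1) BOS tokens,
# hence subtracting 1 from every round after the first.
corrected = [n - (1 if i > 0 else 0) for i, n in enumerate(per_round)]
assert sum(corrected) == full_len
```

Without the correction, cur_len would drift one token further ahead of the true offsets with every round, so the IGNORE_INDEX masks would land on the wrong token spans.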