Mengbo-Zhou issues




Code complete?

I'd like to know if the code in this repository is complete. Has anyone tried pre-training this model from scratch?

Can int8 be used for pre-training large models?

Hello guys! I would like to know if you have experimented with int8 precision in the pre-training of your large models. Can int8 replace fp16 and fp32 to achieve faster...
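Part of the appeal of int8 is storage. The sketch below is a back-of-the-envelope estimate of raw weight memory per precision (4, 2, and 1 bytes per parameter for fp32, fp16, and int8). It deliberately ignores gradients, optimizer state, and activations. It also says nothing about training stability or whether int8 actually speeds up pre-training. The 7B parameter count is a hypothetical example, not taken from any issue.

```python
# Bytes needed to store a single parameter in each numeric format.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Memory for the raw weights only (no gradients, optimizer state, or activations)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    n = 7_000_000_000  # hypothetical 7B-parameter model
    for dtype in ("fp32", "fp16", "int8"):
        print(f"{dtype}: {weight_memory_gib(n, dtype):.1f} GiB")
```

Halving the bytes per parameter halves weight memory. Whether that translates into faster pre-training depends on hardware int8 support and on keeping training numerically stable.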

LoRA + FlashAttention2 speedup?

When fine-tuning Mistral with LoRA, do you think FlashAttention2 helps in speeding up the process? If yes, how significant is the acceleration? Where is the primary acceleration achieved?

Max Output Tokens of Llama3.1-8B and Llama-4

Hello Meta team, I am wondering what the maximum number of output tokens is for the LLaMA 3.1-8B and Llama-4 models during inference. Also, is there a public document listing...

Max output tokens and knowledge cutoff for Mistral-8B-Instruct-2410

Hello, I would like to ask two questions about Mistral-8B-Instruct-2410: 1. **What is the maximum number of output tokens** the model can generate during inference?...

bug

Does it support testing a single PDF file?

Many annual reports are published as PDFs. Does it support testing a single PDF file?

In analyzing this project, I noticed that the current task description uses "correct/incorrect" to describe the model's output. However, this approach is more suited for classification tasks. I am currently...

Does it support testing a single PDF file?

I understand that LongRAG extracts articles from Wikipedia XML dump files and stores them in multiple files, each of which contains multiple documents in XML or JSON format. LongRAG splits...

exceed the model's predefined maximum length (4096)

When I use the Qwen2.5-Math-7B model for inference, I get the following warning: This is a friendly reminder - the current text generation call will exceed the model's predefined maximum...
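A minimal sketch of the arithmetic behind this kind of warning follows. It assumes the limit (4096 here, per the issue title) counts prompt tokens and generated tokens together, as is typical for decoder-only models. The function names are hypothetical helpers for illustration, not part of any library.

```python
def max_new_tokens_allowed(prompt_tokens: int, context_limit: int = 4096) -> int:
    """Hypothetical helper: tokens left for generation after the prompt fills part of the window."""
    return max(context_limit - prompt_tokens, 0)


def exceeds_limit(prompt_tokens: int, max_new_tokens: int, context_limit: int = 4096) -> bool:
    """Hypothetical helper: True if prompt plus requested generation would overrun the limit."""
    return prompt_tokens + max_new_tokens > context_limit


if __name__ == "__main__":
    prompt_len = 1000  # illustrative prompt length in tokens
    print(max_new_tokens_allowed(prompt_len))   # remaining generation budget
    print(exceeds_limit(prompt_len, 4000))      # requesting too many new tokens
```

Under this assumption, a long prompt directly shrinks the number of tokens the model can generate before it runs past its predefined maximum length.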