Mengbo-Zhou

Results: 10 issues by Mengbo-Zhou

I'd like to know if the code in this repository is complete. Has anyone tried pre-training this model from scratch?

Hello guys! I would like to know if you have experimented with int8 precision in the pre-training of your large models. Can int8 replace fp16 and fp32 to achieve faster...

When fine-tuning Mistral with LoRA, do you think FlashAttention2 helps in speeding up the process? If yes, how significant is the acceleration? Where is the primary acceleration achieved?

When fine-tuning Mistral with LoRA, do you think FlashAttention2 helps in speeding up the process? If yes, how significant is the acceleration? Where is the primary acceleration achieved?

Hello Meta team, I am wondering what the maximum number of output tokens is for the LLaMA 3.1-8B and Llama-4 models during inference. Also, is there a public document listing...

Hello, I would like to ask two questions about Mistral-8B-Instruct-2410: 1. **What is the maximum number of output tokens** the model can generate during inference?...


Many annual reports are published as PDFs. Does it support testing with a single PDF file?

While analyzing this project, I noticed that the current task description uses "correct/incorrect" to describe the model's output. However, this approach is better suited to classification tasks. I am currently...

I understand that LongRAG extracts articles from Wikipedia XML dump files and stores them in multiple files, each of which contains multiple documents in XML or JSON format. LongRAG splits...

When I use the Qwen2.5-Math-7B model for inference, I get the following warning: This is a friendly reminder - the current text generation call will exceed the model's predefined maximum...
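Several of the questions above (maximum output tokens, and the "exceed the model's predefined maximum" warning) come down to the same budget: prompt tokens plus generated tokens must fit within the model's maximum sequence length. Below is a minimal sketch, assuming that limit is known (for example, many Hugging Face Transformers configs expose it as `max_position_embeddings`). The helper name `clamp_max_new_tokens` and the numbers in the usage example are hypothetical and not taken from any of the issues.

```python
def clamp_max_new_tokens(prompt_len: int, requested_new_tokens: int, max_length: int) -> int:
    """Return how many new tokens can be generated without the total
    sequence (prompt + output) exceeding max_length.

    Hypothetical helper for illustration; max_length is assumed to be the
    model's maximum sequence length (e.g. read from its config)."""
    if prompt_len < 0 or requested_new_tokens < 0 or max_length <= 0:
        raise ValueError("lengths must be non-negative and max_length positive")
    remaining = max_length - prompt_len
    if remaining <= 0:
        # The prompt alone already fills (or overflows) the context window.
        return 0
    # Never generate more than was requested, nor more than the window allows.
    return min(requested_new_tokens, remaining)


if __name__ == "__main__":
    # Hypothetical numbers: a 4096-token limit, a 3500-token prompt,
    # and a request for 1024 new tokens.
    print(clamp_max_new_tokens(3500, 1024, 4096))  # 596
```

Clamping before generation avoids the warning path altogether. The alternative is truncating the prompt, which preserves the requested output length at the cost of losing context.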