lynn issues

Results: 5 issues of lynn

AssertionError: Padding_idx must be within num_embeddings, demo.py

**Describe the bug** While running demo.py, when I upload a picture and chat with the model, it raises an error: ```bash PS D:\Courses\gitHub Projects\MiniGPT4main> python demo.py --cfg-path eval_configs/minigpt4_eval.yaml --gpu-id...

Is Mamba slower than a transformer?

GPU: A100. Mamba config: the default MambaConfig, except vocab_size set to 108192. CUDA: 12.1. PyTorch: 2.3.1. Python: 3.11. I trained a two-tower BERT with about 230M parameters in total, with...

[QUESTION] How to calculate MFU based on the FLOPs?

When I train Qwen2.5-32B with Megatron, I found the throughput was 420 or so using H200x2; the partition was tp=4, pp=2, so according to the MFU calculation, the util...
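For reference, MFU is commonly computed as the achieved model FLOP/s per GPU divided by the GPU's peak FLOP/s. The following is a minimal sketch, not Megatron's own calculation. It assumes the 420 figure is TFLOP/s per GPU, and it uses 989 TFLOP/s (dense BF16) as an assumed peak that should be checked against the hardware datasheet. The function names and the `6 * N` FLOPs-per-token rule of thumb are illustrative and do not come from the issue.

```python
def mfu(achieved_tflops_per_gpu: float, peak_tflops_per_gpu: float) -> float:
    """Model FLOPs utilization: achieved model FLOP/s over hardware peak FLOP/s."""
    if peak_tflops_per_gpu <= 0:
        raise ValueError("peak_tflops_per_gpu must be positive")
    return achieved_tflops_per_gpu / peak_tflops_per_gpu


def achieved_tflops_per_gpu(params: float, tokens_per_second: float, num_gpus: int) -> float:
    """Rough per-GPU TFLOP/s for a dense transformer.

    Uses the common approximation of 6 * params FLOPs per token
    (forward plus backward); it ignores attention FLOPs and any
    activation recomputation, so it is only an estimate.
    """
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    return 6.0 * params * tokens_per_second / num_gpus / 1e12


# Assumptions, not values confirmed by the issue:
ASSUMED_THROUGHPUT_TFLOPS = 420.0  # reported throughput, assumed to be TFLOP/s per GPU
ASSUMED_PEAK_TFLOPS = 989.0        # assumed dense BF16 peak per GPU

if __name__ == "__main__":
    utilization = mfu(ASSUMED_THROUGHPUT_TFLOPS, ASSUMED_PEAK_TFLOPS)
    print(f"MFU = {utilization:.1%}")
```

If the training log reports tokens per second rather than TFLOP/s, `achieved_tflops_per_gpu` gives a first estimate to pass into `mfu`. Note that tensor and pipeline parallel sizes do not appear in the formula directly; they only determine how many GPUs share the work.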

What's the difference among the different modes when initializing a model from a pretrained model?

```python
def init_model_from_pretrained(
    pretrained_model: FlexBertModel,
    new_model: FlexBertModel,
    mode: Union[str, TileMode] = TileMode.tile_weights_from_middle,
):
```
I notice the method takes `mode` as an argument, and it has three kinds of mode, `'center_weights',...

Qwen3-30B-A3B OOM with GRPO on 4x8H200 141G

With max_prompt_length=4096, max_response_length=8192, tp=4, pp=2, ep=2, and gpu_utilization=0.65, the script hit an OOM error. I can train the model with CPT, so why does GRPO fail?