dumpmemory
> > I have faced hang issues after 1:30 hours of training time with ft and ZeRO-3
> > same question

You can try updating NCCL to 2.19.3.
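In case it helps, a quick way to confirm which NCCL version PyTorch is actually using (standard PyTorch API, nothing specific to this issue):

```python
# Prints the CUDA runtime and NCCL version of the installed PyTorch build.
import torch

print(torch.version.cuda)          # CUDA runtime PyTorch was built against
print(torch.cuda.nccl.version())   # e.g. (2, 19, 3) after upgrading NCCL
```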
I have similar experiments, but from NLP: I ran the OPT example with 1.3B GPT-Neo. With one node of 8 2080 Ti GPUs, it is Throughput: 3664.00 token/s, 3.74 TFLOP/s....
Did you check the _context.json version?
The same issue occurs with HuggingfaceTrainer: when using steps as the saving frequency, e.g. every 1000 steps, the first checkpoint is checkpoint 00000, not checkpoint 1000.
> How is this impacting workloads, aside from the Keras callback not saving the epoch? As far as I understand, the most important thing is that we have an incremental...
I haven't set `checkpoint_frequency` in `CheckpointConfig`.
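For reference, a minimal sketch of setting it explicitly, assuming Ray Train's `CheckpointConfig`/`RunConfig` API (module paths and supported fields vary across Ray versions, so treat this as illustrative):

```python
# Illustrative only: asking Ray Train to checkpoint every 1000 iterations.
from ray.train import CheckpointConfig, RunConfig

run_config = RunConfig(
    checkpoint_config=CheckpointConfig(
        num_to_keep=2,              # keep only the two most recent checkpoints
        checkpoint_frequency=1000,  # request a checkpoint every 1000 iterations
    ),
)
# run_config is then passed to the trainer, e.g.
# trainer = HuggingfaceTrainer(..., run_config=run_config)
```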
Thanks for your reply. I will try again.
How about checking this one: https://goombalab.github.io/blog/2024/hydra-part2-model/
According to the Mamba-2 paper, Section 6.3, the following function is my understanding:

```python
def ssd_flops(T, Q, P, N):
    # center blocks
    # print(T, Q, P, N)
    center_blocks_sma_compute = T * Q * N + T * Q * Q + T * P * N
    # print("center_blocks_sma_compute", center_blocks_sma_compute / 1e9,
    #       T * Q * N / 1e9, T * Q * Q / 1e9, T * P * N / 1e9)
    # low-rank blocks, right factors (B terms)
    b_compute...
```
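For what it's worth, here is how I read the center-block term, assuming T is the total sequence length, Q the chunk (block) length, P the head dimension, and N the state dimension (my labels, not necessarily the paper's notation). The three summands can be viewed as per-chunk costs multiplied by the T / Q chunks:

```python
# Rough sketch, my own interpretation: the center-block term above rewritten as
# (number of chunks) x (per-chunk cost). Equal to T*Q*N + T*Q*Q + T*P*N when Q divides T.
def center_block_flops(T, Q, P, N):
    num_chunks = T // Q
    per_chunk = Q * Q * N + Q * Q * Q + Q * P * N
    return num_chunks * per_chunk

# Example with plausible Mamba-2-style dimensions (illustrative only):
print(center_block_flops(T=4096, Q=256, P=64, N=128) / 1e9, "GFLOPs")
```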
> > > > Huh, there's no requirement d_state / head_dim % 8 == 0; there's d_model / head_dim % 8 == 0. You can try dimensions similar to...