Alex J. Champandard

Results: 60 comments by Alex J. Champandard

All of the issues that are still open have not been done yet! Contributions are still welcome.

I managed to avoid Steam updates until today, but the latest update broke this repository with the same issue. The device is basically unusable on Linux...

As well as the `seq_len: 256,` changes to the JSON config, here is the `run_train.sh` script I'm using:

```
#!/bin/bash
export OMP_NUM_THREADS=1
torchrun --nproc-per-node 2 -m open_lm.main --model open_lm_11m \
...
```

I'm pretty sure 3090s do support bf16, but I'll test regular float16! Update, with `--precision fp16` (no AMP):
- **160m model** reaches `batch_size: 38` without OOM; 40 fails.
- **11m...
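For context on why changing the training dtype frees less memory than one might expect, here is a rough back-of-envelope sketch. The assumptions are mine, not open_lm's actual allocator behavior: an Adam-style optimizer keeping two fp32 moment buffers per parameter, with parameters and gradients stored in the training dtype. Under those assumptions, halving the parameter dtype halves the params and grads but leaves the optimizer state untouched.

```python
def model_state_bytes(n_params: int, param_bytes: int) -> int:
    """Parameters + gradients in the training dtype, plus two fp32
    Adam moment buffers (exp_avg, exp_avg_sq) per parameter."""
    params = n_params * param_bytes
    grads = n_params * param_bytes
    adam_moments = 2 * n_params * 4  # both moments kept in fp32
    return params + grads + adam_moments

GiB = 1024 ** 3
for name, n in [("11m", 11_000_000), ("160m", 160_000_000)]:
    fp32 = model_state_bytes(n, 4)
    fp16 = model_state_bytes(n, 2)
    print(f"{name}: fp32 states ~ {fp32 / GiB:.2f} GiB, "
          f"fp16 states ~ {fp16 / GiB:.2f} GiB")
```

By this estimate the model states themselves are small; most of the peak usage reported below has to come from activations and framework overhead, not weights.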

Using `amp_bf16` instead of `amp_bfloat16` results in the same memory usage: 160m reaches `bs=52`, 11m reaches `bs=56`. So far I don't believe it's related to hardware support or data type, and I...

OK, so right before entering `train_one_epoch()` the memory usage of GPU 0 is proportional to the number of workers + 1. With only one worker and one GPU, this is...

UPDATE: Results above look like the cost of doing business with `DistributedDataParallel`, nothing too out of the ordinary? The memory usage at peak in `train_one_epoch` is interesting, just before the...
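A toy linear model of the observation above, under my own guess at the mechanism: each DataLoader worker process plus the main process pays a fixed per-process cost on GPU 0 (e.g. a CUDA context), on top of the model replica that `DistributedDataParallel` holds. All numbers are illustrative, not measured values.

```python
def gpu0_gib(model_replica: float, per_process_overhead: float,
             num_workers: int) -> float:
    """Hypothetical linear model: GPU-0 usage grows with
    (num_workers + 1) processes, each paying a fixed overhead,
    on top of one DDP model replica."""
    return model_replica + (num_workers + 1) * per_process_overhead

# Illustrative numbers only (GiB): a 0.5 GiB replica and a
# 0.6 GiB per-process CUDA context.
print(gpu0_gib(model_replica=0.5, per_process_overhead=0.6, num_workers=1))
```

With one worker and one GPU that gives two processes touching GPU 0, which matches the "workers + 1" proportionality described above.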

@anas-awadalla Yes, it helps to know I'm not chasing a white rabbit! 🐇🕳 For `float32`, peak allocated is `10.3G` for the 11m model before `backward()`, compared to `9.5G` with `bfloat16`. - The relative memory...
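One way to read those two peak numbers, under an assumption of mine (not confirmed in the thread): if autocast only halves activation memory while parameters, gradients and optimizer state stay in fp32, then the saving between the two runs equals half the fp32 activation pool.

```python
fp32_peak_gib = 10.3   # reported peak before backward(), float32
bf16_peak_gib = 9.5    # same measurement point, bfloat16 autocast

# If only activations are halved by autocast, the observed saving is
# half the fp32 activation pool, so the implied pool is twice it.
saving = fp32_peak_gib - bf16_peak_gib
implied_fp32_activations = 2 * saving
print(f"implied fp32 activation pool ~ {implied_fp32_activations:.1f} GiB")
```

That would imply only a small slice of the 10.3G peak is castable activations, consistent with the relatively small gap between the two precisions.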

Some progress: isolating the problem to a single GPU and a single worker helps, and the problems are present there too. 1) Possible bug? Models with a single GPU/worker don't seem to be correctly using...

@mitchellnw OK, thanks. The good news is that it's easy to isolate and reproduce! Having removed the autocast and fixed the raw fp16 problem, I made a chart of the...