Kim Yann comments

Results: 30 comments of Kim Yann

[intel_gpu] mem leak when runing RN50

Further investigation suggests the GPU memory blow-up happens in the eval phase. With `eval_during_train: False` set (see https://github.com/PaddlePaddle/PaddleClas/blob/f820473d1d4d5174e57a5a6b08a42f672eb13390/ppcls/configs/ImageNet/ResNet/ResNet50.yaml#L8), no related OOM is observed.

[intel_gpu] mem leak when runing RN50

@qili93 Intel GPU support for Paddle is paused due to some policy and market changes. I guess we can't be sure until the next-gen Falcon Shores GPU.

[TODO] Collect framework items to be optimized for CustomDevice

The main framework uses flat_hash_map, and the deallocate_data it uses (https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/utils/flat_hash_map.h#L743) is illegal in LLVM.

> It looks like every time eval runs after an epoch of training, memory grows and is not released. I ran into the same problem here: https://github.com/PaddlePaddle/PaddleCustomDevice/issues/670

You can add a VLOG(); the same way as in https://github.com/PaddlePaddle/PaddleCustomDevice/blob/develop/backends/intel_gpu/runtime/runtime.cc#L250 to capture the CustomDevice Allocate/Deallocate calls.
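Once the Allocate/Deallocate calls are logged, the captured output can be paired up to see which allocations were never released. The following is only a minimal sketch: the log line format (`Allocate ptr=... size=...` and `Deallocate ptr=...`), the sample lines, and the helper name `find_unreleased` are all hypothetical and need to be adapted to whatever the added VLOG actually prints.

```python
import re

# Hypothetical log format, NOT the real runtime output. Adjust the
# patterns to match what your VLOG statement actually emits.
ALLOC_RE = re.compile(r"\bAllocate\s+ptr=(0x[0-9a-fA-F]+)\s+size=(\d+)")
FREE_RE = re.compile(r"\bDeallocate\s+ptr=(0x[0-9a-fA-F]+)")


def find_unreleased(lines):
    """Return {pointer: size} for allocations with no matching deallocation."""
    live = {}
    for line in lines:
        alloc = ALLOC_RE.search(line)
        if alloc:
            live[alloc.group(1)] = int(alloc.group(2))
            continue
        free = FREE_RE.search(line)
        if free:
            # Ignore frees of pointers that were never seen being allocated.
            live.pop(free.group(1), None)
    return live


# Made-up sample lines for illustration only.
sample_log = [
    "[trace] Allocate ptr=0x1000 size=1024",
    "[trace] Allocate ptr=0x2000 size=2048",
    "[trace] Deallocate ptr=0x1000",
]

unreleased = find_unreleased(sample_log)
print(unreleased, "bytes still held:", sum(unreleased.values()))
```

Running this over the log of one train epoch plus one eval pass should show whether the set of unreleased pointers keeps growing after each eval.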

Flux.1

lazy mode w/o graph:

```bash
python text_to_image_generation.py \
    --model_name_or_path black-forest-labs/FLUX.1-schnell \
    --prompts "A cat holding a sign that says hello world" \
    --num_images_per_prompt 1 \
    --batch_size 1 \
    --num_inference_steps 28...
```

Flux.1

graph mode:

```bash
python text_to_image_generation.py \
    --model_name_or_path black-forest-labs/FLUX.1-schnell \
    --prompts "A cat holding a sign that says hello world" \
    --num_images_per_prompt 1 \
    --batch_size 1 \
    --num_inference_steps 28 \
    --image_save_dir...
```

Flux.1

eager:

```bash
PT_HPU_LAZY_MODE=0 \
python text_to_image_generation.py \
    --model_name_or_path black-forest-labs/FLUX.1-schnell \
    --prompts "A cat holding a sign that says hello world" \
    --num_images_per_prompt 1 \
    --batch_size 1 \
    --num_inference_steps 28 \
    ...
```

Flux.1

Performance With Batching Enabled:

|Device|Mode |Prompts|Images Per Prompt|BS|Steps|FPS  |
|------|-----|-------|-----------------|--|-----|-----|
|G2H   |Graph|1      |4                |4 |28   |0.113|
|G2H   |Graph|5      |1                |5 |28   |0.113|
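To read the throughput figures above as latencies, the arithmetic can be sketched as follows. This assumes that FPS here means generated images per second, which the comment does not state explicitly; the helper names are illustrative only.

```python
def seconds_per_image(fps):
    """Average wall-clock time per generated image, assuming FPS = images/second."""
    return 1.0 / fps


def batch_latency(fps, batch_size):
    """Approximate time to produce one full batch at the given throughput."""
    return batch_size / fps


fps = 0.113  # value reported for both batched runs
for batch_size in (4, 5):
    print(
        f"BS={batch_size}: {seconds_per_image(fps):.2f} s/image, "
        f"{batch_latency(fps, batch_size):.1f} s/batch"
    )
```

Under that assumption, both configurations cost roughly 8.85 s per image, so going from a batch of 4 to a batch of 5 does not change the per-image cost.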

Flux.1

> please delete measure_all_500, measure_all etc. binary files like npz needn't be uploaded

@ssarkar2 removed, pls review again

[intel_gpu] mem leak when runing RN50

[TODO] Collect framework items to be optimized for CustomDevice

Host memory exhausted during long-running training

Flux.1