MagicSource
@janikstfub this augmentation is used to resize along the shortest side relative to the original image. If you skip this step, your images are used raw as input, which should work fine except your...
I am currently using DeepSpeed ZeRO-3. I have a model which needs at least 40GB of GPU memory in total, but I only have 32GB. Using DeepSpeed ZeRO-3 might reduce...
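For a rough sense of whether ZeRO-3 can close that gap: per the ZeRO paper's accounting, fp16 training with Adam costs about 16 bytes per parameter of model states (2 B fp16 params + 2 B fp16 grads + 12 B fp32 optimizer states), and ZeRO-3 shards all of it across ranks; activations and temporary buffers are extra and are not sharded. A back-of-envelope sketch (the 7B parameter count is purely an illustrative assumption):

```python
def zero3_model_states_gib_per_gpu(n_params: float, n_gpus: int) -> float:
    """Per-GPU model-state memory under ZeRO-3, in GiB.

    16 bytes/param = fp16 params (2) + fp16 grads (2) + fp32 Adam
    master params, momentum and variance (12); ZeRO-3 shards all
    three across the data-parallel group.
    """
    return 16 * n_params / n_gpus / 2**30

# Hypothetical 7B-parameter model split over 2 GPUs:
print(round(zero3_model_states_gib_per_gpu(7e9, 2), 1))  # 52.2 GiB each
```

So sharding alone may not be enough; this is why the HF integration also exposes optimizer/parameter offload to CPU.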
I think it might be due to the forward pass, since I am able to train it at all... I am using a ZeRO-3 config like this:

```
{
  "fp16": {
    "enabled": "auto",
    "loss_scale": 0,
    "loss_scale_window": 1000,
    ...
```
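For reference, a complete ZeRO-3 config in the shape the HF Trainer integration expects (the `"auto"` values are filled in by the Trainer; the offload entries are optional and only worth enabling when GPU memory is the bottleneck):

```json
{
  "fp16": {
    "enabled": "auto",
    "loss_scale": 0,
    "loss_scale_window": 1000,
    "initial_scale_power": 16,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": { "device": "cpu", "pin_memory": true },
    "offload_param": { "device": "cpu", "pin_memory": true },
    "overlap_comm": true,
    "contiguous_gradients": true,
    "stage3_gather_16bit_weights_on_model_save": true
  },
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto"
}
```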
So it can be concluded that if one cannot train a model with ZeRO-3 even with bs = 1, then one won't be able to do so with FSDP as...
Yes, how do I enable tensor parallelism? It seems I need to split the model across 2 GPUs and have both compute a single batch of data together. It looks like there are no default settings...
Hi, is there any built-in implementation in transformers to enable TP with a single config? It looks like users need to configure every single layer to use TP?
It looks like torchtitan can do tensor parallelism by default?
Does it support Qwen models? Also, can multimodal models such as LLaVA be supported?
Where should I add it?