UniversalFakeDetect

about training device

Open long280 opened this issue 1 year ago • 5 comments

Hello, thank you very much for your work. I would like to ask what hardware you used to train your code. Also, why is the fc_weight file provided with your paper only 4 KB, while the model file I trained is 1-2 GB?

long280 avatar Mar 21 '24 10:03 long280

They may have uploaded a git-lfs pointer file instead of the full pretrained model. On a Tesla P40 I need 6-7 s per iteration, so it takes about 20 minutes before the first loss output appears. Also, with the default hyperparameters, training on the GenImage SD-V1.4 dataset with batch size 256 needs no less than 13.5 GB of VRAM. One epoch takes about 1.5 hours.
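
If you want to check which case you are in, a quick sketch like the one below should tell you whether the 4 KB file is an LFS pointer or a genuine (tiny) state_dict; the filename is just a placeholder:

# Quick check: is the downloaded 4 KB file a git-lfs pointer or real weights?
# (fc_weights.pth is a placeholder name; use whatever file you downloaded.)
import torch

path = "fc_weights.pth"
with open(path, "rb") as f:
    head = f.read(64)

if head.startswith(b"version https://git-lfs.github.com/spec/v1"):
    print("git-lfs pointer file, not real weights; fetch it with `git lfs pull`.")
else:
    ckpt = torch.load(path, map_location="cpu")  # inspect what is actually inside
    for k, v in ckpt.items():
        print(k, tuple(v.shape) if torch.is_tensor(v) else type(v).__name__)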

255doesnotexist avatar Aug 12 '24 01:08 255doesnotexist

UPD: but the validation results are not so good. I don't know whether I got something wrong in the training process. I performed training by addressing fake/real_list_path manually in the command args, let it run, and saved the early-stop checkpoints, but it gives me a very low AP score on the GenImage val set.
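
For reference, if you want to sanity-check the AP number itself (for example, whether the label convention is real = 0 / fake = 1), a minimal sketch with scikit-learn could look like this; the arrays are placeholders for your own validation outputs:

# Minimal AP sanity check on validation outputs (placeholder arrays).
# A flipped label convention (real marked as 1 instead of fake) shows up as a very low AP.
import numpy as np
from sklearn.metrics import average_precision_score

y_true = np.array([0, 0, 1, 1])            # ground truth: 0 = real, 1 = fake
y_score = np.array([0.1, 0.4, 0.35, 0.8])  # sigmoid outputs of the classifier

print("AP:", average_precision_score(y_true, y_score))
print("AP with labels flipped:", average_precision_score(1 - y_true, y_score))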

255doesnotexist avatar Aug 19 '24 15:08 255doesnotexist

I would like to ask: why does it take two hours to run one epoch? Is this normal? How did you perform training by addressing fake/real_list_path manually in the command args?

oceanzhf avatar Sep 23 '24 03:09 oceanzhf

I would like to ask: why does it take two hours to run one epoch? Is this normal?

I don't know, but it may be normal, since no runtime exception was thrown during the training process.

How did you perform training by addressing fake/real_list_path manually in the command args?

You should modify data/datasets.py and add a data mode such as 'manually':

elif opt.data_mode == 'manually':
    # read the real/fake image lists directly from the paths given on the command line
    real_list = get_list(opt.real_list_path)
    fake_list = get_list(opt.fake_list_path)

Since there is no .pickle file at that path, it just falls back to recursively searching your image dataset path.

Point it at your real and fake image paths and it works right away.
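
In case it helps, here is roughly what that recursive fallback does; this is only a sketch of the idea, not the repo's actual get_list implementation, and the paths are placeholders:

# Sketch of a recursive image collector, approximating what get_list falls back to
# when no .pickle file is found (not the repo's actual implementation).
import os

IMG_EXTS = (".png", ".jpg", ".jpeg", ".bmp", ".webp")

def collect_images(root):
    paths = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(IMG_EXTS):
                paths.append(os.path.join(dirpath, name))
    return paths

real_list = collect_images("/path/to/real_images")   # placeholder paths
fake_list = collect_images("/path/to/fake_images")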

255doesnotexist avatar Sep 23 '24 07:09 255doesnotexist

Thank you very much for your answer. When you validate with your trained .pth file, do you encounter this error: RuntimeError: Error(s) in loading state_dict for Linear: Missing key(s) in state_dict: 'weight', 'bias'. Unexpected key(s) in state_dict: 'model', 'optimizer', 'total_steps'?

oceanzhf avatar Sep 25 '24 02:09 oceanzhf
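
For reference, that error usually means the trained .pth is a full training checkpoint (with 'model', 'optimizer', and 'total_steps' entries, exactly as the message says) rather than a bare state_dict for the Linear head, so the nested 'model' entry is what needs to be loaded. A hedged sketch, where the filename, the 768 feature dimension, and the 'fc.' prefix handling are assumptions:

# Sketch: unwrap a full training checkpoint before loading it into the linear head.
# The 'model' key comes from the error message; the filename, the 768 feature size,
# and the 'fc.' prefix are assumptions, adjust them to your setup.
import torch
import torch.nn as nn

ckpt = torch.load("your_trained_checkpoint.pth", map_location="cpu")
print(ckpt.keys())           # expect dict_keys(['model', 'optimizer', 'total_steps'])

state = ckpt["model"]
print(list(state.keys()))    # see how the linear head's parameters are actually named

fc = nn.Linear(768, 1)       # 768 assumes CLIP ViT-L/14 features
state = {k[len("fc."):] if k.startswith("fc.") else k: v for k, v in state.items()}
fc.load_state_dict(state, strict=False)   # strict=False ignores any non-fc keys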