
Out of CUDA memory when training

antithing opened this issue 2 years ago • 3 comments

I run out of CUDA memory when training on a single RTX 3090. Is there a parameter I can adjust to make this work?

Thanks

antithing · Jun 28 '23 11:06

I am also trying to train a model at 1920x1080. When I change the batch size and resolution in the config:

{
    "seed": 2020,
    "save_dir": "release_model/",
    "data_loader": {
        "name": "davis",
        "data_root": "datasets/",
        "w": 1920,
        "h": 1080,
        "sample_length": 10
    },
    "losses": {
        "hole_weight": 1,
        "valid_weight": 1,
        "adversarial_weight": 0.01,
        "GAN_LOSS": "hinge"
    },
    "trainer": {
        "type": "Adam",
        "beta1": 0,
        "beta2": 0.99,
        "lr": 1e-4,
        "d2glr": 1, 
        "batch_size": 4,
        "num_workers": 1,
        "verbosity": 2,
        "log_step": 100,
        "save_freq": 1e4,
        "valid_freq": 1e4, 
        "iterations": 50e4,
        "niter": 30e4,
        "niter_steady": 30e4
    }
}

I see this error:

inpainting\STTN-master\STTN-master\model\sttn.py", line 188, in forward
mm = m.view(b, t, 1, out_h, height, out_w, width)
RuntimeError: shape '[4, 10, 1, 60, 60, 108, 108]' is invalid for input of size 259200
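
For context on that error: the attention in model/sttn.py splits the encoder's downsampled feature map into patches, and that view() only succeeds when every patch size tiles the feature map exactly, so the reshape numbers here are inconsistent with the mask tensor's actual element count. A minimal sketch of the constraint, assuming the encoder's 4x downsampling and the repo's default 432x240 patch sizes (the check_patch_sizes helper is illustrative, not part of STTN):

    # Illustrative helper (not in the STTN repo): the view() at
    # model/sttn.py line 188 reshapes the mask into
    # (b, t, 1, out_h, patch_h, out_w, patch_w), which only works when
    # each (patch_w, patch_h) tiles the (w/4, h/4) feature map exactly.

    def check_patch_sizes(w, h, patchsize, downsample=4):
        feat_w, feat_h = w // downsample, h // downsample
        for patch_w, patch_h in patchsize:
            if feat_w % patch_w or feat_h % patch_h:
                raise ValueError(
                    f"patch ({patch_w}, {patch_h}) does not tile "
                    f"feature map ({feat_w}, {feat_h})"
                )

    default_patches = [(108, 60), (36, 20), (18, 10), (9, 5)]
    check_patch_sizes(432, 240, default_patches)    # OK: repo defaults
    check_patch_sizes(1920, 1080, default_patches)  # raises: 480 % 108 != 0

So changing w and h in the config without also updating the patch sizes (or vice versa) would produce exactly this kind of shape mismatch.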

antithing · Jun 28 '23 16:06

Try a lower batch size; set it to 1 and see what happens. Also, would it be possible to run the training at half or a quarter of your resolution and then upscale? Transformers are notorious for scaling quadratically with input size, so an HD input with temporal attention is unlikely to fit in 24 GB of VRAM (in my opinion; I could be wrong).
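
To put rough numbers on that quadratic scaling: STTN's temporal attention flattens all sample_length frames of the (w/4)x(h/4) feature map into patch tokens and builds an n x n attention matrix over them. A back-of-envelope sketch (the helper is illustrative; the (9, 5) patch is the repo's finest default and (16, 9) is from the HD settings in the next comment):

    # Illustrative estimate of the raw attention-matrix size alone, per
    # attention head and per sample; real training also stores activations,
    # gradients, and optimizer state, and STTN stacks several blocks.

    def attn_matrix_mib(t, feat_h, feat_w, patch_h, patch_w, bytes_per_el=4):
        tokens = t * (feat_h // patch_h) * (feat_w // patch_w)
        return tokens, tokens ** 2 * bytes_per_el / 2 ** 20

    # Default 432x240 input (feature map 108x60), finest patch (9, 5):
    print(attn_matrix_mib(10, 60, 108, 5, 9))    # (1440, ~7.9 MiB)

    # 1920x1080 input (feature map 480x270), finest HD patch (16, 9):
    print(attn_matrix_mib(10, 270, 480, 9, 16))  # (9000, ~309 MiB)

The token count grows about 6x here, but each attention matrix grows about 39x, which is the quadratic blow-up in question.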

alex-flwls · Jul 13 '23 08:07

I got this running at HD resolution (1920x1080) on an Nvidia A10G (24 GB VRAM). This is for inference; I haven't tried training a new model yet.

Here are the patch sizes I used (in model/sttn.py):

patchsize = [(480, 270), (160, 90), (32, 18), (16, 9)]

And here are the hyperparameters from test.py:

w, h = 1920, 1080
ref_length = 10
neighbor_stride = 3
default_fps = 24
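
Those patch sizes all tile the HD feature map (1920/4 = 480 by 1080/4 = 270) exactly, which is the same constraint the earlier training error appears to have tripped over. A quick illustrative check:

    # Each (patch_w, patch_h) must divide the downsampled feature map evenly.
    feat_w, feat_h = 1920 // 4, 1080 // 4   # 480 x 270
    for pw, ph in [(480, 270), (160, 90), (32, 18), (16, 9)]:
        assert feat_w % pw == 0 and feat_h % ph == 0, (pw, ph)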

The results aren't great, though. I think this is possibly because limiting the number of neighbour and reference frames inhibits the model's ability to infer the inpainted regions. Changing the patch sizes from those used in training probably isn't helping either.
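
For context on the neighbour/reference point: at inference, test.py inpaints a sliding window of local neighbours while attending to reference frames sampled every ref_length frames across the clip. A simplified paraphrase of that sampling (variable names follow test.py, but the loop is a sketch, not the exact repo code):

    # Sketch of STTN-style frame selection at inference (paraphrased).
    video_length = 120    # hypothetical clip length
    ref_length = 10       # stride between sampled reference frames
    neighbor_stride = 3   # half-width of the local neighbour window

    for f in range(0, video_length, neighbor_stride):
        neighbor_ids = list(range(max(0, f - neighbor_stride),
                                  min(video_length, f + neighbor_stride + 1)))
        ref_ids = [i for i in range(0, video_length, ref_length)
                   if i not in neighbor_ids]
        # The model attends jointly over neighbor_ids + ref_ids, so a small
        # neighbor_stride and sparse references mean less temporal context.

With neighbor_stride = 3, each pass sees only about seven neighbouring frames plus a handful of references, which fits the point about limited context above.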

alex-flwls · Jul 19 '23 12:07