The image size issue for the HRF dataset
> What do 'before', 'divide', and 'fusion' mean in the code? I couldn't find any corresponding description in your paper.
> In the JSON file, does 'image_size' refer to the size before or after super-resolution? If 'image_size = (512, 512)' and 'upscale_rate = 4', does that mean that, no matter what the resolution of the input image is, it is resized to 128x128 before entering the encoder, and the outputs of both decoder-seg and decoder-sr are 512x512?
> Looking forward to your reply.
> Thank you for your attention. super_resolution means the model is allowed to use super-resolution, before means the shared feature extraction module is used before the task heads, and sr_seg_fusion controls whether the shared feature extraction module is used at all. How image_size and upscale_rate are interpreted depends on the code you use; if you use our code, your understanding is correct.
Originally posted by @Qsingle in https://github.com/Qsingle/imed_vision/issues/3#issuecomment-2270543953
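For context, here is a minimal sketch of the size relationship confirmed in the quoted reply, assuming a PyTorch-style pipeline; `encoder`, `decoder_seg`, and `decoder_sr` are placeholder names, not the actual imed_vision modules:

```python
import torch
import torch.nn.functional as F

# Per the quoted reply: with image_size = (512, 512) and upscale_rate = 4,
# the encoder sees a 128x128 input and both heads output 512x512.
image_size = (512, 512)
upscale_rate = 4
low_res = (image_size[0] // upscale_rate, image_size[1] // upscale_rate)  # (128, 128)

hr_image = torch.randn(1, 3, *image_size)  # high-resolution image, also the SR target
lr_input = F.interpolate(hr_image, size=low_res, mode="bilinear", align_corners=False)

# feats = encoder(lr_input)      # shared feature extraction ("before")
# seg   = decoder_seg(feats)     # segmentation output at 512x512
# sr    = decoder_sr(feats)      # super-resolution output at 512x512
print(lr_input.shape)  # torch.Size([1, 3, 128, 128])
```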
Hi, thanks for sharing. I am reproducing the model from the paper, but unfortunately the results are not good.
I hope I can get your help.
While training the D2SL model on the HRF dataset, I ran into some questions.
- Is the training config below right? In particular, are the settings discussed above set correctly?

  ```json
  {
      "init_lr": 0.01,
      "momentum": 0.9,
      "weight_decay": 1e-4,
      "image_dir": "dataset/HRF/images",
      "image_suffix": ".jpg",
      "mask_dir": "dataset/HRF/manual1",
      "mask_suffix": ".tif",
      "epochs": 300,
      "lr_sche": "poly",
      "image_size": [1752, 1168],
      "super_reso": true,
      "fusion": true,
      "num_classes": 2,
      "divide": true,
      "gpu_index": "0",
      "model_name": "supervessel",
      "block_name": "origin",
      "channel": 3,
      "upscale_rate": 2,
      "num_workers": 0,
      "ckpt_dir": "./ckpt",
      "distance": false,
      "dataset": "hrf",
      "batch_size": 2,
      "crop": true,
      "before": before
  }
  ```
- I set "image_size" in the config file to [1752, 1168], so the network's output is [1752, 1162]. During evaluation, however, the true size of the HRF test set images is [3504, 2336]. What did you do to make the output image the same size as the real image when computing the metrics? Was the test image downsampled in advance?
Thank you very much. This is one limitation of the current work: we cannot do arbitrary-size super-resolution. So we set the output to a fixed size, e.g., $2048\times2048$, and then interpolate it to the original size by nearest sampling. You could also try to make the model output the original size by downsampling the input image by a factor of $n$, where $n$ is the upscale factor (upscale_rate in the JSON file) used in training. However, I am not sure this would work well.
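For reference, a minimal sketch of that nearest-sampling resize at evaluation time, assuming the prediction is a PyTorch logits tensor at the fixed training size (the tensor names here are placeholders):

```python
import torch
import torch.nn.functional as F

# Model output at the fixed training size: (N, num_classes, H, W) = (1, 2, 1168, 1752).
pred = torch.randn(1, 2, 1168, 1752)
orig_h, orig_w = 2336, 3504  # HRF images are 3504x2336 (width x height)

# Resize the prediction to the original resolution with nearest sampling,
# then take the argmax to get a full-size mask for metric computation.
pred_full = F.interpolate(pred, size=(orig_h, orig_w), mode="nearest")
mask_full = pred_full.argmax(dim=1)  # shape: (1, 2336, 3504)

# Alternative mentioned above: downsample the input by the training upscale_rate
# so the super-resolution branch directly outputs the original 3504x2336 size.
```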
Thanks very much for your reply.