Training from Scratch

Open Mahsa13473 opened this issue 5 years ago • 13 comments

Hi there,

Thanks for releasing the code, it is amazing work! I tried to train the network from scratch, following all the steps mentioned in the README, but I couldn't match the results of the pretrained model.

I was wondering which hyperparameters were used for the pretrained one. Are they the same as the defaults in train_sdf.py? How many epochs did you train to get the best accuracy? Also, which dataset was used for training: the old one, or the new one mentioned in the README?

Mahsa13473 avatar Oct 05 '19 01:10 Mahsa13473

Hello, in addition to @Mahsa13473 's comment, can you also provide the approximate training time?

no-materials avatar Oct 08 '19 11:10 no-materials

Hi, at the time we submitted, we used the old dataset, which everyone else used as well. We used an ImageNet-pretrained VGG-16 (provided by the official TensorFlow release), as shown in the command in the README. We haven't tried training everything from scratch yet, since I guess the dataset itself is not big enough to learn 2D images perfectly.

Xharlie avatar Oct 15 '19 03:10 Xharlie

The training time can vary from 1 day to 3 days depending on your GPU, but I'd say at most 3 days. The bottleneck is on the CPU, since we have to read the SDF ground truth and image h5 files on the fly, so with a better CPU, or an SSD for the sdf/img storage, you can train faster.
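To illustrate the on-the-fly reading pattern described above, here is a minimal sketch with h5py. The file layout and the dataset name `pc_sdf_sample` are assumptions for illustration, not necessarily DISN's actual schema; the point is that opening and subsampling an h5 file per batch makes the pipeline I/O-bound, which is why fast storage helps.

```python
import h5py
import numpy as np

def load_sdf_samples(path, num_points=2048):
    """Read and randomly subsample per-point SDF samples from one h5 file.

    Assumes each file stores an (N, 4) float array of (x, y, z, sdf)
    under the dataset name "pc_sdf_sample" (name is an assumption).
    """
    with h5py.File(path, "r") as f:
        pts = f["pc_sdf_sample"][:]
    # Sample with replacement only if the file has fewer points than requested.
    idx = np.random.choice(len(pts), num_points, replace=len(pts) < num_points)
    return pts[idx]

# Build a tiny fake file just to show the access pattern.
with h5py.File("demo_sdf.h5", "w") as f:
    f.create_dataset("pc_sdf_sample",
                     data=np.random.randn(10000, 4).astype(np.float32))

batch = load_sdf_samples("demo_sdf.h5")
print(batch.shape)  # (2048, 4)
```

Caching these arrays in RAM, or moving the h5 files to an SSD, attacks exactly this per-batch read cost.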

Xharlie avatar Oct 15 '19 03:10 Xharlie

Hi, I'm also training the network from scratch using the pretrained VGG-16, but I can't get the same result. Did you use the pretrained VGG-16? @Mahsa13473

asurada404 avatar May 19 '20 12:05 asurada404

Hi. Yes, but I couldn't get the same result even with the pretrained VGG-16. That was a few months ago, though; I'm not sure how it works with the updated version of the code. @asurada404

Mahsa13473 avatar May 19 '20 20:05 Mahsa13473

Hello, does anyone know where to find the pretrained model vgg_16.ckpt?

```shell
python -u train/train_sdf.py --gpu 0 --img_feat_twostream --restore_modelcnn ./models/CNN/pretrained_model/vgg_16.ckpt --log_dir checkpoint/SDF_JG --category all --num_sample_points 2048 --batch_size 20 --learning_rate 0.0001 --cat_limit 36000
```

fails with the error:

```
tensorflow.python.framework.errors_impl.NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for ./models/CNN/pretrained_model/vgg_16.ckpt
```

JohnG0024 avatar Aug 29 '20 14:08 JohnG0024

Download vgg_16.ckpt and save it to ./models/CNN/pretrained_model first. @JohnG0024

asurada404 avatar Aug 29 '20 15:08 asurada404

@asurada404 Thanks!

JohnG0024 avatar Aug 29 '20 15:08 JohnG0024

@Xharlie In your opinion, what's missing in the dataset that makes it unable to understand 2d image perfectly?

JohnG0024 avatar Aug 29 '20 15:08 JohnG0024

VGG is used as an encoder to extract features from the image. The pretrained VGG was trained on the ImageNet dataset (more than 14 million images across more than 20,000 categories), which is much larger than ShapeNet. As a result, a VGG trained on ImageNet can extract image features better than one trained only on ShapeNet. @JohnG0024

asurada404 avatar Aug 30 '20 01:08 asurada404

@asurada404 That makes sense. So vgg_16.ckpt is from the full ImageNet dataset, not the 1k-category subset of ImageNet used in the ImageNet Challenge?

JohnG0024 avatar Aug 30 '20 10:08 JohnG0024

You can find more details in this paper @JohnG0024

asurada404 avatar Aug 31 '20 08:08 asurada404

Has anyone successfully reproduced the results?

I trained the network with the ground-truth camera parameters; no modifications were made to the code.

```shell
nohup python -u train/train_sdf.py --gpu 0 --img_feat_twostream --restore_modelcnn ./models/CNN/pretrained_model/vgg_16.ckpt --log_dir checkpoint/{your training checkpoint dir} --category all --num_sample_points 2048 --batch_size 20 --learning_rate 0.0001 --cat_limit 36000 &> log/DISN_train_all.log &
```

The train/test split is from 3D-R2N2. I trained for about 3 days, approximately 23 epochs. The SDF loss stopped dropping, so I assumed the network had converged, but I only got bad visuals on the test-set models.

AlexsaseXie avatar Jan 06 '21 13:01 AlexsaseXie