
About Original Nth Farthest

Open vanzytay opened this issue 7 years ago • 8 comments

Hey!

I've been wondering if you have tried the original Nth Farthest code (from Sonnet) on a GPU with 16 GB of RAM. I keep running into memory errors no matter what I do (on a Volta GPU).

Wondering if you have any clue. (Sorry, this is not directly related to your repository; I'm just wondering if you got the original Sonnet version to work.)
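
For reference, this is the kind of back-of-the-envelope budgeting I've been doing to pick which knobs to turn; the names and numbers are hypothetical stand-ins, not the Sonnet script's actual flags:

```python
# Rough memory budgeting for the RMC's attention over memory slots.
# All names and values here are hypothetical stand-ins, not Sonnet flags.
config = {
    "batch_size": 1600,     # first thing I halve when I hit OOM
    "num_memory_slots": 8,  # fewer slots -> smaller slot-to-slot attention
    "num_heads": 8,         # activations grow linearly with heads
    "head_size": 128,       # per-head dimensionality
}

def rough_attention_activations(cfg):
    """Very rough activation count for one attention pass over memory:
    ~batch*heads*slots^2 for the scores, plus ~batch*heads*slots*head_size
    for the q/k/v projections."""
    b, s = cfg["batch_size"], cfg["num_memory_slots"]
    h, d = cfg["num_heads"], cfg["head_size"]
    return b * h * (s * s + 3 * s * d)

print(f"~{rough_attention_activations(config):,} activations per attention pass")
```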

Thanks!

vanzytay avatar Oct 20 '18 11:10 vanzytay

Hi!

Sadly, I have not tried to run the official Sonnet code myself; I just ported the core implementation of RMC from the Sonnet code for reference. So I'm afraid I could not share any useful pointers.

The Nth Farthest task implementation is from a contributor. (Mind if I ask you about this, @jessicayung?)

Side note: I've been running train_nth_farthest on my TITAN Xp for about 5 days now, and checking just now, it did break the 25% barrier and reached 91%. I haven't logged anything, so I'll try to compare the results with the ones from the paper soon.
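
Concretely, the logging I plan to bolt onto the next run looks something like the sketch below, so the curve is directly comparable with the paper's; `train_one_epoch` and `evaluate` are placeholder stubs, not functions from this repo.

```python
import csv
import random
import time

def train_one_epoch():
    """Placeholder stub for one epoch of train_nth_farthest training."""
    pass

def evaluate():
    """Placeholder stub standing in for held-out accuracy."""
    return random.random()

start = time.time()
with open("nth_farthest_log.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["epoch", "wall_clock_sec", "accuracy"])
    for epoch in range(200_000):  # ~180k epochs is roughly where I saw 91%
        train_one_epoch()
        if epoch % 1000 == 0:
            writer.writerow([epoch, int(time.time() - start), evaluate()])
            f.flush()  # keep the log readable even if the run dies mid-way
```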

L0SG avatar Oct 20 '18 13:10 L0SG

Hey! Thanks for the quick reply!

Wow, thanks. I'll use your version in my experiments!

vanzytay avatar Oct 20 '18 15:10 vanzytay

@L0SG Hey there, one more question. I started running your N-Farthest script. It seems to still hover around 0.25 (it's been 1 day). Could you describe whether there is a sudden spike in performance (to 91%), or roughly how many epochs it takes to reach somewhere around that score? And are the default hyperparameters correct for achieving this result? Thanks!

vanzytay avatar Oct 22 '18 05:10 vanzytay

I fired up the code, let it run forever, and actually forgot about it for about 5 days. Checking it after seeing your comment, it was reaching 91% at around ~180,000 epochs.

The original paper reports a wall-clock time of around 40~50 hours to break the 25% mark, so running the code for at least that long is a viable choice, I suppose.
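
Something like the helper below would cap a run at the paper's upper bound while still stopping early once the 25% barrier breaks; the stop condition is my own heuristic, not from the paper's code.

```python
import time

def run_with_budget(train_epoch, budget_sec=50 * 3600, plateau=0.25):
    """Train until the wall-clock budget (upper end of the paper's 40~50
    hours) runs out, or stop early once accuracy clears the 25% plateau.
    `train_epoch` should run one epoch and return held-out accuracy."""
    start = time.time()
    while time.time() - start < budget_sec:
        accuracy = train_epoch()
        if accuracy > plateau:
            return accuracy, time.time() - start  # barrier broken
    return None  # budget exhausted without breaking the barrier
```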

Regarding the default hyperparameters, I have not checked every last detail of them yet, but I believe the contributor took great care to match them as faithfully as possible.

Currently I'm working on another project (not related to sequences, unfortunately :( ), so I'll double-check the faithfulness when I have spare time.

Meanwhile, if you find any differences between the Sonnet version and this repo, please let me know and I'll fix them. Thanks!
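
Even a crude diff of the two default configs would help narrow things down; the values in this sketch are illustrative placeholders, not the verified defaults of either codebase.

```python
# Crude diff of two hyperparameter configs. All values are illustrative
# placeholders, not the verified defaults of Sonnet or this repo.
sonnet_defaults = {"mem_slots": 8, "head_size": 128, "num_heads": 8, "lr": 1e-4}
repo_defaults = {"mem_slots": 8, "head_size": 128, "num_heads": 8, "lr": 1e-3}

for key in sorted(set(sonnet_defaults) | set(repo_defaults)):
    a, b = sonnet_defaults.get(key), repo_defaults.get(key)
    if a != b:
        print(f"{key}: sonnet={a} vs repo={b}")
```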

L0SG avatar Oct 22 '18 16:10 L0SG

@vanzytay The hyperparameters in this implementation were set based on the paper first and the official Sonnet implementation second. Not sure if there were differences between the two. Let me know if you find any problems. I spoke with one of the authors, and they did say that the RRNN tends to run for a while before having something like an 'aha' moment and spiking in performance (as shown in the graphs in the paper).
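
If you'd rather catch that moment automatically than eyeball the logs, a moving-average check over eval accuracy would do it; the window size and threshold below are arbitrary choices of mine.

```python
from collections import deque

def make_spike_detector(window=100, plateau=0.25):
    """Return a callable that flags when the moving average of eval
    accuracy clears the 25% plateau (the pre-'aha' level in the paper)."""
    recent = deque(maxlen=window)

    def update(acc):
        recent.append(acc)
        return len(recent) == window and sum(recent) / window > plateau

    return update

# Usage during training: call detect(acc) after each evaluation round.
detect = make_spike_detector()
```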

Also really glad to hear that the implementation's broken the 25% barrier, thanks for running it for longer Sang-gil!

jessicayung avatar Oct 22 '18 16:10 jessicayung

Thanks @L0SG and @jessicayung for your replies!

vanzytay avatar Oct 22 '18 17:10 vanzytay

I've uploaded the somewhat overdue experimental results for the Nth Farthest task. It definitely takes way longer than the reported results from the paper. I'll play with other hyperparameters when I have spare GPU resources available.

L0SG avatar Nov 13 '18 08:11 L0SG

@L0SG Thanks!

vanzytay avatar Nov 13 '18 08:11 vanzytay