yet-another-retnet: Change in how input projections are implemented; seems to converge faster


Open • draguve opened this issue 1 year ago • 1 comment

Sep 28 '23 22:09 draguve

@draguve Thanks for this!

I'm not seeing a significant change in training convergence. Here are some brief training logs on the Project Gutenberg example:

I wonder if the faster convergence you're seeing is specific to your application, and possibly just due to differences in initialization.
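One way to test the initialization hypothesis is to train both variants with several random seeds and check whether the gap between them is larger than the seed-to-seed spread. The helper below is only a sketch: `compare_across_seeds` and its inputs are hypothetical names, not part of yet-another-retnet, and it assumes you have already collected one final loss per seed for each variant.

```python
import statistics


def compare_across_seeds(losses_a, losses_b):
    """Summarize final losses for two variants, one value per random seed.

    Hypothetical helper for illustration; not part of any library.
    """
    if len(losses_a) < 2 or len(losses_b) < 2:
        raise ValueError("need at least two seeds per variant")

    # Average final loss of each variant across seeds.
    mean_a = statistics.mean(losses_a)
    mean_b = statistics.mean(losses_b)

    # Seed-to-seed spread within each variant (sample standard deviation).
    std_a = statistics.stdev(losses_a)
    std_b = statistics.stdev(losses_b)

    gap = mean_a - mean_b
    noise = max(std_a, std_b)

    # Rough heuristic, not a statistical test: a gap no larger than the
    # spread between seeds is consistent with initialization noise.
    return {"gap": gap, "noise": noise, "within_noise": abs(gap) <= noise}
```

If `within_noise` comes back `True`, the difference between the two projection implementations is no bigger than what changing the seed alone produces, which would point to initialization rather than the implementation change. This is a coarse check, so a proper significance test over more seeds would be more convincing.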

Oct 03 '23 20:10 fkodom