yet-another-retnet
Changes how the input projections are implemented. This seems to converge faster.
@draguve Thanks for this!
I'm not seeing a significant change in training convergence. Here are some brief training logs on the Project Gutenberg example:
- red -> main, withbias=False
- green -> this branch
I wonder if the issue is specific to your application, and possibly just due to differences in initialization.
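One way to test the initialization hypothesis would be to repeat each configuration over a few random seeds and check whether the gap between the two is larger than the seed-to-seed spread. A rough sketch of that comparison follows. `seed_noise_check` is a hypothetical helper, not part of this repo, and the loss values are placeholders for illustration, not measured results:

```python
from statistics import mean, stdev


def seed_noise_check(losses_a, losses_b):
    """Compare final losses of two configurations, each run with several seeds.

    Returns (gap, noise): the absolute difference between the two means, and
    the larger of the two sample standard deviations across seeds.
    """
    gap = abs(mean(losses_a) - mean(losses_b))
    noise = max(stdev(losses_a), stdev(losses_b))
    return gap, noise


# Placeholder numbers for illustration only (assumed, not real training logs).
main_losses = [2.31, 2.28, 2.35]    # e.g. main, one entry per seed
branch_losses = [2.30, 2.33, 2.27]  # e.g. this branch, same seeds

gap, noise = seed_noise_check(main_losses, branch_losses)
if gap <= noise:
    print("difference is within seed-to-seed noise")
else:
    print("difference exceeds seed-to-seed noise")
```

With only a handful of seeds this is a sanity check rather than a statistical test, but it should be enough to tell a real convergence change apart from a lucky initialization.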