julius
More memory efficient implementation / improvements on julius for > 100K sequence length?
Julius works great, but I notice that memory usage grows super-linearly with filter length. Are there any settings I can change, or improvements that could be made, to reduce GPU memory consumption?
You can try disabling the FFT-based implementation (I don't remember the name of the parameter offhand), especially if your input requires a gradient. The FFT path is tuned for speed, and I'm not sure I do everything in the most memory-efficient way. With the regular implementation things will be a lot slower, but memory usage should be linear.
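To make the trade-off concrete, here is a rough plain-Python sketch of what a "regular" (direct, time-domain) convolution does. It is not julius code, and `direct_conv1d` is a hypothetical name used only for illustration. The point is that the only large allocation is the output buffer, so memory stays linear in the input length, while the work grows with input length times filter length, which is why it gets slow for long filters.

```python
def direct_conv1d(signal, kernel):
    """Valid-mode cross-correlation, the convention used by typical conv layers.

    Hypothetical helper for illustration only; not part of julius.
    """
    n, k = len(signal), len(kernel)
    if k > n:
        raise ValueError("kernel must not be longer than the signal")
    # The only large allocation: one output buffer of n - k + 1 samples.
    out = [0.0] * (n - k + 1)
    for i in range(n - k + 1):
        acc = 0.0
        # k multiply-adds per output sample, so total work is about n * k.
        for j in range(k):
            acc += signal[i + j] * kernel[j]
        out[i] = acc
    return out


signal = [1.0, 2.0, 3.0, 4.0, 5.0]
kernel = [0.5, 0.5]  # two-tap moving average
smoothed = direct_conv1d(signal, kernel)
print(smoothed)
```

An FFT-based convolution instead works on padded frequency-domain buffers, which is where the extra memory can come from, particularly when intermediates have to be kept around for the backward pass.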