Tian Lan

12 comments by Tian Lan

@younesbelkada I believe transformer does not properly clear the cache after each training step. Following your suggestion, I added an empty-cache call and a gc collection; compared to the previous stepwise...
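
For readers who want to try the cleanup mentioned in this comment, here is a minimal sketch of a per-step helper. It is not the author's actual code: the name `free_memory` is hypothetical, and it assumes that "empty cache" refers to `torch.cuda.empty_cache()` and "gc collection" to Python's `gc.collect()`. The PyTorch part is skipped when `torch` is not installed or no CUDA device is available.

```python
import gc


def free_memory() -> int:
    """Run a garbage-collection pass, then release cached CUDA memory.

    Hypothetical helper for illustration. Returns the number of
    unreachable objects found by gc.collect().
    """
    # Collect unreachable Python objects first so that any tensors they
    # hold are freed before the CUDA cache is emptied.
    collected = gc.collect()

    try:
        import torch  # optional dependency; assumed, not confirmed by the comment
    except ImportError:
        return collected

    if torch.cuda.is_available():
        # Hand unused cached blocks back to the CUDA driver.
        torch.cuda.empty_cache()

    return collected
```

Calling `free_memory()` at the end of each training step would match the order described in the comment. Note that `torch.cuda.empty_cache()` only releases cached blocks that are no longer in use, so it will not free memory that live tensors still reference.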