Andrea Lekkas
The idea by @naarkhoo can work in some cases: rounding the features (i.e. the row(s) from the original data that get passed to the `shap.plots.force(...)` function) worked for me;...
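A minimal sketch of the rounding step described above, assuming the feature row is available as a plain dictionary. The helper name `round_features`, the sample values, and the choice of two decimals are hypothetical and only for illustration; with a pandas DataFrame, `DataFrame.round` would serve the same purpose.

```python
from numbers import Real


def round_features(row, ndigits=3):
    """Return a copy of `row` with every real-valued entry rounded.

    Non-numeric entries (and booleans) are passed through unchanged.
    """
    return {
        name: round(value, ndigits)
        if isinstance(value, Real) and not isinstance(value, bool)
        else value
        for name, value in row.items()
    }


# Hypothetical feature row with long float representations.
row = {"age": 41.337219873, "income": 52310.91827364, "city": "Oslo"}
rounded = round_features(row, ndigits=2)
print(rounded)

# Assumption: the rounded values are what would then be handed to the
# force plot as its displayed features, instead of the raw row.
```

Rounding only changes the values shown next to each feature; the SHAP values themselves are left untouched.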
@rewicks good call: the memory usage increases only with the ASGD optimizer. I think I have found the problem with it, but I am not sure how to solve it....
I have found a solution. If it works for others as well, this issue can be closed. I have modified the ASGD optimizer using @mourga's port of AWD-LSTM for PyTorch...
For the sake of anybody else coming across this issue: I have now noticed that [Issue 49](https://github.com/kimiyoung/transformer-xl/issues/49) has a similar question. The answer includes a script that uses the pretrained...
Hi, any answer by the authors is going to be more accurate, but since I have looked up Adaptive Softmax I can comment on this and possibly help. `cutoffs` is...
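As general background on adaptive softmax (not a reconstruction of the truncated comment above): cutoff values are commonly described as boundaries that split a frequency-sorted vocabulary into a head of frequent tokens and one or more tail clusters. A minimal sketch under that assumption; the function name `cluster_ranges` and the numbers are hypothetical and not taken from the Transformer-XL code.

```python
def cluster_ranges(cutoffs, n_tokens):
    """Map cutoffs to [start, end) index ranges over a frequency-sorted vocabulary.

    The first range is the head (most frequent tokens); the rest are tail clusters.
    """
    bounds = [0] + list(cutoffs) + [n_tokens]
    # Each boundary must be strictly larger than the previous one.
    if any(lo >= hi for lo, hi in zip(bounds, bounds[1:])):
        raise ValueError("cutoffs must be strictly increasing and below n_tokens")
    return list(zip(bounds, bounds[1:]))


# Hypothetical vocabulary of 10,000 tokens with two cutoffs.
ranges = cluster_ranges([2000, 6000], 10000)
for i, (start, end) in enumerate(ranges):
    label = "head" if i == 0 else f"tail cluster {i}"
    print(f"{label}: token ids {start}..{end - 1} ({end - start} tokens)")
```

With these numbers, the 2,000 most frequent tokens fall in the head, and the remaining 8,000 are divided between two tail clusters.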