Jokin Labaien

3 issues by Jokin Labaien

Hi, I was wondering whether DiCE works with continuous variables, such as time-series data.

Are you planning to upload pretrained models with RADIX higher than 2 and with higher cardinality? Thanks in advance.

Hi! I'm trying to use these sparse functions as an alternative to the softmax function in the attention mechanisms of transformers. However, the loss becomes NaN in the first iteration...