Kaiyu Shi

45 comments of Kaiyu Shi

I'm going to raise a warning for this situation until the multi-layered version is ready.
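Roughly what I have in mind, as a sketch with illustrative names rather than the actual code:

```python
import warnings

def check_num_layers(num_layers):
    # Hypothetical guard: warn and fall back to a single layer
    # until the multi-layered version is ready.
    if num_layers > 1:
        warnings.warn(
            "multi-layer support is not ready yet; using a single layer instead",
            UserWarning,
        )
        num_layers = 1
    return num_layers
```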

Well, since the actual PPL of `index GRU` is hard to compute, the printed loss is simply the NCE loss, which is not comparable with the *CrossEntropy* loss.
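For reference, perplexity is only directly recoverable from a cross-entropy loss as PPL = exp(mean CE); a rough sketch of the distinction, with purely illustrative tensor sizes:

```python
import torch
import torch.nn.functional as F

# Cross-entropy over the full vocabulary: PPL = exp(mean CE).
logits = torch.randn(32, 10000)            # (batch, vocab), illustrative sizes
targets = torch.randint(0, 10000, (32,))
ce_loss = F.cross_entropy(logits, targets)
ppl = torch.exp(ce_loss)                   # a valid perplexity

# The NCE objective is a binary classification loss over target vs. noise
# samples, so exp(nce_loss) is NOT a perplexity and the two numbers are
# not comparable.
```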

Hi, Eric. I failed to reproduce the PPL of 165 on my server; could you please delete `data/penn/vocab.pkl` and run again to see if it happens again? I suspect...

@chaoqing `squeeze(0)` is definitely a better choice; as you said, `squeeze` will remove all dims with *size=1*, which is unexpected for *N=1*. A PR is appreciated. For the non-zero elements...
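A quick illustration of the difference (shapes are illustrative):

```python
import torch

# A leading singleton dim plus a batch dim that happens to be 1 (N=1).
x = torch.zeros(1, 1, 5)

x.squeeze(0).shape   # torch.Size([1, 5]) - only the leading dim is dropped
x.squeeze().shape    # torch.Size([5])    - the N=1 batch dim is dropped too,
                     #                      which breaks code expecting (N, 5)
```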

Yes, I was following the tutorial. I found that the `AT_DISPATCH_FLOATING_TYPES_AND_HALF` macro should do the magic to support the half scalar type, but it is not documented in the tutorial. Should...

I deleted the pre-built dashboards on `grafana`, and the job `export-datasources-and-dashboards` does exit with 0. It seems like the entry script tries to add the dashboard over and over again.

Yes, that's what I meant. But after using your `Embedding` class, I think it's far simpler than the one from `gensim`. I'm wondering if we can provide a simple way to...
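One possible shape for that, as a rough sketch only: it assumes gensim 4.x `KeyedVectors` and a plain `nn.Embedding` rather than your `Embedding` class, and the file path and lookup word are placeholders:

```python
import torch
import torch.nn as nn
from gensim.models import KeyedVectors

# Load pretrained vectors (path and format are placeholders).
kv = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)

# Copy the weight matrix into a frozen nn.Embedding.
weights = torch.FloatTensor(kv.vectors)
embedding = nn.Embedding.from_pretrained(weights, freeze=True)

# Look up a word through gensim's vocabulary mapping
# (use any word actually present in your vocabulary).
idx = torch.tensor([kv.key_to_index["example"]])
vec = embedding(idx)
```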

PyTorch caches CUDA memory to avoid the cost of repeated allocations; you can find more information here: https://pytorch.org/docs/stable/notes/cuda.html#cuda-memory-management In your case, the reserved bytes should be the peak memory usage before `checkpointing`,...
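For inspecting the two quantities, the counters from the linked page can be read directly (requires a CUDA device):

```python
import torch

# Memory occupied by live tensors on the current device.
allocated = torch.cuda.memory_allocated()

# Memory held by the caching allocator, including cached blocks that are
# free but not returned to the driver; this is what `nvidia-smi` reports.
reserved = torch.cuda.memory_reserved()

# Peak values since the start of the program (or the last reset).
peak_allocated = torch.cuda.max_memory_allocated()
peak_reserved = torch.cuda.max_memory_reserved()

torch.cuda.reset_peak_memory_stats()  # reset the peak counters
```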

> Q1: Do you know how to explain this: If I keep the same batch-size, but change how I partition the self.features internally (into checkpointed segments), the active_bytes of the...
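For concreteness, partitioning a sequential `features` module into checkpointed segments usually looks roughly like the sketch below; the module, sizes, and segment count are illustrative, not your actual model:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

features = nn.Sequential(
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
)

x = torch.randn(8, 512, requires_grad=True)

# Split `features` into 3 segments; only segment-boundary activations are
# kept, and the rest are recomputed during backward. Changing the number
# of segments changes which activations are stored, and hence active_bytes.
out = checkpoint_sequential(features, 3, x)
```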

Chrome on Windows 10 has the same problem.