Results 14 comments of Shao Tang

```
if (curDistFromStart > distTo[curNodeID]) {
    // A shorter path to the curNode node has already been found
    continue;
}
```
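
For context, a check like this usually appears in a priority-queue Dijkstra that never removes outdated queue entries and instead skips them when popped. The following is a minimal sketch of that pattern, not the code the comment was written against: the `Edge` type, the `dijkstra` function name, and the adjacency-list layout are assumptions made for illustration, and edge weights are assumed non-negative.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical edge type for this sketch: target node and non-negative weight.
struct Edge {
    std::size_t to;
    long long weight;
};

// Returns the shortest distance from `start` to every node; unreachable nodes
// keep std::numeric_limits<long long>::max().
std::vector<long long> dijkstra(const std::vector<std::vector<Edge>>& graph,
                                std::size_t start) {
    assert(start < graph.size());
    const long long INF = std::numeric_limits<long long>::max();
    std::vector<long long> distTo(graph.size(), INF);
    distTo[start] = 0;

    // Min-heap of (distance from start, node ID).
    typedef std::pair<long long, std::size_t> State;
    std::priority_queue<State, std::vector<State>, std::greater<State>> pq;
    pq.push(State(0, start));

    while (!pq.empty()) {
        const long long curDistFromStart = pq.top().first;
        const std::size_t curNodeID = pq.top().second;
        pq.pop();

        if (curDistFromStart > distTo[curNodeID]) {
            // A shorter path to curNode was already found, so this queue
            // entry is stale and can be skipped.
            continue;
        }
        for (const Edge& e : graph[curNodeID]) {
            const long long next = curDistFromStart + e.weight;
            if (next < distTo[e.to]) {
                distTo[e.to] = next;
                pq.push(State(next, e.to));  // the old entry stays in the queue
            }
        }
    }
    return distTo;
}
```

Because improved distances are pushed as new entries rather than updated in place, a node can sit in the queue several times; the stale-entry check is what keeps each node from being expanded with an outdated distance.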

> In lines like ` const size_t N = (size_t)(B) * T * V;` is the explicit cast needed? ``` int B = 64; int T = 1024; int V...

> Sorry I meant the casts look ugly to my eye. Maybe we could make the individual params `size_t` in the function declarations 🤔, so their products will come out...
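
The two styles discussed here can be compared side by side. This is a sketch under stated assumptions: the function names are hypothetical, `size_t` is assumed to be 64 bits wide, and the value of `V` used in the usage note is an illustrative placeholder, not the value from the truncated snippet above.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Cast-at-use style: widening the first operand makes the whole product
// evaluate in size_t, because the remaining int operands are converted too.
std::size_t numel_with_cast(int B, int T, int V) {
    assert(B >= 0 && T >= 0 && V >= 0);
    return (std::size_t)(B) * T * V;
}

// Alternative style: size_t parameters, so the product needs no cast.
std::size_t numel_size_t_params(std::size_t B, std::size_t T, std::size_t V) {
    return B * T * V;
}

// True when the element count would also have fit in a plain int.
bool fits_in_int(std::size_t n) {
    return n <= (std::size_t)INT_MAX;
}
```

For example, with `B = 64`, `T = 1024` and a placeholder `V = 50000`, both functions return 3276800000, which `fits_in_int` reports as too large for a 32-bit `int`. Multiplying the same three values as plain `int` would overflow, which is undefined behavior in C and C++.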

> we can't just malloc on repeat, without free. maybe memset to zero if needed?

@karpathy good point... Always forget we are dealing with raw ptr instead of smart ptr...
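
One way to read the suggestion is: allocate the buffer once, then zero it on each later call instead of allocating again. The sketch below illustrates that idea only; the `Scratch` struct and the `acquire_zeroed` and `release` helpers are hypothetical names, not code from the PR under discussion.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical holder for a scratch buffer that is reused across calls.
struct Scratch {
    float* data = nullptr;
    std::size_t count = 0;
};

// Allocates on first use (or when the requested size changes) and zeroes the
// buffer on every call, so repeated calls never leak an earlier allocation.
// Returns nullptr if the allocation fails.
float* acquire_zeroed(Scratch& s, std::size_t count) {
    assert(count > 0);
    if (s.data == nullptr || s.count != count) {
        std::free(s.data);  // free(nullptr) is a no-op
        s.data = static_cast<float*>(std::malloc(count * sizeof(float)));
        if (s.data == nullptr) {
            s.count = 0;
            return nullptr;
        }
        s.count = count;
    }
    // All-zero bytes represent 0.0f on IEEE 754 platforms.
    std::memset(s.data, 0, count * sizeof(float));
    return s.data;
}

// Frees the buffer and resets the holder so it can be reused safely.
void release(Scratch& s) {
    std::free(s.data);
    s.data = nullptr;
    s.count = 0;
}
```

With raw pointers, the caller still has to call `release` exactly once at the end; a smart pointer such as `std::unique_ptr` would handle that step automatically.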

fixed in https://github.com/pytorch/examples/pull/1214 via ` push_back(Functional( [](torch::Tensor input) { return torch::log_softmax(input, 1); }))`

> Thanks a lot!
>
> re the mask discrepancy mentioned on the initial issue
>
> > Also, a clarification on why one uses a boolean mask for _key_padding_mask...

@pytorchbot label "topic: docs"

@pytorchbot label "release notes: nn"

> What is the benefit of the online softmax for us?

- It reduces the 3 for loops (1 compute max, 2 compute sum, 3 compute output) to 2 for...
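
The loop-count difference can be sketched as follows. This is an illustrative CPU version, not the kernel from the thread: the function names are assumptions, and inputs are assumed to be finite. The online variant keeps a running maximum and rescales the running sum whenever the maximum grows, so the max and sum passes merge into one.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Reference version with three loops: max, sum, output.
std::vector<float> softmax_three_pass(const std::vector<float>& x) {
    assert(!x.empty());
    float maxval = x[0];
    for (float v : x) {                       // loop 1: compute max
        if (v > maxval) maxval = v;
    }
    float sum = 0.0f;
    for (float v : x) {                       // loop 2: compute sum
        sum += std::exp(v - maxval);
    }
    std::vector<float> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {  // loop 3: compute output
        out[i] = std::exp(x[i] - maxval) / sum;
    }
    return out;
}

// Online version with two loops: fused max and sum, then output.
std::vector<float> softmax_online(const std::vector<float>& x) {
    assert(!x.empty());
    float maxval = -std::numeric_limits<float>::infinity();
    float sum = 0.0f;
    for (float v : x) {                       // loop 1: running max and sum
        if (v > maxval) {
            // Rescale the sum accumulated under the old max, then add
            // exp(v - v) = 1 for the new element.
            sum = sum * std::exp(maxval - v) + 1.0f;
            maxval = v;
        } else {
            sum += std::exp(v - maxval);
        }
    }
    std::vector<float> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {  // loop 2: compute output
        out[i] = std::exp(x[i] - maxval) / sum;
    }
    return out;
}
```

Both functions should agree up to floating-point rounding. The online version reads the input twice instead of three times, which matters most when the data has to be fetched from slow memory.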