Shao Tang
```
if (curDistFromStart > distTo[curNodeID]) {
    // A shorter path to curNode has already been found
    continue;
}
```
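For context, the check above looks like the stale-entry skip used in a priority-queue Dijkstra. Here is a minimal sketch of where it would sit, assuming an adjacency-list graph with non-negative edge weights. `Graph` and `dijkstra` are hypothetical names for illustration; only `curDistFromStart`, `distTo`, and `curNodeID` come from the snippet.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical adjacency list: graph[u] holds (neighbor, weight) pairs.
// Weights are assumed to be non-negative.
using Graph = std::vector<std::vector<std::pair<int, long long>>>;

std::vector<long long> dijkstra(const Graph& graph, int start) {
    assert(start >= 0 && static_cast<std::size_t>(start) < graph.size());
    const long long INF = std::numeric_limits<long long>::max();
    std::vector<long long> distTo(graph.size(), INF);
    distTo[start] = 0;

    // Min-heap ordered by (distance from start, node id).
    using State = std::pair<long long, int>;
    std::priority_queue<State, std::vector<State>, std::greater<State>> pq;
    pq.push(State(0, start));

    while (!pq.empty()) {
        const long long curDistFromStart = pq.top().first;
        const int curNodeID = pq.top().second;
        pq.pop();

        if (curDistFromStart > distTo[curNodeID]) {
            // A shorter path to curNode has already been found,
            // so this queue entry is stale and can be skipped.
            continue;
        }

        for (const auto& edge : graph[curNodeID]) {
            const int nextNodeID = edge.first;
            const long long distToNext = curDistFromStart + edge.second;
            if (distToNext < distTo[nextNodeID]) {
                distTo[nextNodeID] = distToNext;
                pq.push(State(distToNext, nextNodeID));
            }
        }
    }
    return distTo;  // Unreachable nodes keep the INF sentinel.
}
```

The skip is needed because the heap has no decrease-key operation: a node can be pushed several times, and only the entry that matches `distTo` is still worth expanding.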
> In lines like ` const size_t N = (size_t)(B) * T * V;` is the explicit cast needed? ``` int B = 64; int T = 1024; int V...
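A small sketch of why the cast matters: with all-`int` operands the product is computed in `int`, and signed overflow is undefined behavior. Casting the first operand promotes the whole product to `size_t`. The value `V = 50257` used in the comment below is an assumption for illustration (the snippet above is cut off), and `num_elements` is a hypothetical helper; a 64-bit `size_t` is also assumed.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Hypothetical helper: element count of a (B, T, V) buffer.
// Casting the first operand makes both multiplications happen in size_t.
// Without the cast, B * T * V would be evaluated in int and could overflow.
inline std::size_t num_elements(int B, int T, int V) {
    assert(B >= 0 && T >= 0 && V >= 0);
    return static_cast<std::size_t>(B) * T * V;
}

// Assumed example values: 64 * 1024 * 50257 = 3293642752,
// which is larger than INT_MAX (2147483647 for a 32-bit int).
```

Making the parameters `size_t` in the declaration would have the same effect without the cast at each call site, at the cost of changing the function signatures.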
> Sorry I meant the casts look ugly to my eye. Maybe we could make the individual params `size_t` in the function declarations 🤔, so their products will come out...
> we can't just malloc on repeat, without free. maybe memset to zero if needed?

@karpathy good point... Always forget we are dealing with raw ptr instead of smart ptr...
fixed in https://github.com/pytorch/examples/pull/1214 via ` push_back(Functional( [](torch::Tensor input) { return torch::log_softmax(input, 1); }))`
> Thanks a lot! > > re the mask discrepancy mentioned on the initial issue > > > Also, a clarification on why one uses a boolean mask for _key_padding_mask...
@pytorchbot label "topic: docs"
@pytorchbot label "release notes: nn"
@pytorchbot merge
> What is the benefit of the online softmax for us?

- It reduces the 3 for loops (1 compute max, 2 compute sum, 3 compute output) to 2 for...
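To make the loop count concrete, here is a rough CPU sketch of online softmax over a single row. It is not the kernel from the PR, just an illustration assuming finite `float` inputs; `online_softmax` is a hypothetical name. The first loop tracks the running max and the running sum together, rescaling the sum whenever the max changes, and the second loop writes the output.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical single-row online softmax (assumes finite inputs).
std::vector<float> online_softmax(const std::vector<float>& x) {
    assert(!x.empty());
    float maxval = -std::numeric_limits<float>::infinity();
    float sum = 0.0f;

    // Loop 1: running max and running sum in one pass.
    for (float v : x) {
        if (v > maxval) {
            // The old sum was accumulated relative to the old max,
            // so rescale it to the new max before adding exp(0) = 1.
            sum = sum * std::exp(maxval - v) + 1.0f;
            maxval = v;
        } else {
            sum += std::exp(v - maxval);
        }
    }

    // Loop 2: normalize.
    std::vector<float> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        out[i] = std::exp(x[i] - maxval) / sum;
    }
    return out;
}
```

The result should match the usual three-pass version up to floating-point rounding, since rescaling by `exp(old_max - new_max)` is mathematically the same as recomputing the sum against the final max.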