Mario Lezcano Casado
@ZainRizvi there are some tests in this PR that are taking forever to run (9h+). Do you know what's going on?
@pytorchbot merge -f "windows tests are not running"
@pytorchbot merge
CI fails for test_compare_cpu_nn_functional_embedding_cuda_float32 which is not reproducible locally
This could be a bug, non-determinism in the function for some inputs, or both. @kurtamohler could you have a look?
So, I think the issue is in the CUDA version. Rather than using `scalar_t`, it should be using `opmath_t` and the division should be performed in that type for extra...
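To illustrate why the type used for intermediate values matters, here is a minimal sketch. It is not the actual CUDA kernel: the function names are hypothetical, half precision stands in for the storage type (`scalar_t`), and Python's double-precision `float` stands in for the wider computation type (`opmath_t`). It computes a `max_norm / norm` style scale twice, once rounding every intermediate to the storage type and once rounding only the final result.

```python
import math
import struct


def to_half(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def scale_in_storage_type(row, max_norm):
    """Hypothetical: every intermediate is rounded to half precision."""
    acc = 0.0
    for v in row:
        # The running sum of squares is rounded after each addition.
        acc = to_half(acc + to_half(v * v))
    norm = to_half(math.sqrt(acc))
    return to_half(max_norm / norm)


def scale_in_opmath_type(row, max_norm):
    """Hypothetical: accumulate and divide in a wider type, round once."""
    acc = 0.0
    for v in row:
        acc += v * v
    norm = math.sqrt(acc)
    # Only the final result is rounded back to the storage type.
    return to_half(max_norm / norm)


row = [1.0] * 4096
scale_storage = scale_in_storage_type(row, 1.0)
scale_opmath = scale_in_opmath_type(row, 1.0)
print(scale_storage, scale_opmath)
```

With 4096 ones, the half-precision accumulator stops growing once it reaches 2048, because 2049 is not representable in half precision. The two scales therefore differ, and only the wide-accumulator version gives the exact 1/64.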
Ah right. Probably those intermediary values should be kept in the computation type on CPU? What do you think @ngimel?
ping @ngimel
This PR should be updated once https://github.com/pytorch/pytorch/pull/85248 lands adding similar error inputs.
> I still don't fully understand how nll_loss function is composed. Why does this function have many variations in multiple files?

Yes, it does. Its implementation is an absolute mess...