Jongsoo Park
I'm seeing the warning ``` nimfa/methods/factorization/snmf.py:610: RuntimeWarning: invalid value encountered in power np.mat(2 ** np.array(list(range(l_var - 1, -1, -1)))), p_set) ``` This happens when l_var is 64, so 2**63 exceeds the int64 range...
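The overflow can be reproduced outside nimfa with a minimal numpy sketch (the nimfa internals are paraphrased; `l_var = 64` is the failing case from the report, and the float-base workaround is my suggestion, not nimfa's fix):

```python
import numpy as np

l_var = 64

# With an integer base and int64 exponents, the power is evaluated in
# int64 arithmetic, so 2**63 wraps past the signed 64-bit maximum --
# this is what trips the RuntimeWarning once l_var reaches 64.
exponents = np.arange(l_var - 1, -1, -1)      # [63, 62, ..., 0]
wrapped = 2 ** exponents.astype(np.int64)     # 2**63 overflows int64

# Workaround: use a float base so the powers are computed in float64,
# which represents 2**63 exactly (it is a power of two).
safe = 2.0 ** exponents
print(int(wrapped[0]))   # negative: wrapped past the int64 maximum
print(safe[0])           # 2**63 as a float
```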
Summary: Multi-dimensional version of split and concat Differential Revision: D37266285
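The diff title is terse; to illustrate what a multi-dimensional split/concat pair means, here is a hypothetical numpy sketch that splits a tensor along every axis and reassembles it (the helper names and block ordering are mine, not the FBGEMM operators):

```python
import numpy as np

def multi_split(x, sections):
    """Split x along every axis; sections[d] = number of chunks on axis d.
    Returns a flat, axis-0-major list of blocks (hypothetical helper)."""
    blocks = [x]
    for axis, n in enumerate(sections):
        blocks = [piece for b in blocks
                  for piece in np.array_split(b, n, axis=axis)]
    return blocks

def multi_concat(blocks, sections):
    """Inverse of multi_split: re-concatenate blocks axis by axis."""
    for axis in reversed(range(len(sections))):
        n = sections[axis]
        blocks = [np.concatenate(blocks[i:i + n], axis=axis)
                  for i in range(0, len(blocks), n)]
    return blocks[0]

x = np.arange(24).reshape(4, 6)
blocks = multi_split(x, (2, 3))   # 2 * 3 = 6 blocks of shape (2, 2)
roundtrip = multi_concat(blocks, (2, 3))
```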
Summary: As title Reviewed By: jianyuh Differential Revision: D36618330
Summary: walk_down_tensor_storage_tree_ now returns a pair instead of taking a reference argument that is used as both input and output, which can be confusing. Differential Revision: D35468676
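For intuition on the refactor, a tiny Python sketch of the two styles (the names and tree shape are hypothetical stand-ins, not the actual C++ signatures):

```python
# Before (paraphrased): a mutable argument doubles as input and output,
# so the signature does not reveal that the caller's cursor is modified.
def walk_down_inout(tree, cursor):
    cursor[0] = tree["children"][cursor[0]]   # mutates caller state
    return tree["found"]

# After: everything the function produces is in the return value,
# making the data flow explicit at every call site.
def walk_down_pure(tree, cursor):
    new_cursor = tree["children"][cursor]
    return tree["found"], new_cursor

tree = {"children": {0: 1, 1: 2}, "found": True}
found, cursor = walk_down_pure(tree, 0)
```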
Reviewed By: jspark1105 Differential Revision: D33578876
Summary: For dynamic quantization * Match the ReQuantizeForFloat interface with ReQuantizeOutput so we can use them in the same function * Created requantization functions that output floats for various cases like...
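The idea behind requantizing to float: an int32 GEMM accumulator over quantized inputs is corrected for the zero points and scaled back to real values. A simplified single-scale numpy sketch of that math (this paraphrases the concept behind ReQuantizeForFloat, not FBGEMM's C++ interface; all constants are illustrative):

```python
import numpy as np

def requantize_for_float(acc, a_scale, b_scale, a_zp, b_zp,
                         row_sums, col_sums, k):
    """Turn int32 GEMM accumulators into float32 outputs.

    acc[i, j]   = sum_k Aq[i, k] * Bq[k, j]   (int32)
    row_sums[i] = sum_k Aq[i, k],  col_sums[j] = sum_k Bq[k, j]
    """
    corrected = (acc
                 - a_zp * col_sums[np.newaxis, :]
                 - b_zp * row_sums[:, np.newaxis]
                 + k * a_zp * b_zp)
    return (a_scale * b_scale * corrected).astype(np.float32)

rng = np.random.default_rng(0)
k = 8
Aq = rng.integers(0, 255, size=(2, k)).astype(np.int32)
Bq = rng.integers(-128, 127, size=(k, 3)).astype(np.int32)
a_scale, a_zp = 0.02, 128
b_scale, b_zp = 0.05, 0
out = requantize_for_float(Aq @ Bq, a_scale, b_scale, a_zp, b_zp,
                           Aq.sum(axis=1), Bq.sum(axis=0), k)
# Must match dequantizing first, then multiplying in float:
ref = (a_scale * (Aq - a_zp)) @ (b_scale * (Bq - b_zp))
```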
Summary: jagged_to_padded_dense supports jagged tensors whose inner dense dim is 1 and folded, as in the following example, but it produces the error ``x_offsets.size(), 1 != num_jagged_dim, 0`` in the backward pass. x_values =...
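For readers unfamiliar with the operator, a pure-numpy sketch of the basic single-jagged-dim semantics of jagged_to_padded_dense (this paraphrases the behavior, not the FBGEMM kernel, and does not reproduce the folded-inner-dim case from the report):

```python
import numpy as np

def jagged_to_padded_dense(values, offsets, max_len, padding=0.0):
    """Pad a 1-D jagged tensor (values + offsets) into a dense
    [batch, max_len] array; rows longer than max_len are truncated."""
    batch = len(offsets) - 1
    out = np.full((batch, max_len), padding,
                  dtype=np.asarray(values).dtype)
    for b in range(batch):
        row = values[offsets[b]:offsets[b + 1]][:max_len]
        out[b, :len(row)] = row
    return out

values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
offsets = np.array([0, 2, 2, 5])        # row lengths 2, 0, 3
dense = jagged_to_padded_dense(values, offsets, max_len=3)
# dense == [[1, 2, 0], [0, 0, 0], [3, 4, 5]]
```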
Summary: When we see a pruned row we also need to skip the corresponding weight. D36461772 fixed EmbeddingSpMDMNBit.cc but not EmbeddingSpMDM.cc. Added unit tests for both the 8-bit and N-bit cases....
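A numpy sketch of the invariant being fixed: in a weighted pooled lookup with a pruning mapping, a pruned row (mapped to -1) must consume its per-sample weight too, otherwise later weights shift onto the wrong rows (function and variable names are mine, not the FBGEMM C++ code):

```python
import numpy as np

def embedding_spmdm_weighted(table, indices, mapping, sample_weights):
    """Weighted pooled embedding lookup with a pruning mapping.
    mapping[idx] == -1 marks a pruned row; the row and its weight
    are skipped together."""
    out = np.zeros(table.shape[1])
    for idx, w in zip(indices, sample_weights):
        row = mapping[idx]
        if row == -1:
            continue              # skip the pruned row AND its weight
        out += w * table[row]
    return out

table = np.array([[1.0, 0.0], [0.0, 1.0]])
mapping = np.array([0, -1, 1])    # original id 1 was pruned
out = embedding_spmdm_weighted(table, [0, 1, 2], mapping,
                               [0.5, 9.0, 2.0])
# The 9.0 belongs to the pruned id and must not leak onto id 2's row.
```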
https://github.com/NVIDIA/TransformerEngine/blob/main/transformer_engine/pytorch/module/layernorm_linear.py#L461-L471 When we use sequence parallelism, do we need to all-reduce the norm weight gradients across TP groups after the code above?
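For intuition on why such an all-reduce would be needed: under sequence parallelism each TP rank holds the full (replicated) norm weight but sees only a shard of the tokens, so each rank's weight gradient is a partial sum over its tokens; summing the per-shard gradients reproduces the full gradient. A single-process numpy simulation, simplified to the scale parameter of a normalization (shard count and shapes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
seq, hidden, ranks = 8, 4, 2

x = rng.normal(size=(seq, hidden))
grad_out = rng.normal(size=(seq, hidden))

# Simplified norm y = gamma * x_hat, so dL/dgamma = sum_tokens(x_hat * dy).
x_hat = (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)
full_grad_gamma = (x_hat * grad_out).sum(axis=0)

# Sequence parallel: each rank sees seq/ranks tokens -> a partial gradient.
shards = np.split(np.arange(seq), ranks)
partial = [(x_hat[s] * grad_out[s]).sum(axis=0) for s in shards]

# The all-reduce across TP ranks is just the sum of the partials.
reduced = sum(partial)
```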