Jongsoo Park
I'm seeing the warning ``` nimfa/methods/factorization/snmf.py:610: RuntimeWarning: invalid value encountered in power np.mat(2 ** np.array(list(range(l_var - 1, -1, -1)))), p_set) ``` This happens when l_var is 64, so 2**63 exceeds the int64 range...
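The overflow can be reproduced outside nimfa with a minimal numpy sketch (the nimfa internals are paraphrased; `l_var = 64` is the failing case from the report, and the float-base workaround is my suggestion, not nimfa's fix):

```python
import numpy as np

l_var = 64

# With an integer base and int64 exponents, the power is evaluated in
# int64 arithmetic, so 2**63 wraps past the signed 64-bit maximum --
# this is what trips the RuntimeWarning once l_var reaches 64.
exponents = np.arange(l_var - 1, -1, -1)      # [63, 62, ..., 0]
wrapped = 2 ** exponents.astype(np.int64)     # 2**63 overflows int64

# Workaround: use a float base so the powers are computed in float64,
# which represents 2**63 exactly (it is a power of two).
safe = 2.0 ** exponents
print(int(wrapped[0]))   # negative: wrapped past the int64 maximum
print(safe[0])           # 2**63 as a float
```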
Summary: Multi-dimensional version of split and concat Differential Revision: D37266285
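The diff title is terse; to illustrate what a multi-dimensional split/concat pair means, here is a hypothetical numpy sketch that splits a tensor along every axis and reassembles it (the helper names and block ordering are mine, not the FBGEMM operators):

```python
import numpy as np

def multi_split(x, sections):
    """Split x along every axis; sections[d] = number of chunks on axis d.
    Returns a flat, axis-0-major list of blocks (hypothetical helper)."""
    blocks = [x]
    for axis, n in enumerate(sections):
        blocks = [piece for b in blocks
                  for piece in np.array_split(b, n, axis=axis)]
    return blocks

def multi_concat(blocks, sections):
    """Inverse of multi_split: re-concatenate blocks axis by axis."""
    for axis in reversed(range(len(sections))):
        n = sections[axis]
        blocks = [np.concatenate(blocks[i:i + n], axis=axis)
                  for i in range(0, len(blocks), n)]
    return blocks[0]

x = np.arange(24).reshape(4, 6)
blocks = multi_split(x, (2, 3))   # 2 * 3 = 6 blocks of shape (2, 2)
roundtrip = multi_concat(blocks, (2, 3))
```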
Summary: As title Reviewed By: jianyuh Differential Revision: D36618330
Summary: walk_down_tensor_storage_tree_ now returns a pair instead of taking a reference argument that is used as both input and output, which can be confusing. Differential Revision: D35468676
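For intuition on the refactor, a tiny Python sketch of the two styles (the names and tree shape are hypothetical stand-ins, not the actual C++ signatures):

```python
# Before (paraphrased): a mutable argument doubles as input and output,
# so the signature does not reveal that the caller's cursor is modified.
def walk_down_inout(tree, cursor):
    cursor[0] = tree["children"][cursor[0]]   # mutates caller state
    return tree["found"]

# After: everything the function produces is in the return value,
# making the data flow explicit at every call site.
def walk_down_pure(tree, cursor):
    new_cursor = tree["children"][cursor]
    return tree["found"], new_cursor

tree = {"children": {0: 1, 1: 2}, "found": True}
found, cursor = walk_down_pure(tree, 0)
```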
Reviewed By: jspark1105 Differential Revision: D33578876
Summary: For dynamic quantization * Match the ReQuantizeForFloat interface with ReQuantizeOutput so we can use them in the same function * Created requantization functions that output floats for various cases like...
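The idea behind requantizing to float: an int32 GEMM accumulator over quantized inputs is corrected for the zero points and scaled back to real values. A simplified single-scale numpy sketch of that math (this paraphrases the concept behind ReQuantizeForFloat, not FBGEMM's C++ interface; all constants are illustrative):

```python
import numpy as np

def requantize_for_float(acc, a_scale, b_scale, a_zp, b_zp,
                         row_sums, col_sums, k):
    """Turn int32 GEMM accumulators into float32 outputs.

    acc[i, j]   = sum_k Aq[i, k] * Bq[k, j]   (int32)
    row_sums[i] = sum_k Aq[i, k],  col_sums[j] = sum_k Bq[k, j]
    """
    corrected = (acc
                 - a_zp * col_sums[np.newaxis, :]
                 - b_zp * row_sums[:, np.newaxis]
                 + k * a_zp * b_zp)
    return (a_scale * b_scale * corrected).astype(np.float32)

rng = np.random.default_rng(0)
k = 8
Aq = rng.integers(0, 255, size=(2, k)).astype(np.int32)
Bq = rng.integers(-128, 127, size=(k, 3)).astype(np.int32)
a_scale, a_zp = 0.02, 128
b_scale, b_zp = 0.05, 0
out = requantize_for_float(Aq @ Bq, a_scale, b_scale, a_zp, b_zp,
                           Aq.sum(axis=1), Bq.sum(axis=0), k)
# Must match dequantizing first, then multiplying in float:
ref = (a_scale * (Aq - a_zp)) @ (b_scale * (Bq - b_zp))
```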
Summary: jagged_to_padded_dense supports jagged tensors whose inner dense dim is 1 and folded, as in the following example, but it produces the error ``x_offsets.size(), 1 != num_jagged_dim, 0`` in the backward pass. x_values =...
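For readers unfamiliar with the operator, a pure-numpy sketch of the basic single-jagged-dim semantics of jagged_to_padded_dense (this paraphrases the behavior, not the FBGEMM kernel, and does not reproduce the folded-inner-dim case from the report):

```python
import numpy as np

def jagged_to_padded_dense(values, offsets, max_len, padding=0.0):
    """Pad a 1-D jagged tensor (values + offsets) into a dense
    [batch, max_len] array; rows longer than max_len are truncated."""
    batch = len(offsets) - 1
    out = np.full((batch, max_len), padding,
                  dtype=np.asarray(values).dtype)
    for b in range(batch):
        row = values[offsets[b]:offsets[b + 1]][:max_len]
        out[b, :len(row)] = row
    return out

values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
offsets = np.array([0, 2, 2, 5])        # row lengths 2, 0, 3
dense = jagged_to_padded_dense(values, offsets, max_len=3)
# dense == [[1, 2, 0], [0, 0, 0], [3, 4, 5]]
```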
Summary: When we see a pruned row we also need to skip the corresponding weight. D36461772 fixed EmbeddingSpMDMNBit.cc but not EmbeddingSpMDM.cc. Added unit tests for both the 8-bit and N-bit cases....
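A numpy sketch of the invariant being fixed: in a weighted pooled lookup with a pruning mapping, a pruned row (mapped to -1) must consume its per-sample weight too, otherwise later weights shift onto the wrong rows (function and variable names are mine, not the FBGEMM C++ code):

```python
import numpy as np

def embedding_spmdm_weighted(table, indices, mapping, sample_weights):
    """Weighted pooled embedding lookup with a pruning mapping.
    mapping[idx] == -1 marks a pruned row; the row and its weight
    are skipped together."""
    out = np.zeros(table.shape[1])
    for idx, w in zip(indices, sample_weights):
        row = mapping[idx]
        if row == -1:
            continue              # skip the pruned row AND its weight
        out += w * table[row]
    return out

table = np.array([[1.0, 0.0], [0.0, 1.0]])
mapping = np.array([0, -1, 1])    # original id 1 was pruned
out = embedding_spmdm_weighted(table, [0, 1, 2], mapping,
                               [0.5, 9.0, 2.0])
# The 9.0 belongs to the pruned id and must not leak onto id 2's row.
```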
https://github.com/NVIDIA/TransformerEngine/blob/main/transformer_engine/pytorch/module/layernorm_linear.py#L461-L471 When we use sequence parallelism, do we need to all-reduce the norm weight gradients across TP groups after the code above?
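For intuition on why such an all-reduce would be needed: under sequence parallelism each TP rank holds the full (replicated) norm weight but sees only a shard of the tokens, so each rank's weight gradient is a partial sum over its tokens; summing the per-shard gradients reproduces the full gradient. A single-process numpy simulation, simplified to the scale parameter of a normalization (shard count and shapes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
seq, hidden, ranks = 8, 4, 2

x = rng.normal(size=(seq, hidden))
grad_out = rng.normal(size=(seq, hidden))

# Simplified norm y = gamma * x_hat, so dL/dgamma = sum_tokens(x_hat * dy).
x_hat = (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)
full_grad_gamma = (x_hat * grad_out).sum(axis=0)

# Sequence parallel: each rank sees seq/ranks tokens -> a partial gradient.
shards = np.split(np.arange(seq), ranks)
partial = [(x_hat[s] * grad_out[s]).sum(axis=0) for s in shards]

# The all-reduce across TP ranks is just the sum of the partials.
reduced = sum(partial)
```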