seqc

Duplicate gene names in sparse count matrix

Open · vincent6liu opened this issue · 0 comments

Because multiple ENSEMBL IDs sometimes map to the same gene name, the sparse count matrix can contain several columns with the same gene name (i.e., the entries in _sparse_counts_genes.csv are not unique). I'm not sure how this is handled in the filtered dense matrix. It might be good to add a suffix to duplicated gene names that map to different ENSEMBL IDs, e.g. WDFY4 (1), WDFY4 (2).
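A minimal Python sketch of the suffixing idea proposed above. It assumes the gene names have already been loaded into a list in column order; the helper name suffix_duplicate_gene_names is hypothetical and is not part of seqc, and the exact layout of _sparse_counts_genes.csv is not assumed here.

```python
from collections import Counter


def suffix_duplicate_gene_names(gene_names):
    """Return a copy of gene_names in which every name that occurs more than
    once gets a 1-based suffix, e.g. "WDFY4" -> "WDFY4 (1)", "WDFY4 (2)".

    Names that occur exactly once are returned unchanged, and the original
    order is preserved so the labels still line up with the matrix columns.
    """
    totals = Counter(gene_names)  # how often each name occurs overall
    seen = Counter()  # how many copies of each name have been suffixed so far
    result = []
    for name in gene_names:
        if totals[name] > 1:
            seen[name] += 1
            result.append(f"{name} ({seen[name]})")
        else:
            result.append(name)
    return result


if __name__ == "__main__":
    # Hypothetical column labels for illustration only.
    genes = ["WDFY4", "ACTB", "WDFY4", "GAPDH"]
    print(suffix_duplicate_gene_names(genes))
    # -> ['WDFY4 (1)', 'ACTB', 'WDFY4 (2)', 'GAPDH']
```

A counter suffix keeps labels short, but it does not by itself record which ENSEMBL ID each column came from; appending the ENSEMBL ID instead of a counter would keep that mapping traceable at the cost of longer labels.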

vincent6liu, Dec 30 '19 16:12