chanjo
Joined gene names: a possible pitfall causing incorrect results?
Is chanjo aware of these problematic gene names? Many rows in the BED file carry several comma-joined names in the last column, which may cause various problems for queries based on gene names.
➤ gawk '{print $NF}' ccds.15.grch37p13.extended.bed|grep ','|head
NOX1,NOX1,NOX1
NOX1,NOX1,NOX1
NOX1,NOX1
NOX1,NOX1,NOX1
NOX1,NOX1,NOX1
NOX1,NOX1,NOX1
NOX1,NOX1,NOX1
NOX1,NOX1,NOX1
NOX1,NOX1,NOX1
NOX1,NOX1,NOX1
➤ gawk '{print $NF}' ccds.15.grch37p13.extended.bed|grep ','|wc -l
66188
➤ gawk '{print $NF}' ccds.15.grch37p13.extended.bed|grep ','|sort|uniq|wc -l
9290
➤ gawk '{print $NF}' ccds.15.grch37p13.extended.bed|grep ','|sort|uniq >problematic.gene.names.txt
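One possible workaround, shown here only as a sketch, is to collapse repeated names in the last column before the file is loaded. With this step "NOX1,NOX1,NOX1" becomes "NOX1", while genuinely different names such as "A,B" stay joined and can be reviewed by hand. The helper `dedupe_gene_names` is hypothetical and not part of chanjo. The sketch assumes a tab-delimited BED file with the gene name(s) in the last column, as the `gawk '{print $NF}'` commands above imply. It has not been checked whether chanjo then handles the rows correctly.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of chanjo): collapse repeated gene names in the
# last column of a tab-delimited BED file, e.g. "NOX1,NOX1,NOX1" -> "NOX1".
# Distinct names are kept comma-joined in first-seen order, e.g. "A,B,A" -> "A,B".
# Reads the files given as arguments, or stdin if none are given.
dedupe_gene_names() {
  awk -F'\t' 'BEGIN { OFS = "\t" }
  {
    n = split($NF, names, ",")
    split("", seen)            # portable way to clear the array for each row
    out = ""
    for (i = 1; i <= n; i++) {
      if (!(names[i] in seen)) {
        seen[names[i]] = 1
        out = (out == "") ? names[i] : out "," names[i]
      }
    }
    $NF = out                  # rebuilds the row using tabs (OFS)
    print
  }' "$@"
}
```

For example, `dedupe_gene_names ccds.15.grch37p13.extended.bed > ccds.dedup.bed` writes the cleaned file (the output name is only illustrative). Running `awk -F'\t' '$NF ~ /,/' ccds.dedup.bed | wc -l` afterwards counts the rows that still carry more than one distinct name.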