
Joined gene names: a possible pitfall causing incorrect results?

Open • biocyberman opened this issue 8 years ago • 2 comments

Is chanjo aware of these problematic gene names? They may cause problems for any query based on gene names.
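One way to make a gene-name query robust to joined names is to treat the last column as a comma-separated list and test each entry for an exact match. A plain comparison against the whole column misses "NOX1,NOX1,NOX1", and a substring search such as `grep NOX1` can also hit longer names that merely contain the query. The following is a minimal sketch, assuming a whitespace-separated BED file whose last column holds the gene names; `rows_for_gene` is a hypothetical helper, not a chanjo command.

```shell
# Hypothetical helper: print BED rows whose last column (a comma-separated
# list of gene names) contains the given gene name as an exact entry.
# Usage: rows_for_gene GENE < file.bed
rows_for_gene() {
  awk -v gene="$1" '
  {
    n = split($NF, names, ",")    # split the joined names on commas
    for (i = 1; i <= n; i++)
      if (names[i] == gene) {     # exact match, not a substring match
        print
        next                      # print each row at most once
      }
  }'
}
```

For example, `rows_for_gene NOX1 < ccds.15.grch37p13.extended.bed` would select every row listing NOX1, whether it appears alone or repeated.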

➤ gawk '{print $NF}' ccds.15.grch37p13.extended.bed|grep ','|head
NOX1,NOX1,NOX1
NOX1,NOX1,NOX1
NOX1,NOX1
NOX1,NOX1,NOX1
NOX1,NOX1,NOX1
NOX1,NOX1,NOX1
NOX1,NOX1,NOX1
NOX1,NOX1,NOX1
NOX1,NOX1,NOX1
NOX1,NOX1,NOX1
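Entries like those above collapse to a single name once duplicates are removed. Below is a minimal awk sketch that rewrites each row with repeated names dropped, keeping the first occurrence of each name in its original order. It assumes tab-separated input with the gene names in the last column; `dedupe_last_column` is a hypothetical helper, not part of chanjo.

```shell
# Hypothetical helper: collapse repeated names in the last column of a
# tab-separated BED-like file, e.g. "NOX1,NOX1,NOX1" -> "NOX1".
# Usage: dedupe_last_column < in.bed > out.bed
dedupe_last_column() {
  awk 'BEGIN { FS = OFS = "\t" }
  {
    n = split($NF, names, ",")
    split("", seen)                 # portable way to empty the array
    out = ""
    for (i = 1; i <= n; i++) {
      if (!(names[i] in seen)) {    # keep only the first occurrence
        seen[names[i]] = 1
        out = (out == "" ? names[i] : out "," names[i])
      }
    }
    $NF = out                       # rebuild the row with OFS (tab)
    print
  }'
}
```

Rows that list genuinely different genes keep every distinct name, so queries against them still need to treat the column as a list.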
➤ gawk '{print $NF}' ccds.15.grch37p13.extended.bed|grep ','|wc -l
66188

➤ gawk '{print $NF}' ccds.15.grch37p13.extended.bed|grep ','|sort|uniq|wc -l
9290
➤ gawk '{print $NF}' ccds.15.grch37p13.extended.bed|grep ','|sort|uniq >problematic.gene.names.txt
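To see how many of the unique joined strings are only one gene repeated and how many combine several distinct genes, the file written above can be summarised in one awk pass. This is a minimal sketch, assuming one comma-joined entry per line as produced by the last command; `classify_joined_names` is a hypothetical helper, and no counts for the real file are implied here.

```shell
# Hypothetical helper: count comma-joined entries (one per line) that
# repeat a single gene vs. entries that list two or more distinct genes.
# Usage: classify_joined_names < problematic.gene.names.txt
classify_joined_names() {
  awk '{
    n = split($0, names, ",")
    split("", seen)                 # reset per line
    distinct = 0
    for (i = 1; i <= n; i++)
      if (!(names[i] in seen)) { seen[names[i]] = 1; distinct++ }
    if (distinct == 1) single++
    else multi++
  }
  END { printf "single-gene\t%d\nmulti-gene\t%d\n", single + 0, multi + 0 }'
}
```

Only the multi-gene entries are ambiguous about which gene a region belongs to; the single-gene ones break exact string matching but not gene assignment.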

biocyberman, Mar 13 '16 20:03