Lee Bergstrand
@pmenzel Unfortunately, `kaiju2krona` output will not work. Although it does produce a tab-delimited file, it does not produce a standard TSV where the same taxa level in each organism's taxa information...
@pmenzel Can you write me a brief overview of how `kaiju2table` works? So essentially it reads in the results file, counts the frequency of each NCBI Taxa_ID, then maps these...
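To make the counting step concrete, here is a minimal sketch of how I picture it. This is not the actual `kaiju2table` implementation. It assumes the default Kaiju output layout (tab-separated, with `C`/`U` in the first column, the read name in the second, and the NCBI taxon ID in the third); the function name `count_taxon_ids` and the example reads are made up for illustration.

```python
import csv
import io
from collections import Counter


def count_taxon_ids(handle):
    """Count how many classified reads were assigned to each taxon ID.

    Assumes three tab-separated columns: status (C/U), read name, taxon ID.
    """
    counts = Counter()
    for row in csv.reader(handle, delimiter="\t"):
        # Skip malformed lines and unclassified reads.
        if len(row) >= 3 and row[0] == "C":
            counts[row[2]] += 1
    return counts


# Made-up example input standing in for a Kaiju results file.
example = io.StringIO(
    "C\tread1\t562\n"
    "U\tread2\t0\n"
    "C\tread3\t562\n"
    "C\tread4\t1280\n"
)
taxon_counts = count_taxon_ids(example)
```

Mapping the counted IDs to names and ranks would be a separate step on top of this.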
I'm considering using the `ETE3` toolkit's NCBI library to generate the data I need. I am just wondering about the information in the results file that I could use for...
@pmenzel, I haven't tried your implementation yet. One of the problems that I ran into is that the Krona output generates a ragged TSV. In other words, the cells of...
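For illustration, this is roughly what I mean by ragged, and the simplest possible workaround. It is only a sketch: the rows below are made-up data (a count followed by lineage names), and `pad_ragged_rows` is a hypothetical helper. Note that padding only makes every row the same length; it does not put the same rank in the same column.

```python
import csv
import io


def pad_ragged_rows(rows, fill=""):
    """Pad every row with `fill` so all rows have the same number of cells."""
    rows = [list(r) for r in rows]
    width = max((len(r) for r in rows), default=0)
    return [r + [fill] * (width - len(r)) for r in rows]


# Made-up ragged input: each row has a different number of cells.
ragged = io.StringIO(
    "10\tBacteria\tProteobacteria\n"
    "5\tBacteria\n"
    "2\tArchaea\tEuryarchaeota\tMethanobacteria\n"
)
rows = list(csv.reader(ragged, delimiter="\t"))
padded = pad_ragged_rows(rows)
```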
@pmenzel This might be related to https://github.com/bioinformatics-centre/kaiju/issues/181. If you're dropping counts of reads that aren't classified down to the rank value then you are going to get different values than...
@jmtsuji My understanding is that the gene catalogue already contains genes clustered at 95% identity. So each gene in the gene catalogue is representative of a cluster of multiple genes...
@SilasK It might be beneficial to assign each gene in the entire pipeline a universally unique identifier (https://en.wikipedia.org/wiki/Universally_unique_identifier) if you are not already doing so, so you could make a...
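As a rough sketch of the UUID idea using Python's standard `uuid` module: a name-based (version 5) UUID is deterministic, so the same gene gets the same ID on every run. The namespace string and the sample/contig/ORF naming scheme below are assumptions for illustration only, not anything the pipeline currently does.

```python
import uuid

# Hypothetical project namespace; any fixed UUID would work here.
PIPELINE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "pipeline.example.org")


def gene_uuid(sample, contig, orf):
    """Derive a stable UUID from a (sample, contig, ORF) triple."""
    return uuid.uuid5(PIPELINE_NAMESPACE, f"{sample}/{contig}/{orf}")


# Hypothetical identifiers, made up for this example.
first = gene_uuid("sample1", "contig_12", "orf_3")
again = gene_uuid("sample1", "contig_12", "orf_3")
other = gene_uuid("sample2", "contig_12", "orf_3")
```

If IDs do not need to be reproducible, `uuid.uuid4()` generates a random one instead.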
@jmtsuji This could also be used to map ORF IDs.
@SilasK Within the `gene2genome.tsv.gz` file, does the Gene column refer to the gene catalogue ID, or to some gene ID that is unique to that file? gene2genome.tsv.gz Gene MAG Ncopies Gene0000097...
I think we need some file like the following:

| gene_catalogue_rep_gene_id | gene_catalogue_gene_cluster | contig_gene_id | contig | genome_gene_id | genome_id |
|---|---|---|---|---|---|

The file headers above are completely database...