Lee Bergstrand comments




kaiju2table -- Meaning of "cannot be assigned to a (non-viral) X"

@pmenzel Unfortunately, `kaiju2krona` output will not work. Though it does produce a tab-delimited file, it does not produce a standard TSV where the same taxa level in each organism's taxa information...

kaiju2table -- Meaning of "cannot be assigned to a (non-viral) X"

@pmenzel Can you write me a brief overview of how `kaiju2table` works? So essentially it reads in the results file, counts the frequency of each NCBI Taxa_ID, then maps these...
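The first steps guessed at above (read the results file, count how often each NCBI taxon ID occurs) could be sketched roughly as follows. This is not `kaiju2table`'s actual implementation; it assumes Kaiju's tab-separated output, where the first column is `C` or `U` and the third column is the taxon ID. The function name `count_taxon_ids` and the example reads are made up for illustration.

```python
from collections import Counter
import io


def count_taxon_ids(lines):
    """Count classified reads per NCBI taxon ID from Kaiju-style output lines."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip short lines and unclassified reads (first column "U").
        if len(fields) < 3 or fields[0] != "C":
            continue
        counts[int(fields[2])] += 1
    return counts


# Made-up example input standing in for a Kaiju results file.
example = io.StringIO(
    "C\tread1\t562\n"
    "C\tread2\t562\n"
    "U\tread3\t0\n"
    "C\tread4\t1280\n"
)
taxon_counts = count_taxon_ids(example)
```

Mapping each counted ID to a name or rank would be a separate step that needs the NCBI taxonomy files, which this sketch leaves out.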

kaiju2table -- Meaning of "cannot be assigned to a (non-viral) X"

I'm considering using the `ETE3` toolkit's NCBI library to generate the data I need. I am just wondering about the information in the results file that I could use for...

kaiju2table -- Meaning of "cannot be assigned to a (non-viral) X"

@pmenzel, I haven't tried your implementation yet. One of the problems that I ran into is that the Krona output generates a ragged TSV. In other words, the cells of...
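To make "ragged" concrete, here is a small sketch that reads tab-delimited rows of differing lengths and pads them to a common width. The rows are invented, and `pad_ragged_rows` is a hypothetical helper. Padding only makes the table rectangular; it does not guarantee that a given column holds the same taxonomic rank in every row.

```python
import csv
import io


def pad_ragged_rows(rows, fill=""):
    """Pad every row with `fill` so that all rows have the same number of cells."""
    rows = [list(r) for r in rows]
    width = max((len(r) for r in rows), default=0)
    return [r + [fill] * (width - len(r)) for r in rows]


# Invented ragged input: the second row has fewer cells than the first.
ragged = io.StringIO("10\tBacteria\tProteobacteria\tEscherichia\n3\tViruses\n")
rows = list(csv.reader(ragged, delimiter="\t"))
padded = pad_ragged_rows(rows)
```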

kaiju2table and kaiju2krona produce different numbers

@pmenzel This might be related to https://github.com/bioinformatics-centre/kaiju/issues/181. If you're dropping counts of reads that aren't classified down to the rank value then you are going to get different values than...
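A toy illustration of why the totals could differ, assuming reads that stop above the requested rank are simply dropped. The rank list and the per-read data are invented, and this does not reproduce either tool's logic.

```python
# Ranks ordered from shallowest to deepest (simplified for illustration).
RANK_ORDER = ["phylum", "class", "order", "family", "genus", "species"]

# Invented data: the deepest rank each classified read reaches.
read_ranks = ["genus", "genus", "family", "phylum", "genus"]


def reads_counted_at(rank, ranks_reached, rank_order):
    """Count reads whose classification reaches `rank` or deeper."""
    depth = rank_order.index(rank)
    return sum(1 for r in ranks_reached if rank_order.index(r) >= depth)


at_genus = reads_counted_at("genus", read_ranks, RANK_ORDER)
dropped = len(read_ranks) - at_genus  # reads that never reach genus level
```

Here a genus-level summary would account for fewer reads than a summary that keeps every classified read.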

Exact gene locatons

@jmtsuji My understanding is that the gene catalogue already contains genes clustered at 95% identity. So each gene in the gene catalogue is representative of a cluster of multiple genes...

Exact gene locatons

@SilasK It might be beneficial to assign each gene in the entire pipeline a universally unique identifier (https://en.wikipedia.org/wiki/Universally_unique_identifier) if you are not already doing so, so you could make a...
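As a rough sketch of that idea, Python's standard `uuid` module can generate a random (version 4) UUID per gene. The gene IDs and the helper `assign_uuids` are placeholders, not names taken from the pipeline.

```python
import uuid


def assign_uuids(gene_ids):
    """Map each gene ID to a freshly generated random (version 4) UUID string."""
    return {gene_id: str(uuid.uuid4()) for gene_id in gene_ids}


# Placeholder gene IDs for illustration only.
gene_uuids = assign_uuids(["Gene0000001", "Gene0000002"])
```

Random UUIDs would have to be stored alongside the genes, since running the helper again produces different values.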

Exact gene locatons

@jmtsuji This could also be used to map ORF IDs as well.

Exact gene locatons

@SilasK Within the gene2genome.tsv.gz file, is the gene column referring to the gene catalogue ID, or to some gene ID that is unique to that file? gene2genome.tsv.gz Gene MAG Ncopies Gene0000097...

Exact gene locatons

I think we need some file like the following:

| gene_catalogue_rep_gene_id | gene_catalogue_gene_cluster | contig_gene_id | contig | genome_gene_id | genome_id |
|---|---|---|---|---|---|

The file headers above are completely database...