Wei Shen

Results 235 comments of Wei Shen

> So for each read I use LCA to assign the upper common hit from the 10 hits, so for one read I get one hit.

Good, here you've assigned...
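The LCA step described in the quote can be illustrated with a minimal sketch. It assumes each hit has already been resolved to a root-to-leaf lineage of TaxIds joined by `;` (the format `taxonkit lineage -t` prints). The function name `lca_from_lineages` is hypothetical and is not part of TaxonKit; the two lineages are sample values in that format.

```python
def lca_from_lineages(lineages, sep=";"):
    """Return the deepest TaxId shared by all lineages, or None if there is none."""
    # Split every non-empty lineage into its list of TaxIds (root first).
    paths = [lineage.split(sep) for lineage in lineages if lineage]
    if not paths:
        return None
    lca = None
    # Walk all lineages in parallel from the root and stop at the first divergence.
    for nodes in zip(*paths):
        if len(set(nodes)) != 1:
            break
        lca = nodes[0]
    return lca


# Sample TaxId lineages (root ... leaf) for two hits of one read.
lineages = [
    "609216830;1494978361;1845768359;813944714;671290804;690796498;883645126",
    "609216830;1494978361;1845768359;813944714;95949142;368069282;1202746109",
]
print(lca_from_lineages(lineages))  # 813944714
```

With ten hits per read, the same function would be called once per read with ten lineages, giving one assignment per read.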

```
$ echo 883645126 1202746109 883645126 1956460793 883645126 1956460793 883645126 485431882 \
    | sed -E 's/\s+/\n/g' \
    | taxonkit lineage --data-dir taxdump/ -t \
    | cut -f 3
609216830;1494978361;1845768359;813944714;671290804;690796498;883645126
609216830;1494978361;1845768359;813944714;95949142;368069282;1202746109...
```

Someone had the same need before, so I wrote up these steps and will add them to the TaxonKit documentation.

### Merging GTDB and NCBI taxonomy

Sometimes ([1](https://github.com/shenwei356/gtdb-taxdump/issues/6)) one needs...

> I did not completely describe my question. We would also like to integrate human, plasmid, and UniVec_Core sequences. Since the Bacteria, Archaea, and Virus example above was cut at Domain (D)...

Sure. [This tutorial](https://bioinf.shenwei.me/taxonkit/tutorial/#merging-gtdb-and-ncbi-taxonomy) could be a reference.

Steps:

1. Exporting taxonomic lineages of taxa with rank equal to species from [GTDB-taxdump](https://github.com/shenwei356/gtdb-taxdump), into tabular format.

        taxonkit list --data-dir gtdb-taxdump/R207/ --ids 1...

So how was the old one generated?

I see, the taxdump files in `struo2` were probably generated by @nick-youngblut with old versions of TaxonKit, which might produce different TaxId values for the same lineage. I'm not sure...

Yes, it's "removed" during taxdump file creation. There are some notes in the help message:

```
$ taxonkit create-taxdump -h
Attentions:
  1. Names should be distinct in taxa of different...
```

I understand your worries. In practice, we only summarize at the phylum and species ranks. Besides, predictions with an abundance lower than 0.0002 are probably false positives. You can...
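As a rough illustration of the abundance cutoff mentioned above, the sketch below drops predictions below 0.0002 from a profile. The profile layout (a dict mapping taxon name to abundance, on the same scale as the threshold) and the names `MIN_ABUNDANCE` and `drop_low_abundance` are assumptions made for this example; they do not come from any particular profiler's output format.

```python
MIN_ABUNDANCE = 0.0002  # cutoff quoted in the comment; the scale is assumed to match the profile


def drop_low_abundance(profile, min_abundance=MIN_ABUNDANCE):
    """Keep only predictions whose abundance is at least min_abundance."""
    return {
        taxon: abundance
        for taxon, abundance in profile.items()
        if abundance >= min_abundance
    }


# Hypothetical species-level profile: taxon name -> abundance.
profile = {"species_A": 0.62, "species_B": 0.3799, "species_C": 0.0001}
kept = drop_low_abundance(profile)
print(sorted(kept))  # ['species_A', 'species_B']
```

The sketch does not renormalize the remaining abundances; whether that is appropriate depends on how the profile is used downstream.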

> 1 : In the r220 GTDB taxonomy files, are you using full taxonomy as you changed in taxonkit 0.16.2(by allowing duplicated names in different rank)?

Yes. https://github.com/shenwei356/taxonkit/issues/92#issuecomment-1979758849

> 2...