
Using DB partition and MERGE does not match single DB abundance results

Open jaimeortiz-david opened this issue 5 months ago • 0 comments

Hi, I am testing whether it is valid to split a database into smaller databases, query each one, and then merge the results. So far, the merged results do not match those from querying a single DB. For example, I have a DB with 40 species and built two DBs with 20 species each. When I use the merge function, the abundance results from merging the two 20-species DBs do not match those from the full 40-species DB. Here are the commands I am using:

metacache build 20sp_DB1 /test_merge_DB/DB1 -taxonomy ncbi_taxonomy -remove-overpopulated-features

metacache build 20sp_DB2 /test_merge_DB/DB2 -taxonomy ncbi_taxonomy -remove-overpopulated-features

metacache query 20sp_DB1 MixA_1.fastq.gz MixA_2.fastq.gz -pairfiles -tophits -queryids -lowest species -out res1.txt

metacache query 20sp_DB2 MixA_1.fastq.gz MixA_2.fastq.gz -pairfiles -tophits -queryids -lowest species -out res2.txt

metacache merge res1.txt res2.txt -lowest species -taxonomy ncbi_taxonomy -max-cand 4 -hitmin 2 -hitdiff 2 -mapped-only -abundances test_abundance.txt -abundance-per species > out_metacache_merge.txt
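To make the mismatch concrete, the per-species abundances from the two runs can be compared side by side. The following is a minimal Python sketch, not part of MetaCache: `compare_abundances` is a hypothetical helper, the taxon names and numbers are made up for illustration, and loading the abundance files into dictionaries is left out because their exact layout is not shown here.

```python
def compare_abundances(full, merged, tolerance=0.0):
    """Return taxa whose abundance differs by more than `tolerance`.

    `full` and `merged` map a taxon name to its abundance. A taxon missing
    from one table is treated as having abundance 0 in that table.
    """
    differences = {}
    for taxon in sorted(set(full) | set(merged)):
        a = full.get(taxon, 0.0)
        b = merged.get(taxon, 0.0)
        if abs(a - b) > tolerance:
            differences[taxon] = (a, b)
    return differences


# Made-up numbers for illustration only; these are NOT results from the runs above.
full_db = {"species_A": 120.0, "species_B": 80.0, "species_C": 5.0}
merged = {"species_A": 118.0, "species_B": 80.0}

for taxon, (a, b) in compare_abundances(full_db, merged).items():
    print(f"{taxon}\tfull={a}\tmerged={b}\tdiff={b - a}")
```

Raising `tolerance` hides small differences, which helps separate taxa that disappear entirely after merging from taxa whose counts only shift slightly.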

jaimeortiz-david, Jan 31 '24 02:01