
How to map relevant genomes to OTUs through scientific names?

Open fanqiedantang opened this issue 1 year ago • 11 comments

Command:

python metagenome_from_profile.py -p small_test/test.biom -c small_test/config_test.ini --ncbi ncbi/ -f --debug -ref small_test/ref.tsv -tmp small_test/tmp -o small_test/out/

Warning: [two screenshots]

The "ref.tsv" file: [screenshot]. The scientific names in ref.tsv were obtained by searching the taxonomy of each OTU on NCBI, but CAMISIM does not seem to be able to use them to map the OTUs to the correct reference genomes.

fanqiedantang • Jan 11 '24 08:01

The mapping with scientific names currently only works with the online version of NCBI; the additional reference file is not used for the scientific-name mapping. If you already know which genomes you want to use, or which genomes map to which OTUs in your BIOM file, it is probably better to use CAMISIM's de novo simulation and simply set the abundances given in the profile. See also the last few answers of this issue for more details on how to do this.
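A minimal sketch of preparing de novo inputs when the OTU-to-genome mapping is already known. It assumes a two-column tab-separated genome file (genome ID, path to FASTA) and a two-column abundance file (genome ID, relative abundance). The file names, the dictionary layout and the helper write_de_novo_inputs are hypothetical, so check the expected column layout against the CAMISIM documentation before use.

```python
import csv
from pathlib import Path


def write_de_novo_inputs(otu_to_genome, otu_abundance, out_dir):
    """Write a genome-to-path table and a per-genome abundance table (hypothetical layout).

    otu_to_genome: {otu_id: (genome_id, fasta_path)} for OTUs whose genome is known
    otu_abundance: {otu_id: count or relative abundance} taken from the BIOM profile
    Returns the OTU IDs that were written, in profile order.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Only OTUs with a known genome can be simulated de novo; the rest are dropped.
    kept = [otu for otu in otu_abundance if otu in otu_to_genome]
    total = sum(otu_abundance[otu] for otu in kept)
    if total <= 0:
        raise ValueError("no abundance left after matching OTUs to genomes")

    with open(out / "genome_to_id.tsv", "w", newline="") as g_handle, \
            open(out / "abundance.tsv", "w", newline="") as a_handle:
        genomes = csv.writer(g_handle, delimiter="\t", lineterminator="\n")
        abundances = csv.writer(a_handle, delimiter="\t", lineterminator="\n")
        for otu in kept:
            genome_id, fasta_path = otu_to_genome[otu]
            genomes.writerow([genome_id, fasta_path])
            # Renormalise so the abundances of the kept genomes sum to 1.
            abundances.writerow([genome_id, otu_abundance[otu] / total])
    return kept
```

For example, write_de_novo_inputs({"OTU_1": ("genome_1", "genomes/genome_1.fasta")}, {"OTU_1": 120, "OTU_2": 40}, "MP_simulation/inputs") drops OTU_2 because no genome is known for it. The de novo config then has to point at the written files. Check the exact option names (for example id_to_genome_file) against the CAMISIM wiki.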

AlphaSquad • Jan 11 '24 09:01

Thank you very much, I understand what you mean now. By the way, if I don't know which genomes map to the OTUs in my BIOM file, how can I resolve the following warnings? [three screenshots]

fanqiedantang • Jan 11 '24 10:01

These warnings should not stop CAMISIM from running. They only show that the NCBI mapping failed and that CAMISIM uses the additional references to fill up your data set, which means the data set might be less similar to your BIOM profile than desired.

One way to get a little more accuracy is to edit scripts/get_genomes.py on line 47 and set MAX_RANK to, e.g., order or class (this seems to be the level at which your genomes have "real" scientific names). However, CAMISIM might then only find genomes from the same order or class as the genomes in your BIOM profile, which is quite a difference.

If you know the mapping for some genomes but not for others, you can run CAMISIM from the profile with the community_only option. This yields a mapping file for all genomes. You can then replace the entries for the genomes whose mapping you know, keep CAMISIM's mapping for the others, and use the result as input for a de novo run. I hope this helps.
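The last step, replacing part of CAMISIM's mapping with the mappings you already know, can be sketched as follows. The sketch assumes the mapping is a two-column tab-separated file keyed by genome ID, like a genome_to_id.tsv. The files actually written by a community_only run may have more columns or a different layout, and all function names here are hypothetical.

```python
import csv


def read_two_column_tsv(path):
    """Read a two-column tab-separated file into {first column: second column}, keeping order."""
    with open(path, newline="") as handle:
        return {row[0]: row[1] for row in csv.reader(handle, delimiter="\t") if row}


def merge_mappings(camisim_mapping, known_mapping):
    """Replace CAMISIM's entries with known ones; report known IDs CAMISIM never used."""
    merged = dict(camisim_mapping)
    unmatched = []
    for genome_id, value in known_mapping.items():
        if genome_id in merged:
            merged[genome_id] = value  # the known mapping wins
        else:
            unmatched.append(genome_id)
    return merged, unmatched


def write_two_column_tsv(mapping, path):
    """Write {key: value} back out as a two-column tab-separated file."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerows(mapping.items())
```

Known IDs that do not appear in CAMISIM's mapping are returned rather than silently added, so you can check them by hand before the de novo run.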

AlphaSquad • Jan 11 '24 10:01

Thank you!

fanqiedantang • Jan 11 '24 10:01

It seems that there are some new errors when I try the de novo simulation command: python metagenomesimulation.py MP_simulation/config.ini --debug. The error is as follows: [screenshot]

fanqiedantang • Jan 18 '24 02:01

This error most likely occurs when a previous or unfinished CAMISIM run is still in your output directory. Could you make sure that the output directory is empty and try again?
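A small pre-flight check along these lines can catch leftovers from an earlier run before metagenomesimulation.py is launched. The helper name is hypothetical. The path to check is whatever output directory your config.ini points to.

```python
from pathlib import Path


def ensure_empty_output_dir(path):
    """Create the output directory if needed, but refuse to reuse a non-empty one."""
    out = Path(path)
    out.mkdir(parents=True, exist_ok=True)
    leftovers = sorted(entry.name for entry in out.iterdir())
    if leftovers:
        preview = ", ".join(leftovers[:5])
        raise RuntimeError(
            f"{out} is not empty (found: {preview}); move or delete the previous "
            "CAMISIM run before starting a new one"
        )
    return out
```

Raising an error instead of deleting files is deliberate: a half-finished run may still contain results you want to keep.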

AlphaSquad • Jan 18 '24 07:01

Unfortunately, I have encountered this problem again. [screenshot]

fanqiedantang • Jan 18 '24 08:01

Hi, I noticed the same problem in this issue. I tried simulating 1 GB, 5 GB, 20 GB, and 50 GB, but all of them reported the same error, so I don't think the problem is caused by the sequencing depth (GB per sample).

fanqiedantang • Jan 19 '24 06:01

Since this error occurs during anonymising, could you try running CAMISIM without anonymising and see if you still encounter errors?
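A minimal sketch of switching anonymisation off by writing a modified copy of the config. It assumes the config has an anonymous switch in its [Main] section; check your config.ini for the exact key. The function name and the output file name are hypothetical.

```python
import configparser


def disable_anonymisation(config_in, config_out):
    """Write a copy of a CAMISIM-style config with anonymisation switched off.

    Assumes the option is called 'anonymous' and lives in the [Main] section.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    with open(config_in) as handle:
        parser.read_file(handle)
    if not parser.has_section("Main"):
        raise KeyError(f"{config_in} has no [Main] section")
    parser.set("Main", "anonymous", "False")
    with open(config_out, "w") as handle:
        # configparser does not preserve comments from the original file.
        parser.write(handle, space_around_delimiters=False)
```

For example, disable_anonymisation("MP_simulation/config.ini", "MP_simulation/config_no_anon.ini"), followed by python metagenomesimulation.py MP_simulation/config_no_anon.ini --debug. Writing to a copy keeps the original file and its comments intact.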

AlphaSquad • Jan 19 '24 07:01

How do I run CAMISIM without anonymising?

fanqiedantang • Jan 19 '24 07:01

Strangely, when I chose to simulate only 10 of the species, there was no error.

fanqiedantang • Jan 19 '24 07:01