MitoZ
Cnidaria
Hi Lin,
The profiles folder contains "Cnidaria_CDS_protein.fa" under MT_database. However, the corresponding CDS_HMM file and a Cnidaria option for "--clade" are missing.
Hey,
Thanks for pointing out the problem!
I will add the information in the next release.
The HMM files are used for the initial screening of candidate mitochondrial sequences, and the Cnidaria_CDS_protein.fa file is used to annotate the final mitochondrial genome. The HMMs are generally very robust, so an HMM from another clade should also work for your target clade.
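To see which clades are in the same situation as Cnidaria (a protein file but no HMM), a small check like the one below may help. It is only a sketch: the function name `clades_missing_hmm` is made up for illustration, and it assumes the file naming seen in this thread (`MT_database/<Clade>_CDS_protein.fa` and `CDS_HMM/<Clade>_CDS.hmm`).

```shell
#!/usr/bin/env bash
# Print every clade that has a protein FASTA in MT_database
# but no matching HMM file in CDS_HMM.
# Assumed naming (taken from this thread, not verified against MitoZ itself):
#   <profiles>/MT_database/<Clade>_CDS_protein.fa
#   <profiles>/CDS_HMM/<Clade>_CDS.hmm
clades_missing_hmm() {
    local profiles_dir="$1"
    local fa clade
    for fa in "$profiles_dir"/MT_database/*_CDS_protein.fa; do
        [ -e "$fa" ] || continue    # the glob matched nothing
        clade=$(basename "$fa" _CDS_protein.fa)
        if [ ! -e "$profiles_dir/CDS_HMM/${clade}_CDS.hmm" ]; then
            echo "$clade"
        fi
    done
}

# Example call (path is a placeholder for your own profiles copy):
#   clades_missing_hmm ~/mitoz_custom_db/profiles
```

Any clade it prints would need the same kind of workaround as described in this thread, or an HMM borrowed from another clade.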
One workaround for now is:
- Create a custom profile directory (see https://github.com/linzhi2013/MitoZ/wiki/Extending-MitoZ's-database):
$ mkdir ~/mitoz_custom_db
$ cp -a /home/guanliang/soft/miniconda3/envs/mitozEnv/lib/python3.7/site-packages/mitoz/profiles ~/mitoz_custom_db
$ ls -lhrt ~/mitoz_custom_db/profiles/
total 16K
-rw-rw-r-- 1 guanliang guanliang 0 May 12 06:47 __init__.py
drwxrwxr-x 2 guanliang guanliang 4.0K May 24 16:06 CDS_HMM
drwxrwxr-x 2 guanliang guanliang 4.0K May 24 16:06 rRNA_CM
drwxrwxr-x 2 guanliang guanliang 4.0K May 24 16:06 __pycache__
drwxrwxr-x 2 guanliang guanliang 4.0K May 24 17:36 MT_database
Now,
$ cd ~/mitoz_custom_db/profiles/MT_database
# rename the file
$ mv Arthropoda_CDS_protein.fa bak.Arthropoda_CDS_protein.fa
# create a soft-link (a "faked" Arthropoda_CDS_protein.fa file)
$ ln -s Cnidaria_CDS_protein.fa Arthropoda_CDS_protein.fa
- Use the following command when you run MitoZ:
--profiles_dir ~/mitoz_custom_db/profiles --clade Arthropoda --genetic_code 4
# If the mitochondrial genetic code of your target group is 4.
By using --clade Arthropoda, MitoZ will use the CDS_HMM/Arthropoda_CDS.hmm file for candidate mitochondrial sequence searching. And because we have linked the Cnidaria_CDS_protein.fa file as Arthropoda_CDS_protein.fa, MitoZ will actually use the Cnidaria_CDS_protein.fa file for protein annotation.
If your target clade is a different group, you can apply the same approach to make MitoZ work.
- If necessary, add more homologous proteins to the Cnidaria_CDS_protein.fa file, especially when some PCGs (protein-coding genes) are missing from the annotation result. Please refer to https://github.com/linzhi2013/MitoZ/wiki/Extending-MitoZ's-database#3-but-what-protein-sequences-are-to-be-used
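The rename-and-link step can also be wrapped in a small function so it is easy to repeat for other clade pairs. This is a rough sketch, not part of MitoZ: `link_clade_proteins` is a hypothetical helper name, and the file names follow the `<Clade>_CDS_protein.fa` pattern used in this thread.

```shell
#!/usr/bin/env bash
# Make <hmm_clade>_CDS_protein.fa a symlink to <src_clade>_CDS_protein.fa,
# so that --clade <hmm_clade> searches with one clade's HMM but annotates
# with another clade's proteins.
# Hypothetical helper; it only automates the mv + ln -s steps shown above.
link_clade_proteins() {
    local db_dir="$1"       # e.g. ~/mitoz_custom_db/profiles/MT_database
    local hmm_clade="$2"    # value you will pass to --clade, e.g. Arthropoda
    local src_clade="$3"    # clade whose proteins should be used, e.g. Cnidaria
    local target="${hmm_clade}_CDS_protein.fa"
    local src="${src_clade}_CDS_protein.fa"

    if [ ! -f "$db_dir/$src" ]; then
        echo "missing $db_dir/$src" >&2
        return 1
    fi
    # Back up the original file once; skip if it is already a symlink
    # left over from a previous run.
    if [ -f "$db_dir/$target" ] && [ ! -L "$db_dir/$target" ]; then
        mv "$db_dir/$target" "$db_dir/bak.$target"
    fi
    # Relative link: it resolves inside db_dir, like the manual "ln -s" above.
    ln -sfn "$src" "$db_dir/$target"
}

# Example call (run it on your custom copy, not the installed profiles):
#   link_clade_proteins ~/mitoz_custom_db/profiles/MT_database Arthropoda Cnidaria
```

Running it twice should be harmless, since the backup is only made while the target is still a regular file.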
Tips:
If you do not want to link Cnidaria_CDS_protein.fa as Arthropoda_CDS_protein.fa, you have two options:
- Just keep using the original Arthropoda_CDS_protein.fa file, and run MitoZ with the following parameters:
--clade Arthropoda --genetic_code 4
# but you need to choose the correct genetic code here
MitoZ will simply use the Arthropoda_CDS_protein.fa file for protein annotation. If your target clade or gene is too distant from arthropods, some proteins may be missing in the annotation result.
- Add the protein sequences of the 13 protein-coding genes of your target clade to the Arthropoda_CDS_protein.fa file (please refer to https://github.com/linzhi2013/MitoZ/wiki/Extending-MitoZ's-database#3-but-what-protein-sequences-are-to-be-used):
$ mkdir ~/mitoz_custom_db
$ cp -a /home/guanliang/soft/miniconda3/envs/mitozEnv/lib/python3.7/site-packages/mitoz/profiles ~/mitoz_custom_db
$ cd ~/mitoz_custom_db/profiles/MT_database/
# edit the '~/mitoz_custom_db/profiles/MT_database/Arthropoda_CDS_protein.fa' file with a text editor, such as vim or Sublime Text.
Then run MitoZ with the following parameters:
--profiles_dir ~/mitoz_custom_db/profiles --clade Arthropoda --genetic_code 4
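If you would rather not edit the file by hand, appending a prepared FASTA file does the same thing. The sketch below is only an illustration: `append_proteins` is a hypothetical helper, and it does not check that your sequence headers follow the format described on the wiki page linked above, so please verify that yourself.

```shell
#!/usr/bin/env bash
# Append extra protein records to a database FASTA in the custom profiles copy,
# keeping a one-time "bak." backup next to it.
# Hypothetical helper; header format is NOT validated here.
append_proteins() {
    local db_fasta="$1"   # e.g. ~/mitoz_custom_db/profiles/MT_database/Arthropoda_CDS_protein.fa
    local extra="$2"      # FASTA file with your target clade's proteins
    local bak
    bak="$(dirname "$db_fasta")/bak.$(basename "$db_fasta")"

    [ -f "$db_fasta" ] && [ -f "$extra" ] || return 1
    # Keep the untouched original only the first time.
    [ -e "$bak" ] || cp -p "$db_fasta" "$bak"
    # Ensure the existing file ends with a newline so records do not run together.
    if [ -n "$(tail -c 1 "$db_fasta")" ]; then
        echo >> "$db_fasta"
    fi
    cat "$extra" >> "$db_fasta"
    # Report how many records the database file now holds.
    grep -c '^>' "$db_fasta"
}

# Example call ("my_clade_proteins.fa" is a placeholder file name):
#   append_proteins ~/mitoz_custom_db/profiles/MT_database/Arthropoda_CDS_protein.fa my_clade_proteins.fa
```

The printed record count gives a quick sanity check that the new sequences were actually added.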