MiFish
MiFish copied to clipboard
This is the command line version of MiFish pipeline. It can also be used with any other eDNA meta-barcoding primers

MiFish
This is the command line version of MiFish pipeline. It can also be used with any other eDNA meta-barcoding primers
References
If you use MiFish Pipeline in your projects, please cite:
- Zhu T, Sato Y, Sado T, Miya M, and Iwasaki W. 2023. MitoFish, MitoAnnotator, and MiFish Pipeline: Updates in ten years. Mol Biol Evol, 40:msad035. https://doi.org/10.1093/molbev/msad035
- Sato Y, Miya M, Fukunaga T, Sado T, Iwasaki W. 2018. MitoFish and MiFish Pipeline: A Mitochondrial Genome Database of Fish with an Analysis Pipeline for Environmental DNA Metabarcoding. Mol Biol Evol 35:1553-1555.
- Iwasaki W, Fukunaga T, Isagozawa R, Yamada K, Maeda Y, Satoh TP, Sado T, Mabuchi K, Takeshima H, Miya M, et al. 2013. MitoFish and MitoAnnotator: a mitochondrial genome database of fish with an accurate and automatic annotation pipeline. Mol Biol Evol 30:2531-2540.
If you use MiFish Primers in your projects, please cite:
- Miya M, Sato Y, Fukunaga T, Sado T, Poulsen JY, Sato K, Minamoto T, Yamamoto S, Yamanaka H, Araki H, et al. 2015. MiFish, a set of universal PCR primers for metabarcoding environmental DNA from fishes: detection of more than 230 subtropical marine species. R Soc Open Sci 2:150088.
Install
Currently we only support Linux. Please use conda to manage the environment. If you do not have a Linux OS, or you just want to have a quick look, you can try the Docker version
External Dependencies
Add these softwares to your system PATH. You can download all the external executable files here(except for MAFFT), or compile by yourself.
- fastp (v0.23.2)
- FLASH (v1.2.7)
- seqkit (v2.3.0)
- vsearch (v2.23.0+)
- NCBI BLAST+ (v2.9.0)
- MAFFT (v7.505)
- Gblocks (v0.91b)
- FastTreeMP (v2.1.11)
Install Steps
conda create -n MiFish python==3.9.13
conda activate MiFish
pip3 install numpy==1.23.1
pip3 install scikit-bio==0.5.6
pip3 install PyQt5==5.15.7
pip3 install ete3==3.1.2
pip3 install duckdb==0.6.1
pip3 install XlsxWriter==3.0.3
pip3 install cutadapt==4.1
pip3 install biopython==1.79
git clone https://github.com/billzt/MiFish.git
cd MiFish
python3 setup.py develop
mifish -h
In Ubuntu, the following library is also needed.
sudo apt-get install -y libgl1
Test
cd test
mifish seq mifishdbv3.83.fa -d seq2
There are six files in the result directory MiFishResult. Note: seq and seq2 are two directories with FQ files.
Parameters
Mandatory
mifish /path/to/your/amplicon/sequencing/directory/ /path/to/your/ref/db.fa
Directory for amplicon sequencing data (FASTQ/FASTA)
Since MiFish supports multi-sample analysis, amplicon sequencing data in compressed FASTQ/FASTA format should be put in directories. Pass the path of the directory as the first parameter. Refer to MiFish's Homepage to see the rules of filenames. Here are some examples:
MiFish-example-02_S73_L001_R1_001.fastq.gzMiFish-example-02_S73_L001_R2_001.fastq.gzDRR126155_1.fastq.bz2DRR126155_2.fastq.bz2mydata.1.fq.xzmydata.2.fq.xz
RefDB of your metabarcoding primers
Prepare your RefDB in FASTA format and index it using the makeblastdb from NCBI BLAST+. RefDB for an old version of MiFish is in test/mifishdbv3.83.fa
The head line of RefDB (FASTA) follows this rule:
gb|accessionID|species_scientific_name
Replace blanks with underscores in the species name. Here are examples.
>gb|LC021149|Ostorhinchus_angustatus
CACCGCGGTTATACGAGAGGCCCAAGCTGACAATCACCGGCGTAAAGAGTGGTTAATGAC
CCCACAATAATAAAGTCGAACATCTCCAAAGTTGTTGAACACATTCGAAGATATGAAGCT
CTACCACGAAAGTGACTTTACACTCTTTGAACCCACGAAAGCTAGGAAA
>gb|LC579122|Ostorhinchus_angustatus
CACCGCGGTTATACGAGGGGCCCAAGCTGACAATCACCGGCGTAAAGAGTGGTTAATAAC
CCCACAATAATAAAGTCGAACATCTCCAAAGTTGTTGAACACATTCGAAGATATGAAGCT
CTACCACGAAAGTGACTTTACACTCTTTGAACCCACGAAAGCTAGGAAA
>gb|LC717543|Trachidermus_fasciatus
CACCGCGGTTATACGAGAGACTCAAGCTGACAAACACCGGCGTAAAGCGTGGTTAAGCTA
AAAATTTGCTAAAGTCAAACACCTTCAAGACTGTTATACGTACCCGAAGGCAGGAAGCAC
AACCACGAAAGTGACTTTAACTAAGCTGAATCCACGAAAGCTAAGGAA
accessionID can be any unique strings. Primers were trimmed off from the sequences.
Optional (important❗️)
Following optional parameters are designed for MiFish metabarcoding primers. If running with other eDNA primers, change them to satisfy your own primers.
Length filtering
-m MIN_READ_LEN, --min-read-len MIN_READ_LEN
Minimum read length(bp) (default: 204)
-M MAX_READ_LEN, --max-read-len MAX_READ_LEN
Maximum read length(bp) (default: 254)
The range of amplicon lengths (including primers). Adjust them to satisfy your own primers. You can estimate the range of from your reference database file.
Primer sequences
-f PRIMER_FWD, --primer-fwd PRIMER_FWD
forward sequence of primer (5->3) (default: GTCGGTAAAACTCGTGCCAGC)
-r PRIMER_REV, --primer-rev PRIMER_REV
reverse sequence of primer (5->3) (default: CATAGTGGGGTATCTAATCCCAGTTTG)
change them according to your own primers
Optional
Following optional parameters are designed for all metabarcoding primers.
Group samples
-d OTHER_DATA_DIR, --other-data-dir OTHER_DATA_DIR
other directory of the amplicon sequencing data file (FASTQ/FASTA). Can specify multiple times. Each directory is considered as a group (default: None)
If your samples are in multiple groups, please arrange them in different directories and use the -d parameter for multiple times. e.g. -d 2nd_group_dir -d 3rd_group_dir
Threshold of BLASTN identity
-i BLAST_MIN_IDENTITY, --blast-min-identity BLAST_MIN_IDENTITY
Minimum identity (percentage) for filtering BLASTN results (default: 97.0)
Threshold of UNOISE3
-u UNOISE_MIN, --unoise-min UNOISE_MIN
value for the -minsize option in UNOISE3 (default: 8)
Decrease this value would get higher sensitivity but lower accuracy.
Skip downstream analysis
-s, --skip-downstream-analysis
Skip abandance statics, phylogenetic and bio-diversity analysis (default: False)
Turn on this option if you only want to get taxonomy identification results and do not need other analysis.
Output directory
-o OUTPUT_DIR, --output-dir OUTPUT_DIR
directory for output (default: .)
Default is putting MiFishResult under your current directory. If you specify another directory /path/dir/, it will put results into /path/dir/MiFishResult
Number of threads
-t THREADS, --threads THREADS
number of threads for BLASTN and usearch (default: 2)
Pass to external programs such as usearch
Keep temporary files
-k, --keep-tmp-files Keep temporary files (default: False)
Useful for debug. If you encountered problems, turn it on and share me the Sample-* directory in the MiFishResult directory.
Results
There are six files in the MiFishResult directory.
QC.zip
read_stat.xlsx
taxonomy.xlsx
tree.zip (if not using -s)
relative_abandance.json (if not using -s)
diversity.json (if not using -s but using -d)
The first four files are the same as the web version of MiFish. (Screenshots were from DRR126155 against refDB v3.83)

An example on using other eDNA primers
See Riaz
Tips
- Please make sure that in a FASTQ/FASTA file, names of reads should start with an identitcal word, such as:
@DRR231392.1
@DRR231392.2
@DRR231392.3
Otherwise usearch cannot work properly.