
Error: Mask and alignment length do not match.

Open mattoslmp opened this issue 2 years ago • 1 comments

Hello, can you help me? Please see below:

The first error was that several TIGRFAM and Pfam marker files were missing from the GTDB-Tk r207 database, so I added them by downloading them from previous versions of the public GTDB-Tk database.

But now I am getting the error below:

[2022-08-01 23:30:56] INFO: GTDB-Tk v1.0.2
[2022-08-01 23:30:56] INFO: gtdbtk classify_wf --genome_dir /bio_temp/c0232/all_fastq/BIN_REFINEMENT_50_5/metawrap_50_5_bins/ --out_dir ./out-gtdbk --cpus 65 --extension .fa --scratch_dir ./intermediary
[2022-08-01 23:30:56] INFO: Using GTDB-Tk reference data version r207: /bio_temp/share_bio/softwares/miniconda3/envs/gtdbtk/gtdbtk-db/release207/
[2022-08-01 23:30:56] INFO: Identifying markers in 25 genomes with 65 threads.
[2022-08-01 23:30:56] INFO: Running Prodigal V2.6.3 to identify genes.
==> Finished processing 17 of 25 (68.0%) genomes.
==> Finished processing 25 of 25 (100.0%) genomes.
[2022-08-01 23:31:35] INFO: Identifying TIGRFAM protein families.
==> Finished processing 17 of 25 (68.0%) genomes.
==> Finished processing 25 of 25 (100.0%) genomes.
[2022-08-01 23:31:42] INFO: Identifying Pfam protein families.
==> Finished processing 17 of 25 (68.0%) genomes.
==> Finished processing 25 of 25 (100.0%) genomes.
[2022-08-01 23:31:43] INFO: Annotations done using HMMER 3.3.2 (Nov 2020)
[2022-08-01 23:31:43] INFO: Done.
[2022-08-01 23:31:43] INFO: Aligning markers in 25 genomes with 65 threads.
[2022-08-01 23:31:44] INFO: Processing 14 genomes identified as bacterial.
[2022-08-01 23:31:51] INFO: Read concatenated alignment for 62291 GTDB genomes.
==> Finished aligning 14 of 14 (100.0%) genomes.
[2022-08-01 23:32:17] INFO: Masking columns of multiple sequence alignment using canonical mask.
[2022-08-01 23:34:16] ERROR: Controlled exit resulting from an unrecoverable error or warning.

================================================================================
EXCEPTION: MSAMaskLengthMismatch
MESSAGE: Mask and alignment length do not match.

Traceback (most recent call last):
  File "/bio_temp/share_bio/softwares/miniconda3/envs/gtdbtk/bin/gtdbtk", line 449, in <module>
    gt_parser.parse_options(args)
  File "/bio_temp/share_bio/softwares/miniconda3/envs/gtdbtk/lib/python3.8/site-packages/gtdbtk/main.py", line 623, in parse_options
    self.align(options)
  File "/bio_temp/share_bio/softwares/miniconda3/envs/gtdbtk/lib/python3.8/site-packages/gtdbtk/main.py", line 263, in align
    markers.align(options.identify_dir,
  File "/bio_temp/share_bio/softwares/miniconda3/envs/gtdbtk/lib/python3.8/site-packages/gtdbtk/markers.py", line 591, in align
    trimmed_seqs, pruned_seqs = self._apply_mask(gtdb_msa,
  File "/bio_temp/share_bio/softwares/miniconda3/envs/gtdbtk/lib/python3.8/site-packages/gtdbtk/markers.py", line 353, in _apply_mask
    raise MSAMaskLengthMismatch(
gtdbtk.exceptions.MSAMaskLengthMismatch: Mask and alignment length do not match.

mattoslmp, Aug 03 '22 13:08
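For readers unfamiliar with this error: the traceback shows it being raised from `_apply_mask` during the "Masking columns" step. The sketch below is not GTDB-Tk's code. It is a minimal illustration of that kind of check, and it assumes the mask is a string of `0`/`1` flags with one flag per alignment column. The names `MaskLengthMismatch` and `apply_mask` are hypothetical stand-ins.

```python
class MaskLengthMismatch(Exception):
    """Hypothetical stand-in for the MSAMaskLengthMismatch error in the traceback."""


def apply_mask(msa, mask):
    """Keep only the alignment columns whose mask flag is '1'.

    msa  -- dict mapping a genome ID to its aligned sequence
    mask -- string of '0'/'1' flags, one per alignment column (assumed format)
    """
    trimmed = {}
    for genome_id, seq in msa.items():
        # A mask built for one marker set cannot be applied to an
        # alignment built from a different one: the column counts differ.
        if len(seq) != len(mask):
            raise MaskLengthMismatch(
                f"{genome_id}: alignment has {len(seq)} columns, "
                f"mask has {len(mask)}"
            )
        trimmed[genome_id] = "".join(
            residue for residue, keep in zip(seq, mask) if keep == "1"
        )
    return trimmed


mask = "110101"
print(apply_mask({"genome_a": "MK-LVA"}, mask))  # lengths match, columns are trimmed

try:
    apply_mask({"genome_b": "MK-LVAQ"}, mask)  # one column more than the mask
except MaskLengthMismatch as err:
    print("error:", err)
```

If this picture holds, a mismatch means the alignment and the mask did not come from the same marker set, which is a plausible outcome of mixing software and reference data from different releases.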

Hello, GTDB-Tk v1.0.2 is an old version of GTDB-Tk and is incompatible with the latest GTDB release (R207). I recommend updating GTDB-Tk to the latest version, 2.1.1, unless you have a specific reason to use v1.0.2.

Cheers, Pierre

pchaumeil, Aug 08 '22 01:08
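The software version and the reference data release both appear in the first lines of the log above, so the mismatch can be spotted before a run gets as far as the masking step. The following is a rough sketch of such a check. `versions_from_log` and `KNOWN_INCOMPATIBLE` are hypothetical names, the regular expressions assume the log wording shown in this thread, and the only pairing listed is the one named in the reply above.

```python
import re

# Only the combination reported in this thread; extend from the official
# compatibility notes rather than guessing.
KNOWN_INCOMPATIBLE = {("1.0.2", "r207")}


def versions_from_log(log_text):
    """Return (tool_version, reference_release); each is None if not found."""
    tool = re.search(r"INFO: GTDB-Tk v(\d+(?:\.\d+)*)", log_text)
    ref = re.search(r"reference data version (r\d+)", log_text)
    return (
        tool.group(1) if tool else None,
        ref.group(1) if ref else None,
    )


log = (
    "[2022-08-01 23:30:56] INFO: GTDB-Tk v1.0.2\n"
    "[2022-08-01 23:30:56] INFO: Using GTDB-Tk reference data version r207: "
    "/bio_temp/share_bio/softwares/miniconda3/envs/gtdbtk/gtdbtk-db/release207/\n"
)

tool_version, ref_release = versions_from_log(log)
print("GTDB-Tk:", tool_version, "| reference data:", ref_release)

if (tool_version, ref_release) in KNOWN_INCOMPATIBLE:
    print("This combination was reported as incompatible; update GTDB-Tk.")
```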