DRAM
Error when setting up databases
Hi DRAM developers,
This is an amazing tool; it will help me annotate the functions of my MAGs. However, I ran into a problem while setting up the databases. Could you help me solve it? Thanks!
2023-09-18 03:42:49,148 - Downloading module_step_form
2023-09-18 03:42:49,713 - Downloading function_heatmap_form
2023-09-18 03:42:50,180 - Downloading amg_database
2023-09-18 03:42:50,454 - Downloading etc_module_database
2023-09-18 03:42:50,683 - All raw data files were downloaded successfully
2023-09-18 03:42:50,684 - Processing kofam_hmm
2023-09-18 03:54:33,232 - KOfam database processed
2023-09-18 03:54:33,743 - Moved kofam_hmm to final destination, configuration updated
2023-09-18 03:54:33,743 - Processing kofam_ko_list
2023-09-18 03:54:33,837 - KOfam ko list processed
2023-09-18 03:54:33,843 - Moved kofam_ko_list to final destination, configuration updated
2023-09-18 03:54:33,843 - Processing pfam
2023-09-18 05:17:41,090 - PFAM database processed
2023-09-18 05:17:41,256 - Moved pfam to final destination, configuration updated
2023-09-18 05:17:41,262 - Moved pfam_hmm to final destination, configuration updated
2023-09-18 05:17:41,262 - Processing dbcan
2023-09-18 05:17:44,779 - dbCAN database processed
2023-09-18 05:17:44,787 - Moved dbcan to final destination, configuration updated
2023-09-18 05:17:44,792 - Moved dbcan_fam_activities to final destination, configuration updated
2023-09-18 05:17:44,797 - Moved dbcan_subfam_ec to final destination, configuration updated
2023-09-18 05:17:44,798 - Processing vogdb
2023-09-18 05:23:42,771 - VOGdb database processed
2023-09-18 05:23:42,868 - Moved vogdb to final destination, configuration updated
2023-09-18 05:23:42,877 - Moved vog_annotations to final destination, configuration updated
2023-09-18 05:23:42,878 - Processing viral
2023-09-18 05:23:44,537 - The subcommand ['mmseqs', 'createdb', 'DRAM_data1/database_files/viral.merged.protein.faa.gz', 'DRAM_data1/refseq_viral.20230918.mmsdb'] experienced an error: Fasta entry 117637 is invalid
Traceback (most recent call last):
File "/usr2/people/ruiwenhu/miniconda3/envs/DRAM/bin/DRAM-setup.py", line 184, in
Best, Ruiwen
Apologies for the late reply. Can you provide sequence #117637 in the "viral.merged.protein.faa.gz" file? There have been issues with mmseqs in the past where the fasta headers are incorrectly formatted: https://github.com/soedinglab/MMseqs2/issues/446
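For anyone following along, here is a rough sketch of one way to pull out a single entry by its position in the file. The helper name nth_fasta_record is made up for this example and is not part of DRAM or MMseqs2. It assumes entries are counted from 1 in the order their ">" header lines appear; whether the number in the mmseqs error message is 0-based or 1-based is not stated in this thread, so it may be worth looking at the neighbouring entries too.

```shell
# Print the N-th FASTA record (header plus its sequence lines) from a gzipped file.
# nth_fasta_record is a hypothetical helper name, not part of DRAM or MMseqs2.
nth_fasta_record() {
    file="$1"
    n="$2"
    gzip -dc "$file" | awk -v n="$n" '
        /^>/ { count++ }        # every header line starts a new record
        count > n { exit }      # stop reading once we are past the requested record
        count == n { print }    # print every line that belongs to the requested record
    '
}

# Example (assumes 1-based counting of headers):
# nth_fasta_record viral.merged.protein.faa.gz 117637
```

If the archive itself is damaged, gzip will most likely report an error before the requested entry is reached, which is already useful information.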
Hi BioRRW,
I have checked the file "viral.merged.protein.faa.gz".
What do I need to do to solve this problem? Thanks.
Hi, I tried the command "zcat viral.merged.protein.faa.gz | grep -A 1 '^>117637$'" to check the file, and it reported: gzip: viral.merged.protein.faa.gz: invalid compressed data--format violated.
Thank you for providing this information.
This output, invalid compressed data--format violated, hints at a problem with your viral.merged.protein.faa.gz.
I suggest, as the file name hints, re-merging the files. Make sure you do not run cat viral1.faa.gz viral2.faa.gz > viral.merged.faa.gz, as cat needs decompressed files.
Instead, I would use zcat, as you did to print the contents of the file, or decompress the files and merge them before gzipping the result again.
It may also be worth checking whether the files you merged to create viral.merged.protein.faa.gz are valid gzipped files. You can do this by trying to view them with zcat or by using the 'test' option in gzip: gzip -t [filename.gz].
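As a rough sketch of the two steps above (test each input, then decompress, merge and recompress), something like the following could work. The function names check_gz and merge_gz are invented for this example and are not DRAM commands; the file names in the usage comment are taken from this thread and may differ on your system.

```shell
# check_gz: run gzip's integrity test on each file and report the result.
# Returns non-zero if any file fails. (Hypothetical helper, not part of DRAM.)
check_gz() {
    rc=0
    for f in "$@"; do
        if gzip -t "$f" 2>/dev/null; then
            echo "OK      $f"
        else
            echo "CORRUPT $f"
            rc=1
        fi
    done
    return "$rc"
}

# merge_gz: decompress all inputs, concatenate them, and recompress into OUT.
# Refuses to write anything if an input fails the integrity test.
# Usage: merge_gz OUT.gz IN1.gz [IN2.gz ...]
merge_gz() {
    out="$1"
    shift
    check_gz "$@" >/dev/null || return 1
    gzip -dc "$@" | gzip -c > "$out"
}

# Example:
# check_gz viral.1.protein.faa.gz viral.2.protein.faa.gz
# merge_gz viral.merged.protein.faa.gz viral.1.protein.faa.gz viral.2.protein.faa.gz
```

Running check_gz on the merged file afterwards is a cheap way to confirm the result before handing it to mmseqs again.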
Hope this helps, and keep us posted on your progress.
Hi BioRRW, I checked the data in my database directory and found that there is no file named "viral.2.protein.faa.gz"; there is only one file, "viral.1.protein.faa.gz". So how do I re-merge a single file? Or do I need to re-download "viral.1.protein.faa.gz" and "viral.2.protein.faa.gz"? What is the command to re-download these files? Thank you very much!