cblaster
Error using makedb with gff and fa files
Hi, I'm having trouble using makedb to create my own database. I have a directory with the .fa and .gff files for each genome and run the following:
cblaster makedb /home/rlhoover/cblaster/03_Chosen_GFF-Fasta/*.gff -n myGallDb -f
Importing genomicsqlite failed, falling back to SQLite3
[12:57:34] INFO - Starting makedb module
[12:57:34] INFO - Initialising cblaster SQLite3 database to myGallDb.sqlite3
[12:57:34] INFO - Parsing 104 genome files, in 1 batches of 104
[12:57:34] INFO - Processing batch 1
[12:57:34] INFO - Ca_Houarnoksidobacter_IN7.gff
[12:57:34] INFO - Ferrigenium_9BH_112.gff
[12:57:34] INFO - Ferrigenium_An22.gff
It goes through all the .gff files in my directory then ends with:
[12:57:40] ERROR - File parsing failed, exiting...
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/lib64/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/home/rlhoover/.local/lib/python3.7/site-packages/cblaster/genome_parsers.py", line 234, in parse_file
for record in function()
File "/home/rlhoover/.local/lib/python3.7/site-packages/cblaster/genome_parsers.py", line 165, in parse_gff
regions = find_regions(gff.directives)
File "/home/rlhoover/.local/lib/python3.7/site-packages/cblaster/genome_parsers.py", line 103, in find_regions
_, accession, start, end = directive.split(" ")
ValueError: not enough values to unpack (expected 4, got 2)
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/rlhoover/.local/lib/python3.7/site-packages/cblaster/database.py", line 216, in makedb
for organism in pool.imap(func, group):
File "/usr/lib64/python3.7/multiprocessing/pool.py", line 748, in next
raise value
ValueError: not enough values to unpack (expected 4, got 2)
I've included one of my gff and fa files for reference. Sample-gff-fa.zip
Hi @Rene-Hoover, it seems cblaster was tripping up because it expects ##sequence-region lines that resemble:
##sequence-region ctg123 1 1497228
but the ones in your file lack the coordinates. I added a check to get around this in v1.13.15 (available from pip now) which skips these lines. I can now create a database using your files with the command:
cblaster makedb -n myDb ~/Downloads/Sample-gff-fa/Sideroxydans_ES1.gff
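For anyone hitting the same ValueError, here is a minimal sketch of the kind of check described above: it only unpacks a ##sequence-region directive when all four fields are present and skips the rest. This is not the actual cblaster code; the function name `parse_sequence_regions` is hypothetical, and it assumes directives arrive as plain strings, with or without the leading `##`.

```python
def parse_sequence_regions(directives):
    """Map accession -> (start, end) from GFF3 sequence-region directives.

    Hypothetical helper for illustration only, not part of cblaster.
    Directives that lack coordinates are skipped instead of raising.
    """
    regions = {}
    for directive in directives:
        # Tolerate directives with or without the leading '##'
        fields = directive.lstrip("#").split()
        if not fields or fields[0] != "sequence-region":
            continue
        if len(fields) != 4:
            # e.g. '##sequence-region ctg123' with no start/end: skip it
            continue
        _, accession, start, end = fields
        try:
            regions[accession] = (int(start), int(end))
        except ValueError:
            # Non-numeric coordinates: treat as malformed and skip
            continue
    return regions
```

Splitting on any whitespace rather than a single space also tolerates tab-separated or double-spaced directives, which a strict `split(" ")` would not.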
Thanks @gamcil I appreciate the update. It fixed the issue for most of my files, but when I run the makedb command I get the following error for a small number of my files:
`[09:30:54] ERROR - File parsing failed, exiting...
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/lib64/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/home/rlhoover/.local/lib/python3.7/site-packages/cblaster/genome_parsers.py", line 238, in parse_file
for record in function()
File "/home/rlhoover/.local/lib/python3.7/site-packages/cblaster/genome_parsers.py", line 167, in parse_gff
sort_attribute_values=True
File "/home/rlhoover/.local/lib/python3.7/site-packages/gffutils/create.py", line 1405, in create_db
c.create()
File "/home/rlhoover/.local/lib/python3.7/site-packages/gffutils/create.py", line 543, in create
self._populate_from_lines(self.iterator)
File "/home/rlhoover/.local/lib/python3.7/site-packages/gffutils/create.py", line 622, in _populate_from_lines
self._insert(f, c)
File "/home/rlhoover/.local/lib/python3.7/site-packages/gffutils/create.py", line 566, in _insert
cursor.execute(constants._INSERT, feature.astuple())
sqlite3.InterfaceError: Error binding parameter 11 - probably unsupported type.
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/rlhoover/.local/lib/python3.7/site-packages/cblaster/database.py", line 216, in makedb
for organism in pool.imap(func, group):
File "/usr/lib64/python3.7/multiprocessing/pool.py", line 748, in next
raise value
sqlite3.InterfaceError: Error binding parameter 11 - probably unsupported type.
`
I suspect it may be another issue with the gff format, but I'm not sure. Attached are 3 of the files that trigger the error. Sample-gff-fa-2.zip
Update: I tested a few of the files that appeared problematic individually and cblaster created a local database for them. So, I'm wondering if it's the number of genomes I'm using for my local database. I have 104 genomes total. If I use makedb either by folder (genomes/*.gff) or with a list of all 104 file names I get the error I posted yesterday. However, if I use makedb on a subset of the genomes (<20) it appears to work fine regardless of which .gff files I tell it to use.
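In case it helps narrow this down, below is a rough sketch of how the subset test could be scripted: it splits the file list into groups of 15 and builds one makedb command per group, using only the `cblaster makedb <files> -n <name>` form shown earlier in this thread. The helper names (`chunked`, `build_commands`, `run_commands`) and the `subset_N` database names are made up for illustration. Each group produces its own separate database, so this is a diagnostic aid rather than a way to build the full 104-genome database.

```python
import subprocess


def chunked(items, size):
    """Split a list into consecutive groups of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_commands(gff_paths, chunk_size=15, prefix="subset"):
    """Build one 'cblaster makedb' command per group of GFF files.

    Only the invocation form seen in this thread is used:
    cblaster makedb <files...> -n <name>
    """
    commands = []
    groups = chunked(sorted(gff_paths), chunk_size)
    for index, group in enumerate(groups, start=1):
        commands.append(["cblaster", "makedb", *group, "-n", f"{prefix}_{index}"])
    return commands


def run_commands(commands):
    """Run each command and collect (command, return code, stderr).

    Assumption: whether cblaster exits non-zero on a parsing failure is
    not confirmed here, so inspect stderr as well as the return code.
    """
    results = []
    for command in commands:
        done = subprocess.run(command, capture_output=True, text=True)
        results.append((command, done.returncode, done.stderr))
    return results
```

Calling `build_commands(glob.glob("genomes/*.gff"))` and printing the result shows the commands without running anything; passing that list to `run_commands` executes them one group at a time. Varying `chunk_size` should show roughly where the failure starts to appear.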
Hi @gamcil and @Rene-Hoover, I really need your help with the cblaster makedb command. Despite my best efforts, I have been unable to figure out how to use it. Whenever I give it gbk input, cblaster doesn't create a .dmnd file; it only creates the fasta and sqlite3 files. When I give it a gff file as input, it creates all 3 files, but it gives an error like:
cblaster search -m local -db ps_db.dmnd -qf ~/Neelam/output12type.fasta
[11:44:18] INFO - Starting cblaster in local mode
[11:44:18] ERROR - Error: Incomplete database file. Database building did not complete successfully.
Thank you
Hi @neelam19051 I had trouble using the makedb command too, but it seemed to be because of the way my gff files were formatted. I don't think I tried making a database with gbk files. Hopefully, someone else will chime in and be able to assist you. It seems like the search isn't working because your database file didn't build properly, but I'm really not sure what the solution would be.