Blast Local IDs too long?

Open cizydorczyk opened this issue 3 months ago • 1 comments

When I try to run chewBBACA CreateSchema, I get the following error:

$ chewBBACA.py CreateSchema -i enterobacter-assembly-copies/ -o ./ecloacae-schema --n ecloacae-wgmlst-schema --ptf GCF_001875655.1.trn --cpu 24

chewBBACA version: 3.3.3
Authors: Rafael Mamede, Pedro Cerqueira, Mickael Silva, João Carriço, Mário Ramirez
Github: https://github.com/B-UMMI/chewBBACA
Documentation: https://chewbbaca.readthedocs.io/en/latest/index.html
Contacts: [email protected]

============================
  chewBBACA - CreateSchema
============================
Started at: 2024-04-02T15:20:00

Prodigal training file: GCF_001875655.1.trn
Prodigal mode: single
CPU cores: 24
BLAST Score Ratio: 0.6
Translation table: 11
Minimum sequence length: 201
Size threshold: 0.2
Word size: 5
Window size: 5
Clustering similarity: 0.2
Representative filter: 0.9
Intra-cluster filter: 0.9

 CDS prediction 
================
Predicting CDSs for 481 inputs...
 [====================] 100%
Extracted a total of 2278483 CDSs from 481 inputs.

 CDS deduplication 
===================
Identifying distinct CDSs...
Identified 539683 distinct CDSs.

 CDS translation 
=================
Translating 539683 CDS...
 [====================] 100%
10041 CDSs could not be translated.

 Protein deduplication 
=======================
Identifying distinct proteins...
Identified 322764 distinct proteins.
Kept 322764 sequences after filtering the initial sequences.

 Protein clustering 
====================
Clustering proteins...
 [====================] 100%
Clustered 322764 proteins into 27110 clusters.
Removing proteins highly similar to the cluster representative...
Removed 79907 sequences.
Identified 14106 singletons.
Remaining sequences after representative and singleton pruning: 242857
Removing sequences highly similar to other clustered sequences...
Removed 140800 sequences.
Clusters to BLAST: 13004
Performing all-vs-all BLASTp per cluster...
b'BLAST Database creation error: Near line 1, the local id is too long.  Its length is 58 but the maximum allowed local id length is 50.  Please find and correct all local ids that are too long.\n'

This was with chewBBACA v3.3.3 (BLAST v2.15). My input assemblies come straight from Unicycler, and their contig headers follow this format:

>1 ... ... ...
>2 ... ... ...
>3 ... ... ...

There is a space between the contig number and the rest of the info.
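
Since the contig identifiers above are only a digit or two long, the 58-character local ID in the BLAST error presumably comes from somewhere else. One possibility, which is only a guess and not confirmed in this thread, is that the ID is built from something longer, such as the input file name. A minimal Python sketch for checking both is below. The function names are hypothetical and not part of chewBBACA; the limit of 50 is taken from the error message.

```python
from pathlib import Path

# Limit quoted in the BLAST error message ("maximum allowed local id length is 50").
MAX_LOCAL_ID = 50


def fasta_ids(path):
    """Yield the identifier of each FASTA record (header text up to the first whitespace)."""
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                yield fields[0] if fields else ""


def long_ids(path, limit=MAX_LOCAL_ID):
    """Return the record identifiers in one FASTA file that exceed the limit."""
    return [seq_id for seq_id in fasta_ids(path) if len(seq_id) > limit]


def long_basenames(directory, limit=MAX_LOCAL_ID):
    """Return the names of files whose basename (without extension) exceeds the limit.

    Assumption: checking file names is only useful if identifiers are derived
    from them, which this thread does not confirm.
    """
    return sorted(
        p.name
        for p in Path(directory).iterdir()
        if p.is_file() and len(p.stem) > limit
    )
```

For the run above, calling `long_basenames("enterobacter-assembly-copies/")` would list any assemblies with long file names, and `long_ids(path)` on a single assembly would confirm that the contig identifiers themselves are short.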

Any help is appreciated. Conrad

cizydorczyk, Apr 03 '24 13:04