Featurizer fails with an index-out-of-bounds error in version 0.4.1
Hi,
I was using Boltz-1 version 0.4.1 for some inference tasks. Some of them failed, after running for a long time, with a series of errors like the one below:
Featurizer failed on 1mu8_B_248 with error index 12456 is out of bounds for axis 0 with size 12456. Skipping.
I am aware that other issues have discussed the same problem, including #4, #162, and #184. However, those issues either were based on an older version of Boltz-1 or do not seem to have an ongoing discussion or a definitive resolution, so I wanted to bring this to the authors' attention.
In my case, I ran the following command:
boltz predict 1mu8_B_248.fasta --output_format pdb --write_full_pae --write_full_pde --out_dir .
Here is the FASTA file the failed run used:
>A|protein|msa_1.csv
TFGSGEADCGLRPLFEKKSLEDKTERELLESYIDGR
>B|protein|msa_0.csv
IVEGSDAEIGMSPWQVMLFRKSPQELLCGASLISDRWVLTAAHCLLYPPWDKNFTENDLLVRIGKHSRTRYERNIEKISMLEKIYIHPRYNWRENLDRDIALMKLKKPVAFSDYIHPVCLPDRETAASLLQAGYKGRVTGWGNLKETWTANVGKGQPSVLQVVNLPIVERPVCKDSTRIRITDNMFCAGYKPDEGKRGDACEGDSGGPFVMKSPFNNRWYQMGIVSWGEGCDRDGKYGFYTHVFRLKKWIQKVIDQFGE
>C|smiles
Cc1ccnc(c1F)CNC(=O)CN2C(=C[NH+]=C(C2=O)NCC(c3cccc[nH+]3)(F)F)C
Both CSV files were generated by the MMseqs2 server during another (successful) inference task (run with --use_msa_server) for a binding complex that shared the same sequences. The prediction was performed on an A40 GPU. I have attached the two CSV files.
I also tried removing the MSA paths from the FASTA file and rerunning the same command with the --use_msa_server flag, and it worked. The CSV files generated in this new run are exactly the same as msa_0.csv and msa_1.csv. This is surprising to me, as the two approaches should be equivalent when they use the same MSA files. I am not sure whether this indicates that the issue is stochastic (as one comment in #4 pointed out), but I reran the original command (with the FASTA file specifying the MSA paths) 3 times and the same error occurred every time.
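In case it helps with reproducing the comparison, below is a minimal sketch of one way to check that two MSA files are byte-for-byte identical. The helper names (`sha256_of`, `files_identical`) and the example paths in the comment are illustrative assumptions and are not part of Boltz.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_identical(path_a, path_b):
    """Return True when both files contain exactly the same bytes."""
    return sha256_of(path_a) == sha256_of(path_b)


# Hypothetical usage (both paths are placeholders):
# files_identical("msa_0.csv", "new_run/msa_0.csv")
```

A hash comparison only shows that the file contents match. It says nothing about how the files are read or indexed during featurization, which is where the error is raised.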
Please let me know if any additional information is needed for troubleshooting! Thanks so much for your attention and for making this tool open source!