MSA always rejected as “does not match input sequence” on offline install, even when FASTA and A3M are identical
Hi Boltz-2 team,
I’ve been running Boltz-2 on an offline server (Docker, no internet) and hit a reproducible problem with MSAs.
Problem: I generate the MSA files with colabfold_search on an online machine and copy them to the offline server by hand, but Boltz-2 rejects them every time.
Even when I supply an A3M file whose top sequence is identical to the input FASTA sequence (same header, same letters, no lowercase/gaps), Boltz-2 logs: "Warning: MSA does not match input sequence, creating dummy..." and discards the MSA.
Steps to reproduce:
- Create a minimal FASTA:

    >A
    AHKLFIGGLPNYLNDDQVKELLTSFGPLKAFNLVKDSATGLSKGYAFCEYVDINVTDQAIAGLNGMQLGDKKLLVQRASVGAKNA

- Create an identical A3M (protein.a3m):

    >A
    AHKLFIGGLPNYLNDDQVKELLTSFGPLKAFNLVKDSATGLSKGYAFCEYVDINVTDQAIAGLNGMQLGDKKLLVQRASVGAKNA

- YAML (test.yaml):

    version: 1
    sequences:
      - protein:
          id: A
          sequence: AHKLFIGGLPNYLNDDQVKELLTSFGPLKAFNLVKDSATGLSKGYAFCEYVDINVTDQAIAGLNGMQLGDKKLLVQRASVGAKNA
          msa: ./protein.a3m
      - ligand:
          id: B
          smiles: "CCO"
    properties:
      - affinity:
          binder: B

- Run:

    boltz predict --cache /root/.boltz test.yaml
Expected: Boltz accepts the A3M (since FASTA and A3M are byte-identical).
Actual: Always prints the warning and discards the MSA.
Environment:
- Running inside Docker on an offline machine (GPU available, H100).
- Python: Python 3.10.12
Notes:
- The same FASTA+A3M pair works fine on my online machine.
- Tried different headers (>A, >query, a copy of the FASTA header), LF-only line endings, no BOM. Same result.
- This makes me think the MSA checker in the offline build might be overly strict or miscompiled.
Has anyone else seen this issue on offline installs? Any recommended patch/workaround? Thanks for maintaining Boltz-2 — happy to test a fix if you point me to the relevant code path.
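For anyone trying to narrow this down, here is a minimal sketch of a check that can be run on the offline machine before calling Boltz. It does not reproduce Boltz's internal validation (I have not looked at that code path); it only reports whether the first A3M record equals the YAML sequence, and whether it would match only after joining wrapped lines. The function names `first_a3m_record` and `diagnose` are made up for illustration.

```python
from pathlib import Path


def first_a3m_record(path):
    """Return (header, sequence_lines) for the first record in an A3M file.

    Lines before the first '>' header (for example a leading '#' line)
    are skipped. Sequence lines are returned separately so that wrapped
    sequences can be detected.
    """
    header, seq_lines = None, []
    for line in Path(path).read_text().splitlines():
        if line.startswith(">"):
            if header is not None:
                break  # second record reached, stop
            header = line[1:].strip()
        elif header is not None and line.strip():
            seq_lines.append(line.strip())
    return header, seq_lines


def diagnose(a3m_path, expected):
    """Compare the first A3M record with the sequence given in the YAML."""
    header, seq_lines = first_a3m_record(a3m_path)
    return {
        "header": header,
        "n_sequence_lines": len(seq_lines),
        # True only if the query sits on a single line and is identical
        "first_line_matches": bool(seq_lines) and seq_lines[0] == expected,
        # True if the query matches once wrapped lines are joined
        "joined_matches": "".join(seq_lines) == expected,
    }
```

If `joined_matches` is true but `first_line_matches` is false, the query sequence is wrapped across several lines. If both are true and Boltz still prints the warning, it may be reading a different MSA file than the one you edited.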
I have the same issue with my offline install. I would be very grateful for any advice on how to fix this.
I also have the same issue with the offline install.
In my case, the problem occurred because I first ran the example input file and later changed its contents without renaming the file. As a result, boltz tried to use the old MSA that had been generated previously. However, since I had modified the protein sequence, the old MSA no longer matched the new sequence. Renaming the input file or deleting the previously computed MSAs resolved the issue for me.
I found out what the issue was in my case. The a3m file I generated with jackhmmer had each sequence split across multiple lines:
>protein
ASAPLHLGKCNIAGWILGNPECESLSTASSWSYIVETPSSDNGTCYPGDFIDYEELREQLSSVSSFERFEIFPKTSSWPNHDSDKGVTAACPHAGAKSFY
KNLIWLVKKGNSYPKLSKSYINDKGKEVLVLWGIHHPSTSADQQSLYQNADAYVFVGSSRYSKTFKPEIAIRPKVRDREGRMNYYWTLVEPGDKITFEAT
GNLVVPRYAFAMERNAGSGLEVLFQ
Reformatting the a3m file so that each sequence is on a single line solved the issue:
>protein
ASAPLHLGKCNIAGWILGNPECESLSTASSWSYIVETPSSDNGTCYPGDFIDYEELREQLSSVSSFERFEIFPKTSSWPNHDSDKGVTAACPHAGAKSFYKNLIWLVKKGNSYPKLSKSYINDKGKEVLVLWGIHHPSTSADQQSLYQNADAYVFVGSSRYSKTFKPEIAIRPKVRDREGRMNYYWTLVEPGDKITFEATGNLVVPRYAFAMERNAGSGLEVLFQ
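In case it helps others, the reformatting can be scripted. This is a minimal sketch, not part of Boltz: `unwrap_a3m` is a made-up helper that joins wrapped sequence lines so that every record becomes one header line followed by one sequence line. It assumes that lines starting with `>` or `#` are header or comment lines and passes them through unchanged.

```python
def unwrap_a3m(text):
    """Join wrapped sequence lines: one header line, then one sequence line."""
    out, seq = [], []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">") or line.startswith("#"):
            # A new header (or comment) ends the sequence collected so far
            if seq:
                out.append("".join(seq))
                seq = []
            out.append(line)
        elif line:
            seq.append(line)
    if seq:
        out.append("".join(seq))
    return "\n".join(out) + "\n"


# Example usage (file names are placeholders):
# from pathlib import Path
# src = Path("protein_wrapped.a3m")
# Path("protein.a3m").write_text(unwrap_a3m(src.read_text()))
```

Writing to a new file rather than overwriting the original keeps the jackhmmer output around for comparison.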