
MSA always rejected as “does not match input sequence” on offline install, even when FASTA and A3M are identical

Open tiffany-nguyen opened this issue 5 months ago • 4 comments

Hi Boltz-2 team,

I’ve been running Boltz-2 on an offline server (Docker, no internet) and hit a reproducible problem with MSAs.

Problem: I generate the MSA files with colabfold_search on an online machine and add them manually on the offline server, but Boltz-2 always rejects them.

Even when I supply an A3M file whose top sequence is identical to the input FASTA sequence (same header, same letters, no lowercase/gaps), Boltz-2 logs: "Warning: MSA does not match input sequence, creating dummy..." and discards the MSA.

Steps to reproduce:

  1. Create a minimal FASTA:

       >A
       AHKLFIGGLPNYLNDDQVKELLTSFGPLKAFNLVKDSATGLSKGYAFCEYVDINVTDQAIAGLNGMQLGDKKLLVQRASVGAKNA

  2. Create an identical A3M:

       >A
       AHKLFIGGLPNYLNDDQVKELLTSFGPLKAFNLVKDSATGLSKGYAFCEYVDINVTDQAIAGLNGMQLGDKKLLVQRASVGAKNA

  3. YAML (test.yaml):

       version: 1
       sequences:
         - protein:
             id: A
             sequence: AHKLFIGGLPNYLNDDQVKELLTSFGPLKAFNLVKDSATGLSKGYAFCEYVDINVTDQAIAGLNGMQLGDKKLLVQRASVGAKNA
             msa: ./protein.a3m
         - ligand:
             id: B
             smiles: "CCO"
       properties:
         - affinity:
             binder: B

  4. Run:

       boltz predict --cache /root/.boltz test.yaml

Expected: Boltz accepts the A3M (since FASTA and A3M are byte-identical).

Actual: Always prints the warning and discards the MSA.

Environment:
• Running inside Docker on an offline machine (GPU available, H100).
• Python 3.10.12

Notes:
• The same FASTA+A3M pair works fine on my online machine.
• Tried different headers (>A, >query, FASTA header copy), LF-only line endings, no BOM. Same result.
• This makes me think the MSA checker in the offline build might be overly strict or miscompiled.
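
For anyone debugging the same warning, here is a minimal sketch of an independent check that the first A3M record really equals the query sequence. It does not reproduce Boltz's own check (that code path is not shown in this thread). The helper names `read_first_record` and `describe_mismatch` are hypothetical, and skipping lines that start with `#` is an assumption about ColabFold-style A3M files.

```python
from __future__ import annotations


def read_first_record(a3m_text: str) -> tuple[str, str]:
    """Return (header, sequence) of the first record, joining wrapped lines."""
    header = None
    chunks: list[str] = []
    for raw in a3m_text.splitlines():
        line = raw.strip()
        # Assumption: '#'-prefixed lines are metadata, not sequence data.
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            if header is not None:
                break  # second record reached, stop reading
            header = line[1:]
        elif header is not None:
            chunks.append(line)
    if header is None:
        raise ValueError("no '>' header found in A3M text")
    return header, "".join(chunks)


def describe_mismatch(query: str, a3m_text: str) -> str | None:
    """Return None if the first A3M sequence equals the query, else a short reason."""
    _, first = read_first_record(a3m_text)
    query = query.strip()
    if first == query:
        return None
    if len(first) != len(query):
        return f"length differs: query={len(query)}, a3m={len(first)}"
    pos = next(i for i, (a, b) in enumerate(zip(query, first)) if a != b)
    return f"first difference at index {pos}: query={query[pos]!r}, a3m={first[pos]!r}"


# Example usage (file name taken from the YAML above):
#   from pathlib import Path
#   print(describe_mismatch("AHKLF...", Path("protein.a3m").read_text()))
```

Because the reader joins wrapped lines before comparing, a `None` result only tells you the residues agree; it does not tell you whether the file layout itself is what Boltz expects.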

Has anyone else seen this issue on offline installs? Any recommended patch/workaround? Thanks for maintaining Boltz-2 — happy to test a fix if you point me to the relevant code path.

tiffany-nguyen avatar Oct 01 '25 04:10 tiffany-nguyen

I have the same issue with my offline install. I would be very grateful for any advice on how to fix this.

AlexanderKroll avatar Oct 10 '25 09:10 AlexanderKroll

I also have the same issue with the offline install.

cespos avatar Oct 31 '25 15:10 cespos

In my case, the problem occurred because I first ran the example input file and later changed its contents without renaming the file. As a result, boltz tried to use the old MSA that had been generated previously. However, since I had modified the protein sequence, the old MSA no longer matched the new sequence. Renaming the input file or deleting the previously computed MSAs resolved the issue for me.
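
A minimal sketch of the "delete the previously computed MSAs" option, for scripting it before a rerun. The function name `remove_cached_msas` is hypothetical, and the directory in the usage comment is only an assumption about where the earlier run wrote its MSAs; check your own output directory before deleting anything.

```python
import shutil
from pathlib import Path


def remove_cached_msas(msa_dir: Path) -> bool:
    """Delete a directory of previously computed MSAs.

    Returns True if the directory existed and was removed, False otherwise.
    """
    msa_dir = Path(msa_dir)
    if not msa_dir.is_dir():
        return False
    shutil.rmtree(msa_dir)
    return True


# Example usage (the path is an assumption, adjust to your run):
#   remove_cached_msas(Path("boltz_results_test/msa"))
```

Renaming the input file avoids the deletion entirely and leaves the old results intact.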

AlexanderKroll avatar Oct 31 '25 21:10 AlexanderKroll

I found out what the issue was in my case. The a3m file I generated with jackhmmer had its sequences wrapped across multiple lines:

>protein
ASAPLHLGKCNIAGWILGNPECESLSTASSWSYIVETPSSDNGTCYPGDFIDYEELREQLSSVSSFERFEIFPKTSSWPNHDSDKGVTAACPHAGAKSFY
KNLIWLVKKGNSYPKLSKSYINDKGKEVLVLWGIHHPSTSADQQSLYQNADAYVFVGSSRYSKTFKPEIAIRPKVRDREGRMNYYWTLVEPGDKITFEAT
GNLVVPRYAFAMERNAGSGLEVLFQ

Reformatting the a3m file so that each sequence sits on a single line solved the issue:

>protein
ASAPLHLGKCNIAGWILGNPECESLSTASSWSYIVETPSSDNGTCYPGDFIDYEELREQLSSVSSFERFEIFPKTSSWPNHDSDKGVTAACPHAGAKSFYKNLIWLVKKGNSYPKLSKSYINDKGKEVLVLWGIHHPSTSADQQSLYQNADAYVFVGSSRYSKTFKPEIAIRPKVRDREGRMNYYWTLVEPGDKITFEATGNLVVPRYAFAMERNAGSGLEVLFQ
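
The reformatting can be scripted. Below is a minimal sketch that joins wrapped sequence lines so each record becomes one header line followed by one sequence line. The function name `unwrap_a3m` is hypothetical, and passing `#`-prefixed lines through unchanged is an assumption about metadata lines some tools write.

```python
def unwrap_a3m(text: str) -> str:
    """Rewrite A3M/FASTA text so every sequence occupies exactly one line."""
    out: list[str] = []
    seq: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(">") or line.startswith("#"):
            # A new header (or metadata line) closes the sequence collected so far.
            if seq:
                out.append("".join(seq))
                seq = []
            out.append(line)
        elif line:
            seq.append(line)
    if seq:
        out.append("".join(seq))
    return "\n".join(out) + "\n"


# Example usage (file names are placeholders):
#   from pathlib import Path
#   Path("protein.single.a3m").write_text(unwrap_a3m(Path("protein.a3m").read_text()))
```

Running it on a file that is already single-line leaves the content unchanged, so it is safe to apply to every A3M before a run.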

cespos avatar Nov 07 '25 11:11 cespos