openfold
Inference failed in data_pipeline.py: ValueError: setting an array element with a sequence.
Traceback (most recent call last):
File "run_pretrained_openfold.py", line 257, in
Could you elaborate? For which protein does this happen? How are you running OpenFold?
Sorry, I'm testing with a test FASTA here, and I'm running the inference command from the README.
I added a breakpoint at line 211 of data_pipeline.py, re-ran inference, and inspected the contents of deletion_matrix:
(Pdb) len(deletion_matrix)
6
(Pdb) type(deletion_matrix[0])
<class 'list'>
(Pdb) len(deletion_matrix[0])
32763
(Pdb) len(deletion_matrix[1])
48502
(Pdb) len(deletion_matrix[2])
48502
(Pdb) len(deletion_matrix[3])
48502
The error seems to be caused by the sublists having different lengths:
>>> deletion_matrix=[[111, 222, 333], [1, 2, 3], [1, 2, 3]]
>>> np.array(deletion_matrix, dtype=np.int32)
array([[111, 222, 333],
[ 1, 2, 3],
[ 1, 2, 3]], dtype=int32)
>>> deletion_matrix=[[111, 222, 333], [1, 2, 3], [1, 2, 3, 4]]
>>> np.array(deletion_matrix, dtype=np.int32)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (3,) + inhomogeneous part.
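To see which rows are responsible before the np.array call fails, you can compare each row's length against the first row's. This is a minimal sketch that assumes deletion_matrix is a plain list of per-sequence lists, as in the Pdb session above. find_ragged_rows and to_int32_array are hypothetical helpers and are not part of OpenFold.

```python
import numpy as np


def find_ragged_rows(matrix):
    """Return (index, length) for every row whose length differs from row 0.

    Hypothetical diagnostic helper, not part of OpenFold.
    """
    if not matrix:
        return []
    expected = len(matrix[0])  # treat the first row as the reference length
    return [(i, len(row)) for i, row in enumerate(matrix) if len(row) != expected]


def to_int32_array(matrix):
    """Convert to an int32 array, raising a clearer error if rows are ragged."""
    ragged = find_ragged_rows(matrix)
    if ragged:
        raise ValueError(
            f"expected every row to have length {len(matrix[0])}, "
            f"but these (index, length) pairs differ: {ragged}"
        )
    return np.array(matrix, dtype=np.int32)


if __name__ == "__main__":
    ok = [[111, 222, 333], [1, 2, 3], [1, 2, 3]]
    bad = [[111, 222, 333], [1, 2, 3], [1, 2, 3, 4]]
    print(to_int32_array(ok).shape)  # (3, 3)
    print(find_ragged_rows(bad))     # [(2, 4)]
```

With the real data from the Pdb session, this would list every row whose length is not 32763.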
Hi @longerzone, I've encountered this problem too. Have you solved it?
Sorry---this one slipped through the cracks. @longerzone that FASTA contains a DNA sequence, not a protein, so I wouldn't expect it to work with OpenFold out of the box. You'll need to make extensive changes to the data processing pipeline to accommodate the different nucleotide types and then retrain the model from scratch.
@willx-y are you also using DNA?
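To catch this before running inference, you could check whether the query consists almost entirely of nucleotide letters. This is a rough heuristic sketch that assumes a single-record FASTA. read_fasta_sequence and looks_like_nucleotide are hypothetical helpers, and the 0.9 threshold is an arbitrary assumption. A, C, G, T and N are also valid amino-acid codes, so an unusual or very short protein can be misclassified.

```python
NUCLEOTIDE_LETTERS = set("ACGTUN")


def read_fasta_sequence(text):
    """Concatenate the sequence lines of the first FASTA record in `text`."""
    seq_lines = []
    seen_header = False
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if seen_header:
                break  # stop at the second record
            seen_header = True
            continue
        if line:
            seq_lines.append(line)
    return "".join(seq_lines).upper()


def looks_like_nucleotide(sequence, threshold=0.9):
    """Heuristic: True if at least `threshold` of residues are A/C/G/T/U/N.

    A, C, G, T and N are also amino-acid codes, so this can misfire on
    proteins built mostly from those residues (assumed acceptable here).
    """
    if not sequence:
        return False
    hits = sum(1 for ch in sequence if ch in NUCLEOTIDE_LETTERS)
    return hits / len(sequence) >= threshold


if __name__ == "__main__":
    dna = ">query\nATGGCGTACGTTAGC\nTTAGGCATCG\n"
    protein = ">query\nMKTAYIAKQRQISFVKSHFSRQ\n"
    print(looks_like_nucleotide(read_fasta_sequence(dna)))      # True
    print(looks_like_nucleotide(read_fasta_sequence(protein)))  # False
```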
I'm not sure if this is the same issue that @willx-y has, but I got the same error message when running inference on a protein FASTA with the --use_precomputed_alignments option enabled. It turns out that my MSAs (which I'd gotten from the ColabFold MMseqs2 server) unexpectedly had a null byte at the end of each .a3m file. When I removed those null bytes, the error went away!
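You can script this cleanup. The sketch below assumes the alignments are .a3m files somewhere under the directory passed to --use_precomputed_alignments. strip_trailing_null_bytes and clean_alignment_dir are hypothetical helpers. Like the fix described above, they only remove NUL bytes at the end of each file and rewrite the file in place.

```python
from pathlib import Path


def strip_trailing_null_bytes(path):
    """Remove NUL (0x00) bytes from the end of a file, in place.

    Returns the number of bytes removed. Hypothetical helper, not part of OpenFold.
    """
    path = Path(path)
    data = path.read_bytes()
    cleaned = data.rstrip(b"\x00")
    removed = len(data) - len(cleaned)
    if removed:
        path.write_bytes(cleaned)
    return removed


def clean_alignment_dir(root):
    """Strip trailing NUL bytes from every .a3m file below `root`.

    Returns {file path: bytes removed} for the files that were changed.
    """
    report = {}
    for a3m in sorted(Path(root).rglob("*.a3m")):
        removed = strip_trailing_null_bytes(a3m)
        if removed:
            report[str(a3m)] = removed
    return report


if __name__ == "__main__":
    import sys

    if len(sys.argv) == 2:
        for name, n in clean_alignment_dir(sys.argv[1]).items():
            print(f"{name}: removed {n} NUL byte(s)")
    else:
        print("usage: python strip_nul.py ALIGNMENT_DIR")
```

Running it a second time should report nothing, which is an easy way to confirm the files are clean.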
I'm getting the same error for several different protein sequences. I'm not using the --use_precomputed_alignments option. Any idea why this is happening? The same sequences run through AlphaFold without any errors.
Is it every protein sequence, or just some?