dataset-joined pdb_residues file doesn't match with fasta sequence

Open ProkopDivin opened this issue 3 years ago • 0 comments

I ran these commands, where joined.ds is from https://github.com/rdk/p2rank-datasets:

./prank.sh analyze residues joined.ds
./prank analyze fasta-masked joined.ds

But several of the residue files don't match the fasta sequences. All the files are here: files.zip

In these files, the sequence lengths of chains I and L are OK, but the sequence of chain H should be longer according to the csv file.

1hxf.pdb_residues.csv

1hxf_H.fasta 1hxf_I.fasta 1hxf_L.fasta

In these files, the length of chain A is 66 and the length of chain B is 65, but there are 232 rows in 1pts.pbd_residues.csv, and I'm not getting any other fasta files.

1pts.pbd_residues

1pts_A.fasta 1pts_B.fasta

In these cases, I always get one fasta file for each csv file with residues, and the sequence is shorter than the number of rows in the csv.

1bbs.pdb_residues.csv 1bb_A.fasta

1chg.pdb_residues.csv 1chg_A.fasta

1djb.pdb_residues.csv 1djb_A.fasta

2cba.pdb_residues.csv 2cba_A.fasta

2fbp.pdb_residues.csv 2fbp_A.fasta

2tga.pdb_residues.csv 2tga_A.fasta

3lck.pdb_residues.csv 3lck_A.fasta

3p2p.pdb_residues.csv 3p2p_A.fasta

3ptn.pdb_residues.csv 3ptn_A.fasta

4ca2.pdb_residues.csv 4ca2_A.fasta

5dfr.pdb_residues.csv 5dfr_A.fasta
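To make the mismatch easier to reproduce, here is a minimal Python sketch that compares the number of rows per chain in a residues csv against the length of the matching fasta sequence. It is only a rough check under assumptions: the column name `chain` is a guess at the csv header (pass the real one via `chain_column`), and the helper names are hypothetical, not part of P2Rank.

```python
import csv
import io
from collections import Counter


def fasta_length(fasta_text):
    """Count sequence characters in a single-record FASTA string (header lines skipped)."""
    return sum(
        len(line.strip())
        for line in fasta_text.splitlines()
        if line.strip() and not line.startswith(">")
    )


def rows_per_chain(csv_text, chain_column="chain"):
    """Count csv rows per chain. `chain_column` is an ASSUMED header name."""
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    return Counter(row[chain_column].strip() for row in reader)


def find_mismatches(csv_text, fasta_by_chain, chain_column="chain"):
    """Return {chain: (csv_rows, fasta_length)} for every chain where the two differ.

    A chain present in the csv but without a fasta file is reported with length 0.
    """
    counts = rows_per_chain(csv_text, chain_column)
    mismatches = {}
    for chain in sorted(set(counts) | set(fasta_by_chain)):
        n_rows = counts.get(chain, 0)
        n_fasta = fasta_length(fasta_by_chain[chain]) if chain in fasta_by_chain else 0
        if n_rows != n_fasta:
            mismatches[chain] = (n_rows, n_fasta)
    return mismatches
```

To use it, read the csv and each `<id>_<chain>.fasta` file into strings, build a dict keyed by chain letter, and pass both to `find_mismatches`. An empty result means every chain has as many csv rows as fasta characters.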

Jul 03 '22 14:07 ProkopDivin