
Completeness of TF Bind 8 Data in Design-bench

Open sqhang opened this issue 9 months ago • 2 comments

According to the Design-bench paper, the TF Bind 8 dataset by default includes binding affinity data for TF SIX6_REF_R1. I understand that the Design-bench dataset is currently undergoing migration. In the interim, I accessed a folder named tf_bind_8-SIX6_REF_R1 within the design_bench_data_incomplete.zip file from the Google Drive link shared by the repository maintainer. Could you confirm whether this folder contains the complete TF Bind 8 dataset as referenced in the Design-bench paper?

Thank you for your assistance!

sqhang, May 12 '24 20:05

Hi,

The default subvariant of TFBind8 is tf_bind_8-SIX6_REF_R1, and it is downloaded automatically in the new migration branch (chris/fixes-v2). Is there another version of TFBind8 you wanted to access?

The data I previously uploaded to my Google Drive were various segments of the full design_bench_data folder. Shortly afterwards, @brandontrabucco also uploaded data, which appears to contain other variants of TFBind8. In his upload I see about 200 .npy files that seem to correspond to different variants. The first 10 are:

tf_bind_8-ARX_L343Q_R1-y-0.npy
tf_bind_8-ARX_L343Q_R2-y-0.npy
tf_bind_8-ARX_P353L_R1-y-0.npy
tf_bind_8-ARX_P353L_R2-y-0.npy
tf_bind_8-ARX_P353R_R1-y-0.npy
tf_bind_8-ARX_P353R_R2-y-0.npy
tf_bind_8-ARX_R332H_R1-y-0.npy
tf_bind_8-ARX_R332H_R2-y-0.npy
tf_bind_8-ARX_REF_R1-y-0.npy
tf_bind_8-ARX_REF_R2-y-0.npy
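
For anyone sorting through these files, the variant name can be pulled out of each file name. Below is a minimal sketch, assuming the naming pattern tf_bind_8-<variant>-y-<shard>.npy inferred from the ten names above. The helper names parse_variant and group_by_variant are hypothetical and not part of design-bench.

```python
import re

# Pattern inferred from the file names listed above (an assumption):
#   tf_bind_8-<variant>-y-<shard>.npy
Y_FILE = re.compile(r"^tf_bind_8-(?P<variant>.+)-y-(?P<shard>\d+)\.npy$")


def parse_variant(filename):
    """Return (variant, shard) for a TF Bind 8 y-file name, or None if it does not match."""
    match = Y_FILE.match(filename)
    if match is None:
        return None
    return match.group("variant"), int(match.group("shard"))


def group_by_variant(filenames):
    """Map each variant name to the sorted list of shard indices found for it."""
    groups = {}
    for name in filenames:
        parsed = parse_variant(name)
        if parsed is None:
            continue  # skip files that are not TF Bind 8 y-files
        variant, shard = parsed
        groups.setdefault(variant, []).append(shard)
    return {variant: sorted(shards) for variant, shards in groups.items()}
```

Calling group_by_variant on a directory listing (for example from os.listdir) would give one entry per variant, which makes it easy to count how many variants are present.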

Does this answer your question?

  • Edit: This is where the data is being pulled from in the new migration branch: https://huggingface.co/datasets/beckhamc/design_bench_data/tree/main

Thanks, Chris

christopher-beckham, May 12 '24 22:05

Hi Chris,

Thanks a lot for your detailed answer!

That answers my question. I have confirmed that the data here for TF SIX6_REF_R1 is mainly what I need. As you suggested, I also found around 200 .npy files with binding affinities for other TFs in Brandon's upload. One quick follow-up question:

  1. Do those 200 tf_bind_8-{TF}-y-0.npy files all correspond to the same default x file? That is, do they all use the same list of 8-mers as TF SIX6_REF_R1?
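
One way to check this locally would be to compare the input arrays directly. This is only a sketch, and it assumes that each variant ships its own x file with a name analogous to the y files (for example tf_bind_8-SIX6_REF_R1-x-0.npy), which is not confirmed in this thread. The function names same_inputs and lengths_match are hypothetical.

```python
import numpy as np


def same_inputs(x_path_a, x_path_b):
    """Return True if two saved x arrays have the same shape and identical entries."""
    a = np.load(x_path_a)
    b = np.load(x_path_b)
    return np.array_equal(a, b)


def lengths_match(x_path, y_path):
    """Return True if an x file and a y file have the same number of rows."""
    x = np.load(x_path)
    y = np.load(y_path)
    return x.shape[0] == y.shape[0]
```

If only a single x file exists, lengths_match gives a weaker check: every y file should at least have as many rows as that x file has sequences. Matching lengths would not by itself prove that the rows are in the same order.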

Thanks a lot for your help!

sqhang, May 27 '24 08:05