Revisiting-PLMs

Metal ion binding dataset

Open empyriumz opened this issue 2 years ago • 19 comments

Hi there,

Nice work! I have a question about the metal ion binding dataset used in your paper. Could you let me know where you got the original dataset?

Thanks!

empyriumz avatar Nov 17 '22 20:11 empyriumz

Hi, empyriumz:

The metal ion binding dataset was collected from PDB (https://www.rcsb.org/). If a protein has any metal ion binding site, we set its label to 1.

elttaes avatar Nov 18 '22 02:11 elttaes

Thanks for your reply! To clarify, I tried searching PDB for metal ion binding in two ways: [image] or [image]

Both return 87,669 entries. Did you run similar queries to compile the dataset?

empyriumz avatar Nov 18 '22 16:11 empyriumz

We wrote a crawler to crawl the annotations of each PDB protein. Do you need the original dataset we collected?

elttaes avatar Nov 18 '22 16:11 elttaes

By original dataset, do you mean all the PDB files? That would be too large, I guess, so could you share the script used to search and annotate the PDB entries? Thanks!

empyriumz avatar Nov 18 '22 16:11 empyriumz

I am sorry, but the classmates who wrote the crawler are not on the author list and are unwilling to give it to us. They have jobs now and plan to release the relevant dataset themselves. I can notify you after their paper is released.

But I can give you a simple snippet that checks whether a page contains a keyword. It may help you.

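import urllib.request
from bs4 import BeautifulSoup

# Annotation page of a single PDB entry (2XEV as an example)
url = 'https://www.rcsb.org/annotations/2XEV'
req = urllib.request.Request(url=url)
content = urllib.request.urlopen(req).read().decode('utf-8')
soup = BeautifulSoup(content, "html.parser")
# Text nodes that are exactly 'metal ion binding' (string= replaces the older text= argument)
tag = soup.find_all(string='metal ion binding')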

If the page does not contain 'metal ion binding', tag will be an empty list.
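To turn that keyword check into the 0/1 label described earlier in this thread, something like the following might work. This is only a sketch: it swaps BeautifulSoup for the standard library's html.parser so it runs without extra packages, it strips whitespace around each text node before comparing (slightly looser than an exact string match), and the helper names (TextCollector, has_keyword, label_page) and the two HTML fragments are made up for illustration, not real RCSB markup.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects the stripped text of every non-empty text node in a page."""

    def __init__(self):
        super().__init__()
        self.texts = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.texts.append(text)


def has_keyword(html, keyword="metal ion binding"):
    """Return True if some text node equals `keyword` exactly."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return keyword in parser.texts


def label_page(html):
    """Binary label: 1 if the page carries the annotation, else 0."""
    return 1 if has_keyword(html) else 0


# Hypothetical page fragments used only to exercise the functions.
positive = "<table><tr><td><a href='#'>metal ion binding</a></td></tr></table>"
negative = "<table><tr><td><a href='#'>ATP binding</a></td></tr></table>"
print(label_page(positive), label_page(negative))
```

To label many entries, the downloaded page content for each PDB ID would be passed to label_page in a loop, ideally with a delay between requests.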

elttaes avatar Nov 24 '22 03:11 elttaes

Hi, I am trying to use your metal AlphaFold code to predict other protein features, but I see that the code takes pkl files as input. Could you tell me how you generate the pkl files? Thanks!

Violet969 avatar Nov 25 '22 05:11 Violet969

Hi Violet969:

The pkl contains the MSA and template information. For the related code, see https://github.com/deepmind/alphafold/blob/main/run_alphafold.py, lines 172-174. When data_pipeline.process takes a FASTA file as input, it produces the MSA, the templates, and the features stored in the pkl.

feature_dict = data_pipeline.process(
    input_fasta_path=fasta_path,
    msa_output_dir=msa_output_dir)

For details on the pkl contents, see pages 8-9 of the AlphaFold paper's supplementary information.

I have already released the MSA on https://drive.google.com/drive/folders/1iShEW8NcMIlWqxTRgsEaI_t5ahoHsixt?usp=share_link

To generate the pkl yourself, you may need to modify run_alphafold.py a little. I can upload this part of the preprocessing code later.
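In the meantime, a minimal sketch of the saving step, assuming the pkl is simply the pickled dictionary returned by data_pipeline.process. The small feature_dict below is a stand-in with placeholder values (the real one holds NumPy arrays), and save_features / load_features are hypothetical helper names, not functions from this repository.

```python
import os
import pickle
import tempfile

# Stand-in for the dict returned by data_pipeline.process.
# Keys and values are placeholders for illustration only.
feature_dict = {"aatype": [0, 1, 2], "msa": [[0, 1, 2], [0, 1, 3]]}


def save_features(features, output_dir, name="features.pkl"):
    """Pickle a feature dict into output_dir and return the file path."""
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, name)
    with open(path, "wb") as f:
        pickle.dump(features, f, protocol=4)
    return path


def load_features(path):
    """Load a feature dict previously written by save_features."""
    with open(path, "rb") as f:
        return pickle.load(f)


output_dir = tempfile.mkdtemp()
pkl_path = save_features(feature_dict, output_dir)
restored = load_features(pkl_path)
print(pkl_path, restored == feature_dict)
```

Loading the file back and comparing it with the original dictionary is a cheap way to confirm the pkl was written correctly before training on it.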

elttaes avatar Nov 25 '22 08:11 elttaes

I see, thanks for your sample code! I'll check whether its results match the search results I mentioned above.

empyriumz avatar Nov 28 '22 15:11 empyriumz

Thanks for your answer. I also have a question: I saw that you use Evoformer and ESM to predict protein SS, but I don't see this code. Will you share it?

Violet969 avatar Nov 30 '22 16:11 Violet969

Sure, I will upload this part of the code later.

elttaes avatar Dec 07 '22 12:12 elttaes

Hi Violet969, the secondary structure code and the code that generates the pkl from a3m files have been uploaded to the Structure folder and the Data folder. If you have any questions, you can contact me.

elttaes avatar Dec 19 '22 09:12 elttaes

I see, thanks for the answer. I tried merge_msa.py but it didn't work. Can you show me an example of how to use it?

Violet969 avatar Dec 21 '22 13:12 Violet969

Hi, I have added an example; please have a look at the latest code.

elttaes avatar Dec 21 '22 14:12 elttaes

Thanks for your answer. I would also like an example of how to run metal/alphafold/train.py. Can you share one?

Violet969 avatar Dec 21 '22 16:12 Violet969

You should now be able to run train.py directly with a few simple modifications. Please make sure you have configured the AlphaFold runtime environment.

In addition, the current AlphaFold parameter format seems to differ from the earlier one. You can try to find the previously published parameter file.

elttaes avatar Dec 23 '22 16:12 elttaes

Thanks for your reply. I tried to run 'train.py' on my server, but I always get an error like this:

2023-01-01 07:07:23.507834: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2130] Execution of replica 0 failed: INTERNAL: Failed to allocate 50331648 bytes for new constant
Traceback (most recent call last):
  File "train.py", line 264, in <module>
    app.run(main)
  File "/anaconda3/envs/alphafold_2/lib/python3.8/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/anaconda3/envs/alphafold_2/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "train.py", line 216, in main
    state = jax.pmap(updater.init)(rng_pmap, data)
  File "/anaconda3/envs/alphafold_2/lib/python3.8/site-packages/jax/_src/traceback_util.py", line 162, in reraise_with_filtered_traceback
    return fun(*args, **kwargs)
  File "/anaconda3/envs/alphafold_2/lib/python3.8/site-packages/jax/src/api.py", line 2158, in cache_miss
    out_tree, out_flat = f_pmapped(*args, **kwargs)
  File "/anaconda3/envs/alphafold_2/lib/python3.8/site-packages/jax/_src/api.py", line 2034, in pmap_f
    out = pxla.xla_pmap(
  File "/anaconda3/envs/alphafold_2/lib/python3.8/site-packages/jax/core.py", line 2022, in bind
    return map_bind(self, fun, *args, **params)
  File "/anaconda3/envs/alphafold_2/lib/python3.8/site-packages/jax/core.py", line 2054, in map_bind
    outs = primitive.process(top_trace, fun, tracers, params)
  File "/anaconda3/envs/alphafold_2/lib/python3.8/site-packages/jax/core.py", line 2025, in process
    return trace.process_map(self, fun, tracers, params)
  File "/anaconda3/envs/alphafold_2/lib/python3.8/site-packages/jax/core.py", line 687, in process_call
    return primitive.impl(f, *tracers, **params)
  File "/anaconda3/envs/alphafold_2/lib/python3.8/site-packages/jax/interpreters/pxla.py", line 841, in xla_pmap_impl
    return compiled_fun(*args)
  File "/anaconda3/envs/alphafold_2/lib/python3.8/site-packages/jax/_src/profiler.py", line 294, in wrapper
    return func(*args, **kwargs)
  File "/anaconda3/envs/alphafold_2/lib/python3.8/site-packages/jax/interpreters/pxla.py", line 1656, in call
    out_bufs = self.xla_executable.execute_sharded_on_local_devices(input_bufs)
jax._src.traceback_util.UnfilteredStackTrace: jaxlib.xla_extension.XlaRuntimeError: INTERNAL: Failed to allocate 50331648 bytes for new constant: while running replica 0 and partition 0 of a replicated computation (other replicas may have failed as well).

The stack trace below excludes JAX-internal frames. The preceding is the original exception that occurred, unmodified.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "train.py", line 264, in <module>
    app.run(main)
  File "/anaconda3/envs/alphafold_2/lib/python3.8/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/anaconda3/envs/alphafold_2/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "train.py", line 216, in main
    state = jax.pmap(updater.init)(rng_pmap, data)
jaxlib.xla_extension.XlaRuntimeError: INTERNAL: Failed to allocate 50331648 bytes for new constant: while running replica 0 and partition 0 of a replicated computation (other replicas may have failed as well).

I have 8 × 12 GB GPUs and 125 GB of memory. Can you tell me how to solve it?

Violet969 avatar Jan 01 '23 08:01 Violet969

I tested this code on an A40 (48 GB) server and it works. You can try setting os.environ['XLA_PYTHON_CLIENT_MEM_FRACTION'] = '2' or lower to reduce memory usage.
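A minimal sketch of where that setting could go, assuming it is placed at the very top of train.py so it takes effect before JAX initializes its GPU backend. XLA_PYTHON_CLIENT_MEM_FRACTION controls how much GPU memory JAX preallocates, expressed as a fraction of the device's memory. The TF_FORCE_UNIFIED_MEMORY variable is an addition not mentioned in this thread: AlphaFold's Docker launcher sets it together with a fraction above 1 so that allocations can spill into host memory, and whether it is needed here is an assumption. configure_xla_memory is a hypothetical helper name.

```python
import os


def configure_xla_memory(fraction="2", unified_memory=True):
    """Set the XLA memory environment variables and return what was set.

    Call this before `import jax`; changing the variables after the GPU
    backend has been initialized has no effect on the running process.
    """
    os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = str(fraction)
    if unified_memory:
        # Assumption: allow GPU allocations to spill into host RAM,
        # as AlphaFold's Docker setup does.
        os.environ["TF_FORCE_UNIFIED_MEMORY"] = "1"
    names = ("XLA_PYTHON_CLIENT_MEM_FRACTION", "TF_FORCE_UNIFIED_MEMORY")
    return {name: os.environ[name] for name in names if name in os.environ}


settings = configure_xla_memory(fraction="2")
print(settings)

# import jax  # only import jax after the variables are in place
```

On GPUs with less memory than the A40, lowering the batch size or the crop size in the training configuration may also be necessary; the environment variables alone do not shrink the model's actual memory footprint.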

elttaes avatar Jan 01 '23 10:01 elttaes

Thanks for your quick reply. Setting os.environ['XLA_PYTHON_CLIENT_MEM_FRACTION'] = '2' works, but now I get another error:

Traceback (most recent call last):
  File "train.py", line 269, in <module>
    app.run(main)
  File "/anaconda3/envs/alphafold_2/lib/python3.8/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/anaconda3/envs/alphafold_2/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "train.py", line 233, in main
    state, metrics = updater.update(state, data)
  File "train.py", line 176, in update
    if step % self._checkpoint_every_n == 0:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Can you tell me how to solve it?

Violet969 avatar Jan 01 '23 11:01 Violet969

Delete the './tmp' folder.
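If you prefer to do this from Python, a small sketch follows. It assumes './tmp' is a directory relative to where train.py is launched and that everything in it can be discarded; the thread does not say what the folder holds, so check its contents first. clear_dir is a hypothetical helper, and the demo below runs on a throwaway temporary directory rather than on './tmp' itself.

```python
import shutil
import tempfile
from pathlib import Path


def clear_dir(path="./tmp"):
    """Delete `path` and everything under it; return True if it existed."""
    target = Path(path)
    if not target.is_dir():
        return False
    shutil.rmtree(target)
    return True


# Demo on a throwaway directory so nothing real is deleted.
demo = Path(tempfile.mkdtemp()) / "tmp"
demo.mkdir()
(demo / "old_state.pkl").write_bytes(b"stale")
removed = clear_dir(demo)
print(removed, demo.exists())

# Before rerunning train.py, from the directory it is launched in:
# clear_dir("./tmp")
```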

elttaes avatar Jan 01 '23 11:01 elttaes