dgl-lifesci

[Roadmap] Release Plan for 0.3

Open mufeili opened this issue 4 years ago • 11 comments

This post lists the development plan for the next release. Feel free to leave a comment if you have any requirements.

  1. Support average precision metric
  2. Pre-trained models on benchmarks like MoleculeNet, Alchemy, QM9, etc.
  3. Better support for attention visualization
  4. Visualization for learned molecular representations
  5. Adjust learning rate and add gradient clipping for ogbl-ppa
  6. Add better support for feature selection

mufeili avatar Jun 12 '20 07:06 mufeili

If the xxx.txt.proc file does not correspond to the xxx.txt file, xxx.txt.proc should be generated again.

autodataming avatar Jun 24 '20 09:06 autodataming

File 2.rxns contains:

[O:1]=[C:2]([OH:3])[c:4]1[c:5]([Br:6])[cH:7][cH:8][cH:9][c:10]1[NH:11][C:12](=[O:13])[CH3:14]>>[O:1]=[C:2]([OH:3])[c:4]1[c:5]([Br:6])[cH:7][cH:8][cH:9][c:10]1[NH2:11]

Running the command

python find_reaction_center_eval.py --test-path 2.rxns -np 1

reports an error:

dgl._ffi.base.DGLError: Expect number of features to match number of nodes (len(u)). Got 27 and 14 instead.

autodataming avatar Jun 24 '20 09:06 autodataming

> If the xxx.txt.proc file does not correspond to the xxx.txt file, xxx.txt.proc should be generated again.

To guarantee that, we would always need to compute the graph edits from scratch. As a result, let's always generate the xxx.txt.proc file from scratch. I've done that in PR #32.
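For comparison, a cheaper alternative to always regenerating would be a timestamp check. The sketch below is only an illustration of that idea, not what PR #32 does: `proc_is_stale` is a hypothetical helper, and modification times are a heuristic that cannot detect every mismatch (for example, a file replaced by an older copy), which is one reason regenerating from scratch is the safer default.

```python
import os


def proc_is_stale(rxn_path: str, proc_path: str) -> bool:
    """Hypothetical helper: True if proc_path is missing or older than rxn_path."""
    if not os.path.exists(proc_path):
        return True
    # Heuristic only: a processed file older than its source is treated as stale.
    return os.path.getmtime(proc_path) < os.path.getmtime(rxn_path)
```

A caller would regenerate xxx.txt.proc whenever `proc_is_stale("xxx.txt", "xxx.txt.proc")` returns True.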

mufeili avatar Jun 25 '20 05:06 mufeili

> python find_reaction_center_eval.py --test-path 2.rxns -np 1
>
> dgl._ffi.base.DGLError: Expect number of features to match number of nodes (len(u)). Got 27 and 14 instead.

I guess 2.rxns previously held some different reactions, and the script loaded the DGLGraphs that had been constructed for those earlier reactions. I'm now changing the default behavior to constructing DGLGraphs from scratch in PR #32.
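As a quick sanity check of the numbers in the error message, the atom-map labels in the reported reaction can be counted with the standard library alone. This is a rough sketch, assuming every atom carries a map number (true for the reaction above, not for SMILES in general); `mapped_atom_numbers` is a hypothetical helper, not part of dgl-lifesci.

```python
import re

# Matches the atom-map number inside a bracket atom, e.g. the "11" in "[NH:11]".
ATOM_MAP = re.compile(r":(\d+)\]")


def mapped_atom_numbers(smiles: str) -> set:
    """Return the set of atom-map numbers found in an atom-mapped SMILES string."""
    return {int(n) for n in ATOM_MAP.findall(smiles)}


rxn = (
    "[O:1]=[C:2]([OH:3])[c:4]1[c:5]([Br:6])[cH:7][cH:8][cH:9][c:10]1"
    "[NH:11][C:12](=[O:13])[CH3:14]"
    ">>"
    "[O:1]=[C:2]([OH:3])[c:4]1[c:5]([Br:6])[cH:7][cH:8][cH:9][c:10]1[NH2:11]"
)
reactants, products = rxn.split(">>")
print(len(mapped_atom_numbers(reactants)))  # 14
print(len(mapped_atom_numbers(products)))   # 11
```

The reactant side has 14 mapped atoms, which agrees with the 14 in the error. That is at least consistent with the guess that the 27 came from data cached for a different reaction.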

mufeili avatar Jun 25 '20 05:06 mufeili

  1. DGLGraphs file "test.bin"
  2. rxn file "xxx.txt"
  3. rxn process file "xxx.txt.proc"

It would be better if the base name of the DGLGraph file were consistent with the rxn file:

test.bin -> xxx.txt.bin

autodataming avatar Jun 28 '20 01:06 autodataming

> It would be better if the base name of the DGLGraph file were consistent with the rxn file.
>
> test.bin -> xxx.txt.bin

This shall be addressed in PR #35.
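The naming scheme requested above amounts to deriving every cache file from the path of the rxn file. A minimal sketch, assuming the suffixes mentioned in this thread; `derived_paths` is a hypothetical helper and not necessarily how PR #35 implements it:

```python
def derived_paths(rxn_path: str) -> dict:
    """Hypothetical helper: name cache files after the rxn file they come from."""
    return {
        "proc": rxn_path + ".proc",   # processed reactions, e.g. xxx.txt.proc
        "graphs": rxn_path + ".bin",  # saved DGLGraphs, e.g. xxx.txt.bin
    }
```

With this scheme, two different rxn files can never share a cached graph file by accident.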

mufeili avatar Jun 28 '20 06:06 mufeili

Add a debug mode!

In debug mode, the script would report which reaction raises the error.

Running the command

python find_reaction_center_eval.py --test-path sin_map_clean.rxns   -np 1

Evaluation on the test set.
Traceback (most recent call last):
  File "find_reaction_center_eval.py", line 79, in <module>
    main(args)
  File "find_reaction_center_eval.py", line 47, in main
    args, args['top_ks_test'], model, test_loader, args['easy'])
  File "/home/NFS/user/zgong/czq/workflow_retro_deepsyn2/step3dgllifesci/dgl-lifesci/examples/reaction_prediction/rexgen_direct/utils.py", line 456, in reaction_center_final_eval
    for batch_id, batch_data in enumerate(data_loader):
  File "/home/zgong/nfs/program/anaconda2/envs/py36dgllifesci/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/home/zgong/nfs/program/anaconda2/envs/py36dgllifesci/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 385, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/zgong/nfs/program/anaconda2/envs/py36dgllifesci/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/zgong/nfs/program/anaconda2/envs/py36dgllifesci/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/zgong/nfs/program/anaconda2/envs/py36dgllifesci/lib/python3.6/site-packages/dgllife/data/uspto.py", line 509, in __getitem__
    self.atom_pair_labels[item] = get_pair_label(mol, self.graph_edits[item])
  File "/home/zgong/nfs/program/anaconda2/envs/py36dgllifesci/lib/python3.6/site-packages/dgllife/data/uspto.py", line 181, in get_pair_label
    labels[i, j, pair_to_changes[(j, i)]] = 1.
IndexError: index 62 is out of bounds for dimension 1 with size 62

With only the first 100 reactions of sin_map_clean.rxns, no error is reported:

head -n 100 sin_map_clean.rxns > sin100.rxns
python find_reaction_center_eval.py --test-path sin100.rxns    -np 1

autodataming avatar Jun 28 '20 08:06 autodataming

> Add a debug mode!
>
> In debug mode, the script would report which reaction raises the error.

Can you provide a reaction that will yield the error? I want to use that for developing the feature you requested.

mufeili avatar Jun 28 '20 13:06 mufeili

> Add a debug mode!
>
> In debug mode, the script would report which reaction raises the error.

This shall be addressed in PR #38.
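Until such a mode is available, the failing reaction can be located by indexing the dataset one item at a time and recording which indices raise. The sketch below is generic Python and is not the implementation in PR #38; `find_failing_items` and its `describe` callback are hypothetical names, and it only assumes the dataset supports `len()` and integer indexing, as the dataset in the traceback above does.

```python
def find_failing_items(dataset, describe=None):
    """Index every item and collect (index, label, exception) for those that raise.

    describe is an optional callable mapping an index to a readable label,
    for example the corresponding line of the rxn file.
    """
    failures = []
    for idx in range(len(dataset)):
        try:
            dataset[idx]
        except Exception as exc:  # deliberately broad: we only want to locate the item
            label = describe(idx) if describe is not None else idx
            failures.append((idx, label, exc))
    return failures
```

Passing something like `describe=lambda i: lines[i]`, where `lines` holds the lines of the rxn file, would print the offending reaction directly. That is quicker than repeatedly truncating the file with head.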

mufeili avatar Jun 30 '20 18:06 mufeili

Just tried and I think the issue no longer exists with the master branch.

On Tue, Aug 25, 2020 at 12:03 PM summer-cola wrote:

> add debug mode!
>
> Running the command python classification_train.py -c XXX.csv -sc SMILES -t XXX -mo MPNN produces the following error:
>
> Traceback (most recent call last):
>   File "classification_train.py", line 218, in <module>
>     main(args, exp_config, train_set, val_set, test_set)
>   File "classification_train.py", line 93, in main
>     run_a_train_epoch(args, epoch, model, train_loader, loss_criterion, optimizer)
>   File "classification_train.py", line 33, in run_a_train_epoch
>     logits = predict(args, model, bg)
>   File "/home/yuanyuan/dgl-lifesci/examples/property_prediction/csv_data_configuration/utils.py", line 329, in predict
>     edge_feats = bg.edata.pop('e').to(args['device'])
>   File "/home/yuanyuan/soft/anaconda3/lib/python3.7/_collections_abc.py", line 795, in pop
>     value = self[key]
>   File "/home/yuanyuan/soft/anaconda3/lib/python3.7/site-packages/dgl/view.py", line 128, in __getitem__
>     return self._graph.get_e_repr(self._edges)[key]
> KeyError: 'e'
>
> The problem also exists when predicting molecular properties with -mo weave/attentivefp/MPNN.

mufeili avatar Aug 25 '20 08:08 mufeili

https://github.com/awslabs/dgl-lifesci/issues/18#issuecomment-679882211 Yes, it is working. Thanks.

summer-cola avatar Aug 25 '20 09:08 summer-cola