Cannot run make_graphfeat.sh script
Hi @linminhtoo, thank you for the work! I'd like to reproduce the paper's results by following the README. I'm trying to generate the graph features (they are not in the Google Drive, contrary to the statement "We again provide them in our Drive"), but I cannot execute `bash scripts/retrosim/make_graphfeat.sh` as it raises the following exception:
```
File "trainEBM.py", line 477, in main
    raise ValueError(f"Model {args.model_name} not supported!")
```
I've looked at the history of the `.sh` files and of `trainEBM.py`, and I guess the problem is simply that `trainEBM.py` doesn't properly handle the case where `model_name = None`?
Hello @VincentBt, sorry for my late reply. I've since graduated and am working full-time, so I haven't been checking these repos as regularly. Please feel free to message me on LinkedIn if my replies are slow.
Yes, you are right: we decided to stop uploading the graph feats (we used to) because they take up too much space, and it's easier to just generate them from scratch. I've made a PR to remove that outdated statement from the README.
As for the generation itself, you're also right: the bash script passes incorrect arguments, somehow (it definitely was working before, hahaha...).
It's been a long time since I last ran it, but the idea is that the PyTorch Dataset class we've defined will always attempt to precompute the features (or load precomputed files from disk) whenever it is initialised. See the whole class here: https://github.com/coleygroup/rxn-ebm/blob/1919eeccdd31e16ec7a44478b756bcd974c35a3c/rxnebm/data/dataset.py#L106-L107 and this line, which calls the precompute function: https://github.com/coleygroup/rxn-ebm/blob/1919eeccdd31e16ec7a44478b756bcd974c35a3c/rxnebm/data/dataset.py#L164
Now, I admit it's a convoluted way of doing it (back when I was still young in college...), but the idea is to run `trainEBM.py` far enough that the Dataset gets initialised, which then triggers the graph feat precompute function. This should really be a separate script of its own, which I might get around to refactoring some day, haha.
Here, you can see that we first look for precomputed files, and only if they don't exist at the expected paths do we proceed with the precomputation: https://github.com/coleygroup/rxn-ebm/blob/1919eeccdd31e16ec7a44478b756bcd974c35a3c/rxnebm/data/dataset.py#L186-L193
If you really only want to make the graphfeats, you can set the number of training epochs to 0 so that no training happens. For the model name, provide `--model_name "GraphEBM_1MPN"` together with the matching `--representation "graph"`. Alternatively, this also means that if you launch an actual GraphEBM training run, the code will do whatever graphfeat precomputation is needed for training to proceed. Just make sure to give the correct paths for storing the graphfeats, so you don't end up with multiple copies of those massive files on your storage (or HPC cluster).
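Putting that together, an invocation along these lines should trigger the precomputation without training. This is only a sketch: `--model_name` and `--representation` come from the discussion above, but the epochs flag name is an assumption, so please check `trainEBM.py`'s argparse setup for the exact spelling:

```bash
# Sketch only -- verify flag names against trainEBM.py's argparse definitions.
# The epochs flag here is an assumed name for "train for 0 epochs".
python trainEBM.py \
  --model_name "GraphEBM_1MPN" \
  --representation "graph" \
  --epochs 0
```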
Hello @VincentBt, I wanted to check in to see whether you're still facing any issues with using our work? :)