ORDerly
Chemical reaction data & benchmarks. Extraction and cleaning of data from the Open Reaction Database (ORD).
Retain the df index throughout the cleaning script, so it can be used to create an invariant test set (particularly useful to have an invariant test set between the two rxn str...
Tests take a long time. One reason is that remove_inconsistent_yields is a bool that we vary together with two other bools many times. This means we...
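The issue text above is truncated, so the intended fix is not known. As one possible illustration only, the sketch below compares a full grid over three boolean flags with a reduced grid that holds remove_inconsistent_yields fixed and toggles it once. The names flag_a and flag_b and both helper functions are hypothetical, not taken from the repository.

```python
from itertools import product

# Hypothetical flag names; only remove_inconsistent_yields appears in the issue.
FLAGS = ("remove_inconsistent_yields", "flag_a", "flag_b")


def full_grid(flags=FLAGS):
    """Every True/False combination of the flags: 2**len(flags) cases."""
    return [
        dict(zip(flags, combo))
        for combo in product([False, True], repeat=len(flags))
    ]


def reduced_grid(flags=FLAGS, fixed="remove_inconsistent_yields"):
    """Vary the other flags with `fixed` held at False, then toggle `fixed` once."""
    others = [f for f in flags if f != fixed]
    cases = [
        {fixed: False, **dict(zip(others, combo))}
        for combo in product([False, True], repeat=len(others))
    ]
    # One extra case exercises the fixed flag on its own.
    cases.append({fixed: True, **{f: False for f in others}})
    return cases
```

Either list of dicts could be passed to pytest.mark.parametrize. Whether the reduced grid gives enough coverage depends on whether the flags interact, which is not established here.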
There is a list of catalysts in Therapeutics Data Commons (TDC) that could potentially be used to identify catalysts (https://tdcommons.ai/multi_pred_tasks/catalyst/), in a similar way to how we identify solvents. However,...
There are only 60 molecular identifiers in the entire dataset, so probably not worth applying .strip() to millions of molecules in an attempt to save these 60 names - it's not...
To test this part of the extractor code:
    for identifier in identifiers:
        if identifier.value.lower() in ['ice', 'ice water']:
            ice_present = True
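A minimal sketch of such a test, assuming a stand-in Identifier dataclass with a value attribute in place of the real ORD identifier object. contains_ice is a hypothetical helper that wraps the loop quoted above; it is not a function from the repository.

```python
from dataclasses import dataclass


@dataclass
class Identifier:
    """Hypothetical stand-in for an ORD compound identifier (only `value` is used)."""
    value: str


def contains_ice(identifiers):
    """Return True if any identifier names ice, mirroring the extractor loop."""
    for identifier in identifiers:
        if identifier.value.lower() in ["ice", "ice water"]:
            return True
    return False


def test_contains_ice():
    # Matching is case-insensitive because of .lower().
    assert contains_ice([Identifier("Ice")])
    assert contains_ice([Identifier("water"), Identifier("ICE WATER")])
    # No stripping is applied, so padded or unrelated names do not match.
    assert not contains_ice([Identifier(" ice ")])
    assert not contains_ice([Identifier("water")])
    assert not contains_ice([])
```

A test against the real extractor would still need an actual reaction record that contains ice, as the issue notes.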
I have seen instances where yield_4 is None while the other yields are NaN. I suspect this happens when there are no numbers in the entire column, the type is...
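Whatever the cause turns out to be (the explanation above is the author's suspicion and is not confirmed here), a comparison can be made robust by treating None and NaN as the same kind of missing value. The sketch below uses plain lists as a stand-in for a yield column; both function names are hypothetical.

```python
import math


def normalise_missing_yields(values):
    """Replace None with NaN so every missing yield has the same representation."""
    return [float("nan") if v is None else float(v) for v in values]


def same_missing_pattern(a, b):
    """Compare two yield columns, treating None and NaN as equally missing."""
    a, b = normalise_missing_yields(a), normalise_missing_yields(b)
    return len(a) == len(b) and all(
        (math.isnan(x) and math.isnan(y)) or x == y for x, y in zip(a, b)
    )
```

For example, same_missing_pattern([None, None], [float("nan"), float("nan")]) treats an all-None column and an all-NaN column as equivalent.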
1. Find a case for test_rxn_input_extractor and handle_rxn_obj that contains ice
2. Check that solvents csv is well formed
3. Test all 4 options for handling unresolvable names (TFF, FTF,...
"agent" is used in two cases; we should specify more precisely what the agent is in each case to reduce ambiguity.
Combining ions into their respective salts will mean that there are fewer components per reaction, which may be helpful for ML model prediction. NB: "[Na+].[OH-]" becomes "[Na]O" because non-organic...