ORDerly issues

keep track of df index

Retain the df index throughout the cleaning script so it can be used to create an invariant test set (particularly useful to have an invariant test set between the two rxn str...
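A minimal sketch of why a retained index helps, using plain dictionaries rather than the project's dataframes. The `original_index` key, the `clean` helper, and the hash-based split are all illustrative assumptions, not ORDerly's actual code. Each row keeps the index it had before cleaning, and test membership is decided from that index alone, so two differently cleaned datasets agree on which reactions belong to the test set.

```python
import hashlib


def is_test_row(original_index: int, test_fraction: float = 0.2) -> bool:
    """Decide test membership from the original index only (assumed scheme)."""
    digest = hashlib.sha256(str(original_index).encode("utf-8")).hexdigest()
    # Map the first 8 hex digits onto [0, 1) and compare with the fraction.
    return int(digest[:8], 16) / 16**8 < test_fraction


def clean(rows, keep):
    """Hypothetical cleaning step: drops rows but never renumbers them."""
    return [row for row in rows if keep(row)]


raw = [{"original_index": i, "n_reactants": i % 4} for i in range(100)]

# Two cleaning variants that discard different rows.
variant_a = clean(raw, lambda row: row["n_reactants"] > 0)
variant_b = clean(raw, lambda row: row["n_reactants"] < 3)

test_a = {r["original_index"] for r in variant_a if is_test_row(r["original_index"])}
test_b = {r["original_index"] for r in variant_b if is_test_row(r["original_index"])}

# Any reaction that survives both variants gets the same assignment in both.
shared = {r["original_index"] for r in variant_a} & {r["original_index"] for r in variant_b}
assert test_a & shared == test_b & shared
```

If the index were reset after each cleaning step, the same integer would point at different reactions in the two variants and this guarantee would be lost.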

dswigh

Only test remove_inconsistent_yields once

Tests take a long time, and one reason is that we've got remove_inconsistent_yields as a bool that we vary with two other bools a lot of times. This means we...
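One way to cut the number of combinations, sketched with placeholder positions for the two other bools (they are not named here, so the tuple layout below is an assumption): cover the full grid of the other two flags with `remove_inconsistent_yields` fixed, then add a single case that flips it.

```python
from itertools import product

# Full grid over three bools: 2 * 2 * 2 = 8 cases.
full_grid = list(product([True, False], repeat=3))

# Reduced grid: vary the two other flags with remove_inconsistent_yields
# (third position) fixed to True, then add one case where it is False.
reduced_grid = [(a, b, True) for a, b in product([True, False], repeat=2)]
reduced_grid.append((True, True, False))

print(len(full_grid), len(reduced_grid))
```

A list like `reduced_grid` could be passed to `pytest.mark.parametrize` in place of the full product.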

dswigh

Identification of catalysts

There is a list of catalysts in Therapeutics Data Commons (TDC) that could potentially be used to identify catalysts (https://tdcommons.ai/multi_pred_tasks/catalyst/), in a similar way to how we identify solvents. However,...

dswigh

Leading spaces - not a problem

There are only 60 molecular identifiers with leading spaces in the entire dataset, so it is probably not worth applying .strip() to millions of molecules in an attempt to save these 60 names - it's not...

dswigh

wontfix

Add test case with ice or ice water

To test this part of the extractor code:

    for identifier in identifiers:
        if identifier.value.lower() in ['ice', 'ice water']:
            ice_present = True
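A minimal sketch of such a test. `FakeIdentifier` and `contains_ice` are hypothetical stand-ins so the example runs without the extractor or real reaction data; only the membership check itself is taken from the code quoted above.

```python
from dataclasses import dataclass


@dataclass
class FakeIdentifier:
    """Stand-in for a compound identifier with a `value` attribute."""
    value: str


def contains_ice(identifiers) -> bool:
    """Mirror of the quoted check, wrapped in a function for testing."""
    ice_present = False
    for identifier in identifiers:
        if identifier.value.lower() in ["ice", "ice water"]:
            ice_present = True
    return ice_present


def test_contains_ice():
    assert contains_ice([FakeIdentifier("Ice")])
    assert contains_ice([FakeIdentifier("toluene"), FakeIdentifier("ICE WATER")])
    assert not contains_ice([FakeIdentifier("water")])
    # Substrings do not count: the check is an exact match after lowercasing.
    assert not contains_ice([FakeIdentifier("ice-cold water")])


test_contains_ice()
```

A real test would feed a reaction whose identifiers include one of these names through the extractor and assert on its output instead.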

dswigh

Missing yields should always be NaN (never None)

I have seen instances where yield_4 is None while the other yields are NaN. I suspect this happens when there are no numbers in the entire column, the type is...
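A sketch of the invariant in plain Python, independent of how the cleaning script stores its columns. `normalise_yield` is a hypothetical helper, not part of ORDerly: it maps every missing value to `float("nan")` so that downstream code only needs one kind of missing-value check.

```python
import math


def normalise_yield(value):
    """Return the yield as a float, using NaN for anything missing."""
    if value is None:
        return float("nan")
    return float(value)


yield_4 = [None, None, None]            # a column with no numbers at all
yield_1 = [85.0, None, float("nan")]    # a column mixing both kinds of missing

cleaned_4 = [normalise_yield(v) for v in yield_4]
cleaned_1 = [normalise_yield(v) for v in yield_1]

assert all(math.isnan(v) for v in cleaned_4)
assert cleaned_1[0] == 85.0 and math.isnan(cleaned_1[1]) and math.isnan(cleaned_1[2])
```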

dswigh

Expand testing

1. Find a case for test_rxn_input_extractor and handle_rxn_obj that contains ice
2. Check that solvents csv is well formed
3. Test all 4 options for handling unresolvable names (TFF, FTF,...
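For item 2, a well-formedness check could look like the sketch below. The column names and sample rows are made up for illustration, since the layout of the solvents csv is not described here; the check only asserts that the header has no duplicate names and that every row has as many fields as the header.

```python
import csv
import io


def csv_problems(text: str) -> list:
    """Return a list of human-readable problems; empty means well formed."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]
    header = rows[0]
    problems = []
    if len(set(header)) != len(header):
        problems.append("duplicate column names in header")
    for line_number, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"line {line_number}: expected {len(header)} fields, got {len(row)}"
            )
    return problems


good = "name,smiles\nwater,O\nethanol,CCO\n"
bad = "name,smiles\nwater,O\nethanol\n"

assert csv_problems(good) == []
assert csv_problems(bad) == ["line 3: expected 2 fields, got 1"]
```

In a real test the text would be read from the solvents file rather than from an inline string.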

dswigh

agents less solvents

"agent" is used in two cases; we should specify more precisely what the agent is in each case to reduce ambiguity.

Joearrowsmith

enhancement

add test to check that the default molecule replacement list values are canonical

Joearrowsmith

enhancement

Combine ions to salt

Combining ions into their respective salts means fewer components per reaction, which may be helpful for ML model prediction. NB: "[Na+].[OH-]" becomes "[Na]O" because non-organic...

dswigh

enhancement

wontfix
