pinot
dataset representation
What features do we want for the Dataset object?
Right now we have the following functionality:
- [ ] csv import
- [ ] batch
- [ ] split
- [ ] temporal splitting
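To make the four items above concrete, here is a minimal, standard-library-only sketch of what such a container could look like. It is not the existing pinot implementation: the class layout, the method names (`from_csv`, `batch`, `split`, `temporal_split`), the column names, and the record layout `(smiles, y, time)` are all assumptions made for illustration.

```python
import csv
import io
import random


class Dataset:
    """Hypothetical container of (smiles, y, time) records."""

    def __init__(self, records=None):
        self.records = list(records or [])

    def __len__(self):
        return len(self.records)

    @classmethod
    def from_csv(cls, handle, smiles_col="smiles", y_col="y", time_col=None):
        # Column names are assumptions; time is kept as a raw string.
        records = [
            (row[smiles_col], float(row[y_col]), row[time_col] if time_col else None)
            for row in csv.DictReader(handle)
        ]
        return cls(records)

    def batch(self, batch_size):
        # Consecutive chunks; the last chunk may be smaller.
        return [
            self.records[i : i + batch_size]
            for i in range(0, len(self.records), batch_size)
        ]

    def split(self, fractions, seed=0):
        # Random split: shuffle a copy, then cut it by the given fractions.
        shuffled = self.records[:]
        random.Random(seed).shuffle(shuffled)
        return self._partition(shuffled, fractions)

    def temporal_split(self, fractions):
        # Temporal split: oldest records first, so later parts are "the future".
        if any(record[2] is None for record in self.records):
            raise ValueError("temporal_split needs a time value on every record")
        ordered = sorted(self.records, key=lambda record: record[2])
        return self._partition(ordered, fractions)

    @staticmethod
    def _partition(records, fractions):
        total, parts, start, cumulative = sum(fractions), [], 0, 0.0
        for k, fraction in enumerate(fractions):
            cumulative += fraction
            last = k == len(fractions) - 1
            end = len(records) if last else round(len(records) * cumulative / total)
            parts.append(Dataset(records[start:end]))
            start = end
        return parts


# Usage with an in-memory CSV; ISO dates sort correctly as plain strings.
text = "smiles,y,date\nCCO,1.0,2020-03-01\nC,2.0,2020-01-01\nCC,3.0,2020-02-01\nCCC,4.0,2020-04-01\n"
ds = Dataset.from_csv(io.StringIO(text), time_col="date")
train, test = ds.temporal_split([0.75, 0.25])
```

The only difference between the two split methods in this sketch is the ordering applied before cutting, which keeps the temporal variant cheap to support once the random one exists.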
We should consider adding the following:
- [ ] extra features (which level)
- [ ] different node representation input
Based on our earlier discussion, elaborating on "different node representation input":
- Have the dataset object annotate different inputs with different styles of graph representation, i.e., different preprocessing steps that yield different representations, such as `dataset.smiles_representation` or some other representation.
- Have datasets be typed according to the input representation they assume, and pass that type forward as a flag to the model (by extension, the models should be typed as well).
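A minimal sketch of the two ideas above, assuming a design where the dataset keeps raw SMILES and produces one typed view per registered preprocessing step. Every name here (`Representation`, `TypedDataset`, `MoleculeDataset`, `as_representation`, `check_input`, `GraphModel`) is hypothetical, and the graph preprocessor is a placeholder that does no real chemistry.

```python
from enum import Enum


class Representation(Enum):
    """Hypothetical tags for the input representation a dataset carries."""
    SMILES = "smiles"
    GRAPH = "graph"


class TypedDataset:
    """Inputs plus targets, tagged with the representation of the inputs."""

    def __init__(self, inputs, targets, representation):
        self.inputs = list(inputs)
        self.targets = list(targets)
        self.representation = representation


class MoleculeDataset:
    """Holds raw SMILES; each preprocessor yields a differently typed view."""

    def __init__(self, smiles, targets, preprocessors):
        self.smiles = list(smiles)
        self.targets = list(targets)
        self.preprocessors = dict(preprocessors)

    def as_representation(self, representation):
        preprocess = self.preprocessors[representation]
        inputs = [preprocess(s) for s in self.smiles]
        return TypedDataset(inputs, self.targets, representation)

    @property
    def smiles_representation(self):
        return self.as_representation(Representation.SMILES)


class Model:
    """Base class: subclasses declare which representation they accept."""
    accepts = None

    def check_input(self, dataset):
        if dataset.representation is not self.accepts:
            raise TypeError(
                f"{type(self).__name__} expects {self.accepts}, "
                f"got {dataset.representation}"
            )


class GraphModel(Model):
    accepts = Representation.GRAPH


dataset = MoleculeDataset(
    ["CCO", "C"],
    [1.0, 2.0],
    {
        Representation.SMILES: str,
        # Placeholder only: a real preprocessor would build a molecular graph.
        Representation.GRAPH: lambda s: {"placeholder_graph_for": s},
    },
)
GraphModel().check_input(dataset.as_representation(Representation.GRAPH))
```

With the flag carried on the dataset, a mismatch between dataset and model fails early with a clear error rather than somewhere inside the forward pass.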
Potentially, we might want to do something like this paper, in which the input has both a graph representation and a junction tree representation (I dug into the code, and it is possible to process molecules into junction trees either on the fly or as part of preprocessing):
https://arxiv.org/pdf/2006.12179.pdf
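The "on the fly or as part of preprocessing" choice could be exposed as a single switch on the dataset. The sketch below is not taken from the paper's code: `PairedDataset`, `to_graph`, `to_junction_tree`, and `precompute` are hypothetical names, and the two featurizers are supplied by the caller (stand-ins are used here instead of a real junction tree decomposition).

```python
class PairedDataset:
    """Yields (graph, junction_tree) pairs built by caller-supplied functions."""

    def __init__(self, smiles, to_graph, to_junction_tree, precompute=False):
        self.smiles = list(smiles)
        self.to_graph = to_graph
        self.to_junction_tree = to_junction_tree
        # precompute=True: featurize everything once, up front.
        # precompute=False: featurize on every access (on the fly).
        self._cache = (
            [self._featurize(s) for s in self.smiles] if precompute else None
        )

    def _featurize(self, smiles):
        return self.to_graph(smiles), self.to_junction_tree(smiles)

    def __len__(self):
        return len(self.smiles)

    def __getitem__(self, index):
        if self._cache is not None:
            return self._cache[index]
        return self._featurize(self.smiles[index])


# Stand-in featurizers for illustration only; they do no real chemistry.
paired = PairedDataset(
    ["CCO", "C"],
    to_graph=lambda s: ("graph", s),
    to_junction_tree=lambda s: ("junction_tree", s),
    precompute=True,
)
graph, tree = paired[0]
```

Precomputing trades memory for speed across epochs, while the on-the-fly path keeps memory flat at the cost of repeating the decomposition on each access.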
Design doc: https://docs.google.com/document/d/1Yp4qZ-9U1kPQI3upwDzJrys8Gjr0F92O_8PVrK-UfqU/edit
@miretchin @karalets @dnguyen1196