
dataset representation

Open · yuanqing-wang opened this issue 4 years ago · 3 comments

What features do we want for the Dataset object?

Currently we have the following functionality:

  • [ ] csv import
  • [ ] batch
  • [ ] split
  • [ ] temporal splitting

We should consider adding the following:

  • [ ] extra features (at which level?)
  • [ ] different node representation inputs

yuanqing-wang · Jul 06 '20 18:07

Based on our earlier discussion, here is an elaboration on "different node representation input":

  1. Have the dataset object annotate different inputs with different styles of graph representation, i.e., different preprocessing steps that yield different representations, such as dataset.smiles_representation or some other representation.
  2. Have datasets be typed according to the input representation they assume, and then have a flag that can be passed forward to the model (by extension, the models should be typed as well).
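Both proposals could be combined roughly as below: each preprocessing step is registered under a representation tag, the dataset carries that tag, and a model declares which tag it accepts and refuses anything else. This is a hedged sketch, not pinot code; `Representation`, `PREPROCESSORS`, `make_dataset`, and `ChainGraphModel` are hypothetical names, and `smiles_to_chain_graph` is a toy placeholder that only handles unbranched chains such as `"CCC"` (real featurization would go through a cheminformatics toolkit such as RDKit).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Dict, List


class Representation(Enum):
    # Hypothetical representation tags; names are illustrative only.
    SMILES = "smiles"
    GRAPH = "graph"


@dataclass
class TypedDataset:
    representation: Representation  # the "type" flag forwarded to the model
    inputs: List[Any]


def smiles_to_chain_graph(smiles: str) -> Dict[str, Any]:
    # Toy placeholder: one node per character, bonds between neighbours.
    # Only correct for unbranched chains like "CCC"; not a SMILES parser.
    return {"nodes": list(smiles), "edges": [(i, i + 1) for i in range(len(smiles) - 1)]}


# One preprocessing step per representation.
PREPROCESSORS: Dict[Representation, Callable[[str], Any]] = {
    Representation.SMILES: lambda s: s,
    Representation.GRAPH: smiles_to_chain_graph,
}


def make_dataset(smiles_list: List[str], representation: Representation) -> TypedDataset:
    fn = PREPROCESSORS[representation]
    return TypedDataset(representation, [fn(s) for s in smiles_list])


class ChainGraphModel:
    # The model is typed too: it declares the representation it accepts.
    accepts = Representation.GRAPH

    def forward(self, ds: TypedDataset) -> List[int]:
        if ds.representation is not self.accepts:
            raise TypeError(
                f"model expects {self.accepts.value}, got {ds.representation.value}"
            )
        # Stand-in computation: number of bonds per molecule.
        return [len(g["edges"]) for g in ds.inputs]
```

Checking the tag at the model boundary turns a representation mismatch into an immediate `TypeError` instead of a shape error deep inside the network.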

miretchin · Jul 06 '20 18:07

We might want to do something like the following paper, in which the input includes both the graph representation and a junction tree representation. I dug into the code, and it is possible to process molecules into junction trees either on the fly or as part of preprocessing.
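Either option could sit behind the same dataset interface: each molecule maps to several named representations, computed up front when `eager=True` (preprocessing) or on first access and then cached (on the fly). This is only a sketch of that eager-versus-lazy pattern under stated assumptions; it does not implement the junction-tree decomposition from the linked paper. `MultiRepresentationDataset`, `transforms`, `eager`, and the `calls` counter are hypothetical, and the transforms are assumed to be plain callables from a SMILES string to some representation.

```python
from typing import Any, Callable, Dict, Iterable, Tuple


class MultiRepresentationDataset:
    """Holds several named representations per molecule (hypothetical API)."""

    def __init__(
        self,
        smiles: Iterable[str],
        transforms: Dict[str, Callable[[str], Any]],
        eager: bool = False,
    ) -> None:
        self.smiles = list(smiles)
        # e.g. {"graph": ..., "junction_tree": ...}; callables are assumptions.
        self.transforms = transforms
        self._cache: Dict[Tuple[int, str], Any] = {}
        self.calls = 0  # how many transform evaluations have actually run
        if eager:
            # Preprocessing: compute every representation up front.
            for i in range(len(self.smiles)):
                for name in self.transforms:
                    self._compute(i, name)

    def _compute(self, i: int, name: str) -> Any:
        # On the fly: compute once on first access, then reuse the cached value.
        key = (i, name)
        if key not in self._cache:
            self.calls += 1
            self._cache[key] = self.transforms[name](self.smiles[i])
        return self._cache[key]

    def __len__(self) -> int:
        return len(self.smiles)

    def __getitem__(self, i: int) -> Dict[str, Any]:
        return {name: self._compute(i, name) for name in self.transforms}
```

Eager mode pays the full cost once before training; lazy mode spreads it over the first epoch and skips molecules that are never accessed.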

https://arxiv.org/pdf/2006.12179.pdf

dnguyen1196 · Jul 06 '20 18:07

https://docs.google.com/document/d/1Yp4qZ-9U1kPQI3upwDzJrys8Gjr0F92O_8PVrK-UfqU/edit

Design doc: @miretchin @karalets @dnguyen1196

yuanqing-wang · Jul 16 '20 20:07