Support additional training information

Open javdrher opened this issue 8 years ago • 1 comments

Following the goals in #44, I took a step back on the issue. It got me thinking about the role of the data object. Right now I think the fundamental rule should be: the data object brings the data (X/Y and anything extra) from the expensive functions to the models.

Right now, I have the following idea in my mind:

If a model expects more than only X/Y, it should inform the data object. As the core models are defined in GPflow, we cannot add logic there to inform us. However, we could scan for all dataholders in a model and use their name property to look up an entry in the data object, or we could follow the decorator pattern to let users implement additional mappings.
If expensive objective functions return additional information, they should return a Data object directly. If no additional information is returned, objective values can be returned as they are now. We then combine all these data objects and call set_data, which visits all models and performs the updates.
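
To make the idea concrete, here is a rough sketch of the name-based lookup in plain Python. Nothing here is existing GPflow or GPflowOpt API: `Data`, `DataHolder`, `ToyModel`, `as_data` and this `set_data` are hypothetical stand-ins, and the `noise` field is only an example of extra information.

```python
class Data:
    """Hypothetical container carrying X, Y and any extra named fields."""

    def __init__(self, X, Y, **extra):
        self.fields = {"X": list(X), "Y": list(Y)}
        for name, values in extra.items():
            self.fields[name] = list(values)

    @classmethod
    def combine(cls, parts):
        """Concatenate several Data objects that share the same field names."""
        parts = list(parts)
        names = set(parts[0].fields)
        for part in parts[1:]:
            if set(part.fields) != names:
                raise ValueError("all Data objects must have the same fields")
        merged = {n: [v for p in parts for v in p.fields[n]] for n in names}
        X, Y = merged.pop("X"), merged.pop("Y")
        return cls(X, Y, **merged)


class DataHolder:
    """Stand-in for a model's data holder, identified by its name."""

    def __init__(self, name):
        self.name = name
        self.value = None


class ToyModel:
    """Stand-in for a model; it only exposes a list of named holders."""

    def __init__(self, holder_names):
        self.holders = [DataHolder(n) for n in holder_names]


def as_data(x, result):
    """Wrap a bare objective value in Data; pass Data objects through."""
    if isinstance(result, Data):
        return result
    return Data([x], [result])


def set_data(models, data):
    """Visit every model and fill each holder from the field with its name."""
    for model in models:
        for holder in model.holders:
            if holder.name not in data.fields:
                raise KeyError("no data for holder '%s'" % holder.name)
            holder.value = data.fields[holder.name]


def objective_with_noise(x):
    # An expensive function that also returns extra information.
    return Data([x], [x ** 2], noise=[0.1])


evaluations = [as_data(x, objective_with_noise(x)) for x in (1.0, 2.0)]
combined = Data.combine(evaluations)
model = ToyModel(["X", "Y", "noise"])
set_data([model], combined)
```

A function that returns only an objective value goes through the same path, since `as_data` wraps it. The decorator variant would replace the name lookup in `set_data` with a user-supplied mapping.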

I have a few different versions in mind. We could automatically construct a pipeline with, for instance, Apache Beam, but that would be overkill and might introduce a lot more coupling between the objects. It would also make the learning curve for contributors a lot steeper. I also think the logic in the Data object can be straightforward.

Jul 29 '17 12:07 javdrher

I think supporting additional training data needs more thought and is a lot of work. Let's not fix this in a release version yet.

Nov 11 '17 18:11 icouckuy