
Avoid creating redundant datasets

Open bcyphers opened this issue 7 years ago • 0 comments

If enter_data() is called twice in a row with the same train_path and the data itself hasn't changed, there is no need to create a new Dataset.

We should add a column that stores a hash of the actual data. When a Dataset is about to be created, if its metadata and data hash exactly match those of an existing Dataset, nothing should be added to the ModelHub database and the existing Dataset should be returned instead.
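
A minimal sketch of the idea, not the actual ATM implementation: it uses an in-memory SQLite table as a stand-in for the ModelHub database, and the table name `datasets`, the columns `class_column` and `data_hash`, and the helpers `file_sha256`, `get_or_create_dataset`, and `make_demo_db` are all hypothetical names chosen for illustration. It assumes a SHA-256 digest of the training file's bytes is an acceptable "hash of the actual data".

```python
import hashlib
import sqlite3


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file's bytes in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_demo_db():
    """Create a throwaway table standing in for the real Dataset table (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE datasets ("
        "id INTEGER PRIMARY KEY, train_path TEXT, class_column TEXT, data_hash TEXT)"
    )
    return conn


def get_or_create_dataset(conn, train_path, class_column):
    """Return (dataset_id, created); insert only if no identical row exists."""
    data_hash = file_sha256(train_path)
    # Look for a row whose metadata AND data hash both match exactly.
    row = conn.execute(
        "SELECT id FROM datasets "
        "WHERE train_path = ? AND class_column = ? AND data_hash = ?",
        (train_path, class_column, data_hash),
    ).fetchone()
    if row is not None:
        return row[0], False  # reuse the existing Dataset
    cur = conn.execute(
        "INSERT INTO datasets (train_path, class_column, data_hash) VALUES (?, ?, ?)",
        (train_path, class_column, data_hash),
    )
    conn.commit()
    return cur.lastrowid, True
```

Under these assumptions, calling `get_or_create_dataset` twice with an unchanged file returns the same id and adds no row, while editing the file changes the hash and produces a new row. A unique index over the metadata columns plus `data_hash` would be one way to enforce this at the database level as well.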

bcyphers · Jan 31 '18 05:01