ATM
ATM copied to clipboard
Avoid creating redundant datasets
If enter_data()
is called with the same train_path
twice in a row and the data itself hasn't changed, a new Dataset does not need to be created.
We should add a column which stores some kind of hash of the actual data. When a Dataset would be created, if the metadata and data hash are exactly the same as an existing Dataset, nothing should be added to the ModelHub database and the existing Dataset should be returned instead.