Streaming dataset construction or appending to an existing dataset
Is your feature request related to a problem? Please describe. I am always frustrated when I need to re-run tfds build whenever new data samples become available. Doing so is time-consuming.
Describe the solution you'd like The ability to append data to an existing tfds dataset.
Describe alternatives you've considered I am not sure if there are any alternatives.
Do I understand correctly that you have a non-static data source from which you create a TFDS dataset? The data source regularly has new data appended to it. When new data is appended, you'd like TFDS to generate examples for it and append them to the existing TFDS dataset?
If so, is there a constant stream of new source data or is there new data on a regular basis, e.g., daily?
One consequence of this is that reading the same TFDS dataset on different days would yield different data, i.e., model training would not be reproducible.
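To make the reproducibility concern concrete, here is a minimal pure-Python sketch. It does not use TFDS at all: `AppendOnlyDataset`, `append`, `read`, and the `snapshot` argument are hypothetical names invented for illustration, and the assumption is that appended data arrives as whole, immutable shards.

```python
from dataclasses import dataclass, field


@dataclass
class AppendOnlyDataset:
    """Toy stand-in for a dataset that grows by whole shards.

    Hypothetical illustration only; this is not a TFDS API.
    """

    shards: list = field(default_factory=list)

    def append(self, examples):
        """Store the examples as one immutable shard and return a snapshot id.

        The snapshot id is simply the number of shards present after the append.
        """
        self.shards.append(tuple(examples))
        return len(self.shards)

    def read(self, snapshot=None):
        """Return all examples, or only those in the first `snapshot` shards."""
        selected = self.shards if snapshot is None else self.shards[:snapshot]
        return [example for shard in selected for example in shard]


ds = AppendOnlyDataset()

# Day 1: the initial data is written and read.
day1 = ds.append(["a", "b"])
first_read = ds.read()

# Day 2: new source data is appended to the same dataset.
ds.append(["c"])
second_read = ds.read()               # unpinned read sees the new examples
pinned_read = ds.read(snapshot=day1)  # pinned read matches day 1 exactly
```

In this sketch the unpinned read changes between days, which is the reproducibility problem described above, while a read pinned to a snapshot id stays stable. Whether something like this could map onto TFDS dataset versions is an open question for the feature request, not something this sketch settles.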