deeplake
[FEATURE] Option to disable auto commit after data ingestion
Description
Currently, a new version is committed on every data ingestion by the following code: https://github.com/activeloopai/deeplake/blob/2ad84c139e91fe6f81da9c8b6d4f48c9d3ee8e73/deeplake/core/vectorstore/deeplake_vectorstore.py#L294
It seems that every commit captures the full current state of the dataset as a new version. So, if the user commits often, the storage consumed by versions grows rapidly.
This becomes a problem when the user ingests small amounts of data incrementally: consecutive versions are almost identical, so space is used inefficiently.
The canonical solution would be to store only the diff for each version, but since I'm not familiar with the codebase, I don't know whether that is feasible.
So instead, I suggest offering an option that lets users choose whether to auto-commit when ingesting data.
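To make the request concrete, here is a minimal sketch of the behavior I have in mind. It uses a toy in-memory class, not the deeplake API: `ToyVectorStore`, its `add` and `commit` methods, and the `auto_commit` parameter are all hypothetical names chosen for illustration.

```python
from typing import List


class ToyVectorStore:
    """Hypothetical in-memory stand-in; NOT the deeplake API."""

    def __init__(self) -> None:
        self.rows: List[str] = []
        # Row count recorded at each commit, standing in for a stored version.
        self.versions: List[int] = []

    def commit(self) -> None:
        # Record a version marker for the current state of the store.
        self.versions.append(len(self.rows))

    def add(self, texts: List[str], auto_commit: bool = True) -> None:
        # Hypothetical flag: the default keeps today's commit-per-ingestion behavior.
        self.rows.extend(texts)
        if auto_commit:
            self.commit()


# Many small incremental ingestions, followed by one explicit commit.
store = ToyVectorStore()
for i in range(100):
    store.add([f"doc-{i}"], auto_commit=False)
store.commit()
print(len(store.versions))
```

With the flag off, the 100 small ingestions above end in a single explicit commit instead of 100 automatic ones, while the default keeps the existing behavior for users who rely on it.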
Use Cases
No response
Hey @HyunggyuJang, thanks a lot for raising the issue. We're already working on this, and I'll be sure to let you know when the updates are released.