Add ability to remove dataset items
The DagsHub CLI tool can upload datasets:
dagshub upload <repo> data/ data/
However, there is no CLI support for removing an item from the dataset. Deleting items locally and then running the upload command again does not remove them: the deleted items persist in the remote dataset.
The workaround is to use the DVC CLI tool to update the folder. However, this requires us to set up the DVC configuration if that has not already been done.
From the user perspective, there might be two ways to handle this:
- New subcommand:
dagshub sync <repo> data/ data
- New flag for upload:
dagshub upload <repo> data/ data/ --sync
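To make the requested behavior concrete, here is a minimal sketch of what a one-way sync would need to compute. It is not the DagsHub client's implementation: `plan_sync` is a hypothetical helper, both file listings are assumed to be plain lists of repo-relative paths, and detecting modified files (for example by hash) is left out.

```python
def plan_sync(local_files, remote_files):
    """Hypothetical helper: plan a one-way local -> remote sync.

    Returns (to_upload, to_delete) as sorted lists of paths.
    Only presence is compared; changed file contents are not detected.
    """
    local = set(local_files)
    remote = set(remote_files)
    # Files that exist locally but not remotely: what upload already handles.
    to_upload = sorted(local - remote)
    # Files that exist remotely but were deleted locally: the missing piece.
    to_delete = sorted(remote - local)
    return to_upload, to_delete


to_upload, to_delete = plan_sync(
    ["data/a.png", "data/b.png"],
    ["data/b.png", "data/c.png"],
)
print(to_upload)  # ['data/a.png']
print(to_delete)  # ['data/c.png']
```

The current upload command only acts on the first list; the second list is what either a sync subcommand or a --sync flag would add.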
Additional Thoughts
If there is a datasource that already points to the dataset folder, the rows will persist even after the items are deleted. To remove these rows, we would need to recreate the datasource. Another solution would be to add a new method to datasources that removes the rows:
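from dagshub.data_engine import datasources
ds = datasources.get('<repo>', 'images')
ds.wait_until_ready()
# Proposed method; it did not exist when this issue was opened
ds.remove('/path/to/deleted')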
Hey Brian!
From the user perspective, there might be two ways to handle this:
- New subcommand: dagshub sync <repo> data/ data
- New flag for upload: dagshub upload <repo> data/ data/ --sync
Both options sound good and useful!
dagshub upload <> uses the DagsHub backend to upload files to a Git / DVC directory. That feature was originally developed to allow easy appending of files to a large DVC directory without having to download and sync it locally. Resyncing and deleting files would make the feature more complete. Thank you for the suggestion.
Additional Thoughts
If there is a datasource that already points to the dataset folder, the rows will persist even after the items are deleted. To remove these rows, we would need to recreate the data source. Another solution would be to add a new method to the datasources that would remove the rows.
Removing datapoints from a datasource is also needed. Hopefully this will be possible soon.
Added with the following PRs: https://github.com/DagsHub/client/pull/425 https://github.com/DagsHub/client/pull/424