Add ability to remove dataset items

Open BSharp94 opened this issue 6 months ago • 2 comments

The DagsHub CLI tool can upload datasets.

dagshub upload <repo> data/ data/

However, when we want to remove an item from the dataset, there is no CLI support. Removing the items locally and then running the upload CLI tool does not remove them: the items persist in the dataset.

The workaround is to use the DVC CLI tool to update the folder. However, this requires us to set up the DVC configuration if that has not already been done.

From the user perspective, there might be two ways to handle this:

  1. New subcommand: dagshub sync <repo> data/ data
  2. New flag for upload: dagshub upload <repo> data/ data/ --sync
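
To make the intended behaviour of either option concrete, here is a minimal sketch of the comparison a sync would need to perform. It is not taken from the DagsHub client: `plan_sync`, `SyncPlan`, and the `remote_paths` listing are hypothetical names, and the sketch assumes the remote file listing is already available as relative paths.

```python
from pathlib import Path
from typing import Iterable, NamedTuple, Set


class SyncPlan(NamedTuple):
    to_upload: Set[str]  # on disk, but missing from the remote listing
    to_delete: Set[str]  # in the remote listing, but no longer on disk


def plan_sync(local_dir: str, remote_paths: Iterable[str]) -> SyncPlan:
    """Compare a local folder with a listing of remote file paths.

    Hypothetical helper: `remote_paths` stands in for whatever listing the
    backend would provide, relative to the dataset folder.
    """
    root = Path(local_dir)
    # Collect every local file as a POSIX-style path relative to the folder.
    local = {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}
    remote = set(remote_paths)
    return SyncPlan(to_upload=local - remote, to_delete=remote - local)
```

A plain upload only acts on files that exist locally, so the `to_delete` set is the part that is currently never handled. Comparing by name alone does not detect files whose contents changed; a real implementation would likely compare hashes as well.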

Additional Thoughts

If there is a datasource that already points to the dataset folder, the rows will persist even after the items are deleted. To remove these rows, we would need to recreate the data source. Another solution would be to add a new method to the datasources that would remove the rows.

ds = datasources.get('<repo>', 'images')
ds.wait_until_ready()
ds.remove('/path/to/deleted')
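
One open question for such a method is which rows a given path should match. The sketch below assumes, purely for illustration, that passing a folder path would remove every row at or below it. `rows_under` is a hypothetical helper operating on plain strings, not part of the datasource API.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def rows_under(row_paths: Iterable[str], deleted_path: str) -> List[str]:
    """Return the row paths equal to, or nested under, `deleted_path`.

    Assumption: leading and trailing slashes are ignored, so
    '/images/cats' and 'images/cats/' refer to the same location.
    """
    target = PurePosixPath(deleted_path.strip("/"))
    matched = []
    for row in row_paths:
        candidate = PurePosixPath(row.strip("/"))
        # Compare whole path components so 'images/cats' does not
        # accidentally match 'images/cats2/...'.
        if candidate == target or target in candidate.parents:
            matched.append(row)
    return matched
```

Matching on path components rather than on a raw string prefix avoids removing rows from sibling folders that merely share a name prefix.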

BSharp94, Dec 19 '23 17:12

Hey Brian!

From the user perspective, there might be two ways to handle this:

  1. New subcommand: dagshub sync <repo> data/ data
  2. New flag for upload: dagshub upload <repo> data/ data/ --sync

Both options sound good and useful!

dagshub upload <> uses the DagsHub backend to upload files to a Git / DVC directory. The feature was originally developed to make it easy to append files to a large DVC directory without having to download and sync it locally. Resyncing and deleting files would make the feature more complete. Thank you for the suggestion.

Additional Thoughts

If there is a datasource that already points to the dataset folder, the rows will persist even after the items are deleted. To remove these rows, we would need to recreate the data source. Another solution would be to add a new method to the datasources that would remove the rows.

Removing datapoints from a Datasource is also needed. Hopefully this will be possible soon.

simonlsk, Dec 20 '23 09:12

Added with the following PRs:
https://github.com/DagsHub/client/pull/425
https://github.com/DagsHub/client/pull/424

kbolashev, Apr 16 '24 14:04