darwin-py
[DAR-2991][External] Set CLI to pull with folders by default & display overwrite warning
Problem
In DAR-2246, we made significant breaking changes to the way darwin-py names files when downloading them. One of those changes made pull() replicate remote folder structures locally by default. However, we overlooked that CLI-initiated pulls have their own default behaviour, which was not changed in that PR.
Combined with the naming changes themselves, this means that CLI-initiated dataset pulls overwrite local files whenever the release contains identically named files in different folders.
Solution
- Change the default behaviour of pull() from the CLI to locally replicate remote folder structures
- Implement logic to detect duplicate local file paths when pulling without folders. If duplicates are detected, display a warning
The warning is in the following style:
Warning: Identical filenames detected in your export release.
You are pulling a flat release with identically named dataset items. The release will still be pulled, but to prevent overwriting your dataset files, please re-pull the release with the folder structure. This can be done as follows:
- CLI: darwin dataset pull team_slug/dataset_slug --folders
- SDK: dataset.pull(use_folders=True)
The following paths are duplicated:
- {path_1} is duplicated 2 times
- {path_2} is duplicated 2 times
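The duplicate detection described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function names find_duplicate_local_paths and format_duplicate_warning are hypothetical, and the sketch assumes each release item exposes its remote path as a POSIX-style string.

```python
from collections import Counter
from pathlib import PurePosixPath


def find_duplicate_local_paths(remote_paths, use_folders):
    """Return {local_path: count} for local paths that would be written more than once.

    Hypothetical helper: assumes each item is identified by a POSIX-style
    remote path such as "/folder_a/image.png".
    """
    local_paths = []
    for remote in remote_paths:
        path = PurePosixPath(remote.lstrip("/"))
        # With folders, the remote structure is kept; without folders only the
        # filename survives, so items from different remote folders can collide.
        local_paths.append(str(path) if use_folders else path.name)
    counts = Counter(local_paths)
    return {path: count for path, count in counts.items() if count > 1}


def format_duplicate_warning(duplicates):
    """Build the list of duplicated paths in the style of the warning above."""
    lines = [
        "Warning: Identical filenames detected in your export release.",
        "The following paths are duplicated:",
    ]
    lines += [
        f"- {path} is duplicated {count} times"
        for path, count in sorted(duplicates.items())
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    release = ["/folder_a/image.png", "/folder_b/image.png", "/folder_a/other.png"]
    duplicates = find_duplicate_local_paths(release, use_folders=False)
    if duplicates:
        print(format_duplicate_warning(duplicates))
```

Counting the resolved local paths before writing anything lets the pull proceed while still reporting every collision up front, which matches the "warn but still pull" behaviour described in the warning text.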
Changelog
- Dataset releases pulled with the CLI will now locally replicate the remote folder structure by default
- If pulling a dataset release without folders, display a warning if there are any local files that will be overwritten