
[WIP] Remote repositories for versioning

Open zhiltsov-max opened this issue 4 years ago • 10 comments

Summary

Related to #130, #131

Key changes:

  • Added integration with Git and DVC for versioning.
  • Added support for remote repository for dataset configuration.
  • Added support for remote data sources for project datasets via DVC. Supported source types: HTTP, S3, Git, DVC.
  • A project now consists of a number of data sources - local or remote ones.
  • Removed support for a project's own datasets.

CLI changes:

  • Modifying operations on project data (transform, filter, export) are now recorded. They can be reproduced later with the build command.
  • Updated config file structure. Old projects can be read, but they will be saved with a new version.
  • Updated installation: to install Datumaro with Git and DVC support, add the [VCS] suffix: pip install <url>[VCS]
  • Added a number of versioning commands to the CLI: tag, pull, push, checkout, commit
  • Added the remote CLI context to interact with the data remotes of a project
  • Added the repo CLI context to interact with the bound Git repositories of a project
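
To illustrate the "record, then reproduce with build" idea from the list above, here is a minimal Python sketch. It is not the Datumaro implementation: BuildLog, OPERATIONS, and the item layout are hypothetical names chosen only for this example, and real operations act on datasets rather than lists of dicts.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Items = List[dict]

# Hypothetical operation registry (illustration only, not Datumaro's API).
OPERATIONS: Dict[str, Callable[[Items, dict], Items]] = {
    # Keep only the items that carry the requested label.
    "filter": lambda items, p: [i for i in items if i["label"] == p["label"]],
    # Rename labels according to a mapping; unmapped labels are kept as-is.
    "transform": lambda items, p: [
        {**i, "label": p["mapping"].get(i["label"], i["label"])} for i in items
    ],
}


@dataclass
class BuildLog:
    """Records modifying operations so they can be replayed later."""

    steps: List[dict] = field(default_factory=list)

    def record(self, name: str, **params) -> None:
        # Store only the operation name and its parameters, not the data.
        self.steps.append({"op": name, "params": params})

    def build(self, items: Items) -> Items:
        # Replay every recorded step, in order, on the original data.
        for step in self.steps:
            items = OPERATIONS[step["op"]](items, step["params"])
        return items


log = BuildLog()
log.record("filter", label="cat")
log.record("transform", mapping={"cat": "animal"})

source = [{"id": 1, "label": "cat"}, {"id": 2, "label": "dog"}]
result = log.build(source)
```

Because only the operation names and parameters are stored, the same log can be replayed on a fresh copy of the source data, and the source itself is left unmodified.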

Library changes:

  • The Project class has been significantly changed; however, most existing code should work with minimal or no changes.
  • Projects that are not bound to a local disk are considered detached. In this mode, a Project can only interact with locally available data (no remotes), which is mostly how it worked prior to these changes. No versioning capabilities are available in this mode.
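
The detached mode described above can be pictured with a small Python sketch. This is an assumption-laden illustration, not the actual Project class: ProjectSketch, DetachedProjectError, and the commit method body are hypothetical and only model the rule that versioning requires a project bound to a local directory.

```python
from pathlib import Path
from typing import Optional


class DetachedProjectError(RuntimeError):
    """Raised when a versioning operation is attempted in detached mode."""


class ProjectSketch:
    # Hypothetical stand-in for a project; not Datumaro's Project class.
    def __init__(self, root: Optional[Path] = None):
        # A project without a root directory is treated as detached.
        self.root = root

    @property
    def detached(self) -> bool:
        return self.root is None

    def commit(self, message: str) -> str:
        # Versioning needs an on-disk location, so refuse when detached.
        if self.detached:
            raise DetachedProjectError("versioning requires a project on disk")
        return f"committed in {self.root}: {message}"


in_memory = ProjectSketch()
on_disk = ProjectSketch(Path("my_project"))

try:
    in_memory.commit("initial")
    commit_failed = False
except DetachedProjectError:
    commit_failed = True
```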

How to test

Checklist

  • [x] I submit my changes into the develop branch
  • [ ] I have added description of my changes into CHANGELOG
  • [ ] I have updated the documentation accordingly
  • [x] I have added tests to cover my changes
  • [ ] I have linked related issues

License

  • [x] I submit my code changes under the same MIT License that covers the project. Feel free to contact the maintainers if that's a concern.
  • [x] I have updated the license header for each file (see an example below)
# Copyright (C) 2020 Intel Corporation
#
# SPDX-License-Identifier: MIT

zhiltsov-max avatar Sep 25 '20 19:09 zhiltsov-max

@nmanovic, implemented:

datum create

# addition (url format here: https://dvc.org/doc/command-reference/import-url)
# with auto remotes:
datum add path/ -f image_dir
datum add path/to.json -f coco_instances

# with manual remotes:
datum remote add s3://net.loc -n r1
datum source add remote://r1/path/to.xml -f cvat

datum filter (not checked)/transform # copying variant
datum export
datum build

datum commit

datum source *
datum remote *
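
As a rough sketch of how a remote://r1/path/to.xml address from the commands above could be resolved against a named remote: the function name resolve_remote_url and the remotes dictionary are hypothetical and only illustrate the idea, they are not the project's or DVC's code.

```python
from urllib.parse import urlsplit


def resolve_remote_url(url: str, remotes: dict) -> str:
    """Expand remote://<name>/<path> using a name -> base URL table."""
    parts = urlsplit(url)
    if parts.scheme != "remote":
        # Plain paths and other schemes are returned unchanged.
        return url
    name = parts.netloc
    if name not in remotes:
        raise KeyError(f"unknown remote: {name!r}")
    # Join the remote's base URL with the path part of the address.
    return remotes[name].rstrip("/") + parts.path


# Mirrors "datum remote add s3://net.loc -n r1" from the commands above.
remotes = {"r1": "s3://net.loc"}
resolved = resolve_remote_url("remote://r1/path/to.xml", remotes)
local = resolve_remote_url("path/to.json", remotes)
```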

zhiltsov-max avatar Oct 23 '20 15:10 zhiltsov-max

@zhiltsov-max, could you please resolve the conflicts? Are you going to move to GitHub Actions (ask Anastasia to help)?

nmanovic avatar Jan 22 '21 12:01 nmanovic

Where can I find the installation variants in the documentation? For example, pip install -e .[vcs]?

nmanovic avatar Jan 22 '21 12:01 nmanovic

TODOs:

  • [ ] datum convert with a project source (another PR)
  • [ ] Pretty, useful and reliable output of datum status
  • [ ] Documentation, installation info
    • [ ] A section about using DVC and Git directly
    • [ ] A section about internal implementation and project structure
  • [x] datum model update
  • [ ] datum (e)diff with 2 revisions (another PR)
  • [ ] datum merge with 2 revisions (another PR)
  • [x] cli tests

Pushing to sources is not in scope of this patch.

zhiltsov-max avatar Apr 01 '21 14:04 zhiltsov-max

@zhiltsov-max, should we close the PR?

nmanovic avatar Jun 25 '21 09:06 nmanovic

It will be continued after the first one as "remote sources support".

zhiltsov-max avatar Jun 25 '21 09:06 zhiltsov-max

What happened to this PR? Versioning sounds like a very good idea for dataset management.

leeyh20 avatar Sep 17 '21 02:09 leeyh20

@leeyh20, it is split into 2 parts - this one with remotes and #238 with local commands.

zhiltsov-max avatar Sep 17 '21 04:09 zhiltsov-max

Any update on this?

JaviFuentes94 avatar Nov 29 '21 14:11 JaviFuentes94

@JaviFuentes94, not yet - currently, we have no resources for this task. We welcome ideas and suggestions on this functionality, though. Could you describe your use cases?

zhiltsov-max avatar Nov 30 '21 10:11 zhiltsov-max