data status returns files as "Not in remote" even though they are marked as push: false in pipeline

Open mermerico opened this issue 1 year ago • 2 comments

Bug Report

Description

dvc data status --not-in-remote is used to check whether any files have not been pushed to remote storage. In a CI/CD context, it is a useful check before accepting a PR. However, if a DVC pipeline has a stage with outputs marked push: false, those files are reported as "Not in remote" even after dvc push. This makes it harder to detect, especially in an automated manner, files that should have been pushed to remote but were not.

Reproduce

  1. Create a dvc pipeline with a stage output marked push: false
  2. dvc repro
  3. dvc push
  4. dvc data status --not-in-remote

Expected

Outputs marked push: false should not be reported as "Not in remote", OR an option should be provided to suppress those files.
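Until something like this exists in DVC itself, a CI job could filter the report on its own. The following is only a rough sketch of that workaround, not DVC's behavior or API: the helper names no_push_outputs and missing_from_remote are hypothetical, the pipeline is assumed to be an already-parsed dvc.yaml (for example loaded with a YAML library), and paths are assumed to be relative to the repository root with no wdir, foreach, or matrix stages involved.

```python
def no_push_outputs(pipeline: dict) -> set:
    """Collect output paths declared with push: false in a parsed dvc.yaml.

    Hypothetical helper: `pipeline` is the dict obtained by parsing dvc.yaml.
    """
    paths = set()
    for stage in (pipeline.get("stages") or {}).values():
        for out in stage.get("outs") or []:
            # An output is either a plain path string or a {path: {flags}} mapping.
            if isinstance(out, dict):
                for path, flags in out.items():
                    if (flags or {}).get("push") is False:
                        paths.add(path.rstrip("/"))
    return paths


def missing_from_remote(not_in_remote: list, skipped: set) -> list:
    """Drop entries that are, or live under, an output marked push: false."""

    def is_skipped(path: str) -> bool:
        p = path.rstrip("/")
        return any(p == s or p.startswith(s + "/") for s in skipped)

    return [p for p in not_in_remote if not is_skipped(p)]


# Example with made-up paths: only data/a.csv should block the PR.
pipeline = {
    "stages": {
        "train": {
            "cmd": "python train.py",
            "outs": ["data/a.csv", {"model.pkl": {"push": False}}],
        }
    }
}
reported = ["data/a.csv", "model.pkl"]
blocking = missing_from_remote(reported, no_push_outputs(pipeline))
```

In a real job, the reported list would come from the output of dvc data status --not-in-remote (the command also accepts --json; I am assuming the relevant list sits under a not_in_remote key, which should be checked against the installed version), and the job would fail only if blocking is non-empty.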

Environment information

N/A

Output of dvc doctor:

DVC version: 3.43.1 (pip)
-------------------------
Platform: Python 3.11.6 on Linux-5.15.0-92-generic-x86_64-with-glibc2.35
Subprojects:
        dvc_data = 3.9.0
        dvc_objects = 3.0.6
        dvc_render = 1.0.1
        dvc_task = 0.3.0
        scmrepo = 2.1.1
Supports:
        http (aiohttp = 3.8.5, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.8.5, aiohttp-retry = 2.8.3),
        s3 (s3fs = 2023.9.0, boto3 = 1.28.17)
Config:
        Global: /home/mermerico/.config/dvc
        System: /etc/xdg/dvc
Cache types: symlink
Cache directory: ext4 on /dev/md0p1
Caches: local
Remotes: s3
Workspace directory: ext4 on /dev/md0p1
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/97a24c320f9c7207aea41ca9a4dc4061

mermerico, Feb 23 '24 20:02

This affects us as well. Has there been any progress on this?

Northo, May 21 '25 11:05

I've started drafting a PR for this. I will post an update here when/if I have the time.

Update: #10749 is now ready for review.

Northo, May 22 '25 11:05