pytest-regressions icon indicating copy to clipboard operation
pytest-regressions copied to clipboard

FileRegression + LazyDatadir may conflict with user generated files => confusing errors

Open sscherfke opened this issue 5 months ago • 5 comments

Issue occurs in pytest-regressions >= 2.8.0

Scenario:

You have your expected results generated to tests/test_spam/index.json as follows (e.g., from an older version):

{
    "spam": "eggs"
}

The corresponding test in tests/test_spam.py looks like this:

import json
from pathlib import Path

from pytest_regressions.file_regression import FileRegressionFixture


def request_data_from_api(outfile: Path) -> None:
    """
    Fake for actual program logic.
    """
    outfile.write_text('{"spam":"eggs"}')


def test_spam(tmp_path: Path, file_regression: FileRegressionFixture) -> None:
    output_file = tmp_path.joinpath("index.json")
    request_data_from_api(output_file)

    # Nicely format data
    raw_data = json.loads(output_file.read_text())
    result_data = json.dumps(raw_data, indent=4)

    file_regression.check(result_data, basename="index", extension=".json")

What now happens:

  • The test buffers the raw data in {tmp_path}/index.json
  • The test passes the formatted data to check()
  • check() correctly derives tests/test_spam/index.json as the source_filename containing the expected data.
  • LazyDatadir generates filename = {tmp_path}/index.json for the actual comparison.
  • It wants to copy the contents from source_filename to filename, but since this file already exists (it contains the the raw data, see 2nd line of test_spam()), it does not perform a copy.
  • check() goes on and compares the result_data (formatted json) with the contents of filename (raw json) which of course failes.
  • Bonus: If --force-regen is passed, the formatted result_data is written to source_filename, which does not change anything.

Possible solutions:

  • Use a random filename in LazyDatadir so that it becomes very improbable that filename overwrites another the file that users may have created in their tests.
  • Print a warning / raise an error if this problem is detected
  • Tell users that hey should not write to {tmp_path}/{basename}.{extension}

sscherfke avatar Nov 11 '25 11:11 sscherfke

Indeed, we can possibly detect this before writing the file, and fail the test explaining the situation (not a warning).

Would you like to contribute a PR?

nicoddemus avatar Nov 11 '25 12:11 nicoddemus

I can try to provide a fix, but it might take some time. :)

sscherfke avatar Nov 11 '25 12:11 sscherfke

The problem needs to be fixed. Until now, I came up with two possible solutions:

  1. Raise an error – easy to do but hacky and results in unnecessarry errors:
diff --git a/src/pytest_datadir/plugin.py b/src/pytest_datadir/plugin.py
index 86346cd..d3740aa 100644
--- a/src/pytest_datadir/plugin.py
+++ b/src/pytest_datadir/plugin.py
@@ -1,7 +1,7 @@
 import os
 import shutil
 import sys
-from dataclasses import dataclass
+from dataclasses import dataclass, field
 from pathlib import Path
 from typing import Union
 
@@ -71,6 +71,7 @@ class LazyDataDir:
 
     original_datadir: Path
     tmp_path: Path
+    _copied: bool = field(default=False, init=False)
 
     def joinpath(self, other: Union[Path, str]) -> Path:
         """
@@ -84,6 +85,12 @@ class LazyDataDir:
         """
         original = self.original_datadir / other
         target = self.tmp_path / other
+        if not self._copied and target.exists():
+            rel_target = target.relative_to(self.tmp_path)
+            raise RuntimeError(
+                f"{{tmp_path}}/{rel_target} was already created outside of a datadir "
+                "fixture!"
+            )
         if original.exists() and not target.exists():
             if original.is_file():
                 target.parent.mkdir(parents=True, exist_ok=True)
@@ -94,6 +101,7 @@ class LazyDataDir:
                 shutil.copytree(
                     _win32_longpath(str(original)), _win32_longpath(str(target))
                 )
+            object.__setattr__(self, "_copied", True)
         return target
 
     def __truediv__(self, other: Union[Path, str]) -> Path:
  1. Copy files into a unique sub-directory within {tmp_path}. Cleaner solution but may be a breaking change?
diff --git a/src/pytest_datadir/plugin.py b/src/pytest_datadir/plugin.py
index 86346cd..a2eaef0 100644
--- a/src/pytest_datadir/plugin.py
+++ b/src/pytest_datadir/plugin.py
@@ -1,4 +1,5 @@
 import os
+import uuid
 import shutil
 import sys
 from dataclasses import dataclass
@@ -114,7 +115,9 @@ def lazy_datadir(original_datadir: Path, tmp_path: Path) -> LazyDataDir:
         original_datadir: The original data directory.
         tmp_path: Pytest's built-in fixture providing a temporary directory path.
     """
-    return LazyDataDir(original_datadir, tmp_path)
+    temp_path = tmp_path / str(uuid.uuid4())
+    temp_path.mkdir(parents=True, exist_ok=True)
+    return LazyDataDir(original_datadir, temp_path)
 
 
 @pytest.fixture
@@ -132,7 +135,7 @@ def lazy_shared_datadir(request: pytest.FixtureRequest, tmp_path: Path) -> LazyD
         tmp_path: Pytest's built-in fixture providing a temporary directory path.
     """
     original_shared_path = Path(os.path.join(request.fspath.dirname, "data"))
-    temp_path = tmp_path / "data"
+    temp_path = tmp_path / f"data-{uuid.uuid4()}"
     temp_path.mkdir(parents=True, exist_ok=False)
 
     return LazyDataDir(original_shared_path, temp_path)

What do you preffer, @nicoddemus ?

sscherfke avatar Nov 22 '25 20:11 sscherfke

I think the first one would be best, thanks!

nicoddemus avatar Nov 24 '25 11:11 nicoddemus

Oh sorry @sscherfke, my bad, I didn't realize the code of the first solution was for pytest-datadir.

I don't think pytest-datadir has anything to do with this issue. The fix should be within pytest-regressions... not sure exactly what it should be yet though.

nicoddemus avatar Dec 08 '25 13:12 nicoddemus