Enable pytest snapshots with python
It's hard to use pytest snapshots with pants because tests are executed in a temporary sandbox. We could use a "file" target alongside the "python_tests" target for existing snapshots, but there is currently no way to write new snapshot files back that way.
My repo tree looks like this:

```text
|- pants
|- packages-python-pants
|  |- package
|  |- tests
|  |  |- test0.py
|  |  |- snapshots
|  |  |  |- test0
|  |  |  |  |- test0.0.json
|  |  |  |  |- test0.1.json
```
One workaround I found (I'm using pytest-snapshot) is to add PWD to the test context:
```toml
[test]
extra_env_vars = ["PWD"]
```
I also created a small utility function, called from each test, that builds the absolute path to the snapshot directory so it can be set explicitly:
```python
import json
import os
import re


def get_snapshot_directory(file, name):
    # Rebuild <original repo root>/packages-python-pants/.../snapshots/<name>
    # from the sandboxed path of the test file.
    return os.path.join(
        os.environ["PWD"],
        re.search("packages-python-pants(.*)", os.path.dirname(file)).group(),
        "snapshots",
        name,
    )


def test_dummy(snapshot):
    dummy_object = {"a": 1, "b": "c"}
    snapshot.snapshot_dir = get_snapshot_directory(__file__, __name__)
    snapshot.assert_match(
        json.dumps(dummy_object, indent=3), "test_dummy.json",
    )
```
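For comparison, here is a minimal sketch of the same helper using `pathlib` instead of a regex. `snapshot_dir_for` is a hypothetical name; it assumes, as above, that `PWD` holds the original repo root and that `packages-python-pants` appears as a directory in the sandboxed path. It raises a clear `ValueError` instead of an `AttributeError` on `None` when that directory is missing:

```python
import os
from pathlib import Path
from typing import Optional


def snapshot_dir_for(
    test_file: str,
    name: str,
    anchor: str = "packages-python-pants",
    repo_root: Optional[str] = None,
) -> Path:
    """Map a test file inside the pants sandbox to its snapshot directory in the repo.

    Hypothetical helper: repo_root defaults to $PWD (passed through via extra_env_vars).
    """
    root = Path(repo_root if repo_root is not None else os.environ["PWD"])
    parts = Path(test_file).parent.parts
    if anchor not in parts:
        raise ValueError(f"{anchor!r} is not a directory in {test_file!r}")
    # Keep everything from the anchor directory onwards, e.g. packages-python-pants/tests
    relative = Path(*parts[parts.index(anchor):])
    return root / relative / "snapshots" / name
```

In a test this would be used as `snapshot.snapshot_dir = str(snapshot_dir_for(__file__, __name__))`.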
This makes it possible to write, update and read snapshots in the repo at the locations shown above.
While this hack works, the ideal solution would be to not specify any snapshot directory in tests and have snapshots stored by default at the locations I had to write explicitly, which is what happens without pants.
It's likely that the test goal could explicitly build in support for this by capturing those inputs, and adding an optional Digest to the TestResult type that would be committed after a successful test. Relates to #12014, but probably still worth implementing natively for test, since it almost ceases to be a "secondary" effect and starts to be one of the primary reasons you've run the test.
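A toy model of that proposal in plain Python (not Pants' actual engine types; `ToyTestResult` and `commit_snapshots` are hypothetical names): the test result optionally carries the snapshot files the test wrote, and they are only written back to the workspace if the test passed.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class ToyTestResult:
    """Stand-in for a test result carrying optional captured output files."""

    exit_code: int
    snapshot_outputs: Optional[Mapping[str, bytes]] = None  # relative path -> contents


def commit_snapshots(result: ToyTestResult, workspace: Path) -> bool:
    """Write captured snapshots into the workspace, but only after a successful run."""
    if result.exit_code != 0 or not result.snapshot_outputs:
        return False
    for rel_path, contents in result.snapshot_outputs.items():
        target = workspace / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(contents)
    return True
```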
I've attempted this and found that either newer pants (2.12), macOS, or something else has stopped the proposed workaround from working directly: when setting extra_env_vars = ["PWD"] by itself, PWD comes through as the temporary directory pants is using (e.g. os.environ['PWD'] => '/private/tmp/.tmpfPyxTZ').
I could get it to work if I also set the following:
```toml
[subprocess-environment]
env_vars.add = ["PWD"]
```
But affecting all subprocess invocations, rather than just the test ones, seems even worse.
(I also tried extra_env_vars = ["ORIGINAL_PWD=$PWD"] but that didn't work as intended.)
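Since PWD can silently arrive as the sandbox directory, a guard like the following fails loudly instead of writing snapshots into a directory that gets thrown away. This is a sketch: `original_repo_root` is a hypothetical name, and it assumes the test process's working directory is the sandbox root.

```python
import os
from typing import Mapping, Optional


def original_repo_root(
    env: Optional[Mapping[str, str]] = None, cwd: Optional[str] = None
) -> str:
    """Return $PWD, but only if it points somewhere other than the sandbox.

    Assumes the test runs with its working directory set to the pants sandbox,
    so a PWD equal to the cwd means PWD was not really passed through.
    """
    env = os.environ if env is None else env
    cwd = os.getcwd() if cwd is None else cwd
    pwd = env.get("PWD")
    if not pwd:
        raise RuntimeError("PWD is not set; add it to [test] extra_env_vars")
    # realpath so that e.g. macOS's /tmp -> /private/tmp symlink compares equal
    if os.path.realpath(pwd) == os.path.realpath(cwd):
        raise RuntimeError(f"PWD ({pwd}) is the sandbox, not the repo root")
    return pwd
```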
This comment has two sections (thanks for bearing with my verbosity):
- a workaround that is working for updating snapshots for us
- a discussion of some experience I have with snapshot testing that may apply to this issue/snapshot testing in general (maybe this should become a proper GitHub discussion)
Workaround
We've now got this working well enough to unblock us, with pants 2.14 and https://github.com/tophat/syrupy (requires version >= 3.0.1).
```bash
# validate: (weird spacing to make commonalities clear)
./pants                               test some/target.py another/target::
# update: use the new script and pass --snapshot-update to syrupy
./scripts/pants-with-snapshot-hack.sh test some/target.py another/target:: -- --snapshot-update
```
We have the following wrapper script for ./pants, which works for our snapshots nested within the backend/ directory:
- copies any snapshots (found by looking for `__snapshots__` directories) into a fixed temporary directory
- runs pants, setting a special environment variable to point to that directory
- copies the snapshots back into the source repo
`scripts/pants-with-snapshot-hack.sh`:

```bash
#!/usr/bin/env bash
# FIXME: https://github.com/pantsbuild/pants/issues/11622 see backend/conftest.py
set -euo pipefail

# ensure we operate from the root directory
repo_root=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )/.." &> /dev/null && pwd )
cd "${repo_root}"

# use a 'fixed' path based on the current repo: there's generally only one pants invocation at a
# time in a given directory, and a fixed path gives more chance for cache hits, to avoid rerunning
# all tests every time.
tmpdir="/tmp/pants-snapshot-hack${repo_root}"

# ensure we're starting fresh:
rm -rf "$tmpdir"
mkdir -p "$tmpdir"

copy_snapshots() {
    # `copy_snapshots src dst` copies the __snapshots__ subdirectories in src into dst, preserving
    # the directory structure. dst's __snapshots__ directories become an exact copy of src: files
    # within existing __snapshots__ directories in dst are deleted if they're not in the
    # corresponding spot in src.
    local src="$1"
    local dst="$2"
    (
        cd "$src"
        find . -name '__snapshots__' -type d
    ) | rsync --verbose --recursive --archive --delete --files-from=- "$src" "$dst"
}

copy_snapshots backend "$tmpdir/backend"

exit_code=0
PANTS_WITH_SNAPSHOTS_HACK_DIR="$tmpdir" ./pants "$@" || exit_code="$?"

copy_snapshots "$tmpdir/backend" backend

exit "$exit_code"
```
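If rsync isn't available, the same `copy_snapshots` semantics can be sketched in Python (a hypothetical helper, not part of the script above): each `__snapshots__` directory under `src` replaces its counterpart under `dst`, so files only present in `dst` are deleted, like `rsync --delete`. Unlike rsync, it copies whole directories rather than only changed files.

```python
import os
import shutil
from pathlib import Path


def copy_snapshots(src: Path, dst: Path) -> None:
    """Mirror every __snapshots__ directory under src into the same place under dst.

    Each destination __snapshots__ directory becomes an exact copy of its source;
    __snapshots__ directories that exist only in dst are left alone.
    """
    src, dst = Path(src), Path(dst)
    for dirpath, dirnames, _ in os.walk(src):
        if "__snapshots__" in dirnames:
            dirnames.remove("__snapshots__")  # don't descend; it's copied wholesale below
            rel = Path(dirpath).relative_to(src) / "__snapshots__"
            target = dst / rel
            if target.exists():
                shutil.rmtree(target)  # drop stale files, like rsync --delete
            shutil.copytree(src / rel, target)  # creates parents; copy2 keeps mtimes
```

Usage would mirror the script: `copy_snapshots(Path("backend"), Path(tmpdir) / "backend")` before running pants, and the reverse afterwards.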
This works together with:
- `pants.toml`
- a top-level `backend/conftest.py` that overrides syrupy's `snapshot` fixture with a custom extension that uses the directory from the environment variable (this needs to be in a parent directory of any tests that use snapshots)
`pants.toml`:

```toml
...
[test]
extra_env_vars = [
    ...
    # FIXME https://github.com/pantsbuild/pants/issues/11622, see backend/conftest.py and scripts/pants-with-snapshot-hack.sh
    "PANTS_WITH_SNAPSHOTS_HACK_DIR",
]
```
`backend/conftest.py`:

```python
import os

import pytest
from syrupy.assertion import SnapshotAssertion
from syrupy.extensions.amber import AmberSnapshotExtension

ENV_VAR = "PANTS_WITH_SNAPSHOTS_HACK_DIR"


class EscapePantsSandboxExtension(AmberSnapshotExtension):
    """Point syrupy to the original files outside pants' sandbox.

    FIXME https://github.com/pantsbuild/pants/issues/11622:
    Pants runs tests in a 'sandboxed' temporary directory, so edits to the snapshot file aren't
    persistent. This works around that with an orchestration script that copies the snapshots to
    a temporary directory, edits paths so that syrupy reads/writes those files, and then copies
    them back (see scripts/pants-with-snapshot-hack.sh).
    """

    @property
    def _dirname(self) -> str:
        # Changes here should also be applied to any other extensions
        # (search for imports/uses of 'syrupy.extensions')

        # /tmp/whatever/backend/something/__snapshots__
        original = super()._dirname
        # duplicated root directory, set from original invocation via pants.toml and scripts/pants-with-snapshot-hack.sh
        snapshot_hack_dir = os.environ[ENV_VAR]
        # find just the /backend/... part
        backend_index = original.index("/backend/")
        # mash 'em together to get the path that'll be copied back to the repo after the test run
        return f"{snapshot_hack_dir}{original[backend_index:]}"


@pytest.fixture
def snapshot(snapshot: SnapshotAssertion) -> SnapshotAssertion:
    if os.environ.get(ENV_VAR) is None:
        # pytest without the snapshot hack
        return snapshot

    # updating snapshots requires looking outside the sandbox
    return snapshot.use_extension(EscapePantsSandboxExtension)
```
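The path rewrite in `_dirname` is easy to get subtly wrong, so it may be worth pulling it into a pure function that can be unit-tested without syrupy. A sketch (hypothetical `escape_sandbox`), using the same `/backend/` marker and so also assuming that marker doesn't appear earlier in the sandbox path:

```python
def escape_sandbox(original: str, hack_dir: str, marker: str = "/backend/") -> str:
    """Rewrite a sandbox snapshot path to the equivalent path under hack_dir.

    e.g. /tmp/sandbox/backend/app/__snapshots__ -> <hack_dir>/backend/app/__snapshots__
    """
    index = original.find(marker)  # first occurrence, like str.index in the original
    if index == -1:
        raise ValueError(f"{marker!r} not found in {original!r}")
    # Strip a trailing slash so the result never contains '//'
    return f"{hack_dir.rstrip('/')}{original[index:]}"
```

The property body would then reduce to `return escape_sandbox(super()._dirname, os.environ[ENV_VAR])`.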
There are some details here:
- The temporary directory is fixed (computed from the source directory of the `pants` script) to be able to benefit from pants' cache: see `tmpdir="..."`. I think this is okay, as long as pants is not run concurrently within a single check-out.
- For syrupy, any other extensions also need to make sure they override `_dirname` appropriately.
- We haven't tried using it with any other snapshot library (pytest-snapshot or otherwise).
Discussion
Testing 'modes' and caching
In my experience, there are typically two modes for running snapshot tests:
- validating: running the tests and just comparing the computed values against the saved snapshots, failing any test where they don't match. This can include doing a file/test-specific diff, e.g. image snapshots like https://github.com/americanexpress/jest-image-snapshot#see-it-in-action (that specific library is JS, but something similar could potentially exist for Python; I just don't know of one off the top of my head, and image snapshots seem to be particularly popular for frontend work, i.e. JS/TS, #14190)
- updating: running the tests and creating/updating/deleting snapshots as appropriate, based on the tests
Validating is what needs to run on CI (and, I think, is what ./pants test :: should do by default). For updating, for sake of discussion, let's say ./pants test --abcdef :: enables that write-back mode (placeholder name to not bias the discussion or create a bike shed about what to call it).
Unfortunately, the updating mode means running tests in a different configuration, e.g. for syrupy, passing the additional --snapshot-update argument. This seems to imply that test runs in the two modes cannot share a cache, which means running ./pants test --abcdef :: (successfully) followed by ./pants test :: may do two full entire-repo test runs, rather than the second one being fully cached. The only way around this would be for pants to deeply understand the possible implications of --snapshot-update, which doesn't seem like a good idea.
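A toy illustration of that caching problem (not how Pants actually computes cache keys; `cache_key` and the digest strings are made up): if a cached result is keyed on the test's inputs and its command line, adding `--snapshot-update` produces a different key even when every input file is identical.

```python
import hashlib
import json
from typing import Mapping, Sequence


def cache_key(input_digests: Mapping[str, str], argv: Sequence[str]) -> str:
    """Toy cache key: a hash over the input file digests and the command line."""
    payload = json.dumps({"inputs": dict(input_digests), "argv": list(argv)}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Made-up digests for one test file and its snapshot file.
inputs = {
    "backend/app/test_abc.py": "sha256:1111",
    "backend/app/__snapshots__/test_abc.ambr": "sha256:2222",
}
validate_key = cache_key(inputs, ["pytest", "backend/app/test_abc.py"])
update_key = cache_key(inputs, ["pytest", "backend/app/test_abc.py", "--snapshot-update"])
# Same inputs, different argv: a cached update run can't satisfy a later validate run.
```

And if the update run did rewrite a snapshot, the validate run's inputs change too, so it would miss the cache regardless.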
File dependencies
One annoyance we've had with managing snapshots is having to specify the snapshots files on disk as dependencies to the tests that use them.
For example, for syrupy, they're in a __snapshots__ directory adjacent to the test_....py file, named after it, e.g., for /path/to/test_abc.py, by default, the snapshots will be in /path/to/__snapshots__/test_abc.ambr or /path/to/__snapshots__/test_abc/some_other_file.ext, depending on the snapshot style.
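That convention is simple enough to express as a function, which is roughly what any dependency inference or `tailor` support would need to encode. A sketch assuming syrupy's default layout exactly as described above (`syrupy_snapshot_paths` is a hypothetical name):

```python
from pathlib import PurePosixPath
from typing import Tuple


def syrupy_snapshot_paths(test_file: str) -> Tuple[PurePosixPath, PurePosixPath]:
    """Return (amber_file, single_file_dir) for a test module, per syrupy's default layout.

    /path/to/test_abc.py -> (/path/to/__snapshots__/test_abc.ambr,
                             /path/to/__snapshots__/test_abc)
    """
    path = PurePosixPath(test_file)
    snapshots = path.parent / "__snapshots__"
    return snapshots / f"{path.stem}.ambr", snapshots / path.stem
```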
A coarse version of this is easy enough:
```python
files(name="snapshots", sources=["__snapshots__/**/*"])
python_tests(name="tests", dependencies=[":snapshots"])
```
This works fine for us, but isn't 'perfect':
1. it's easy to forget to add the `files` when adding new tests, and get confusing failures until I remember I need to add that dependency
2. it's coarse: all test files depend on all snapshots, rather than `test_abc.py` depending only on `__snapshots__/test_abc.ambr`, and `test_def.py` only on `__snapshots__/test_def.ambr` (etc.). This seems to result in spurious rebuilds/poorer cache utilisation.
Addressing 2 is... a lot of effort to do manually, and doing so makes 1 worse. It'd be nifty if pants understood the conventions here (best case dynamically via some sort of dependency inference, or at least statically via ./pants tailor).
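As a rough idea of what the static (`tailor`-like) version could generate, here's a sketch of a hypothetical script that emits one `files` target per test module plus per-file `overrides` on `python_tests`. It assumes syrupy's default layout and a Pants version whose `python_tests` target supports `overrides`; target names like `snapshots_test_abc` are made up.

```python
import json
from pathlib import Path
from typing import List


def _matches(directory: Path, pattern: str) -> bool:
    return next(iter(directory.glob(pattern)), None) is not None


def build_file_for(directory: Path) -> str:
    """Emit BUILD content giving each test_*.py a dependency on only its own snapshots."""
    targets: List[str] = []
    overrides: List[str] = []
    for test in sorted(directory.glob("test_*.py")):
        patterns = [f"__snapshots__/{test.stem}.ambr", f"__snapshots__/{test.stem}/**/*"]
        # Only keep globs that match something, to avoid unmatched-glob warnings
        sources = [p for p in patterns if _matches(directory, p)]
        if not sources:
            continue  # this test has no snapshots yet
        name = f"snapshots_{test.stem}"
        targets.append(f"files(name={json.dumps(name)}, sources={json.dumps(sources)})")
        overrides.append(f'        "{test.name}": {{"dependencies": [":{name}"]}},')
    if overrides:
        body = "\n".join(overrides)
        targets.append(f'python_tests(\n    name="tests",\n    overrides={{\n{body}\n    }},\n)')
    else:
        targets.append('python_tests(name="tests")')
    return "\n\n".join(targets) + "\n"
```

It only covers tests that already have snapshots, so problem 1 (remembering to rerun it after adding a new test) remains.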
This relates to (or at least might share a solution with) https://github.com/pantsbuild/pants/issues/17301.
I think being able to run the test via pytest, but exactly like ./pants run ... would solve both cases.
@huonw Thanks for posting that workaround! I was able to adapt it to my project setup that's currently on Pants v2.15.0 and syrupy v4.0.2.
What I needed to change so that it works with syrupy v4
The _dirname property is now a dirname class method that takes a keyword-only test_location argument:
```python
from syrupy import PyTestLocation

# ...


class EscapePantsSandboxExtension(AmberSnapshotExtension):
    # ...

    @classmethod
    def dirname(cls, *, test_location: PyTestLocation) -> str:
        # Changes here should also be applied to any other extensions
        # (search for imports/uses of 'syrupy.extensions')

        # /tmp/whatever/backend/something/__snapshots__
        original = super().dirname(test_location=test_location)
        # duplicated root directory, set from original invocation via pants.toml and scripts/pants-with-snapshot-hack.sh
        snapshot_hack_dir = os.environ[ENV_VAR]
        # find just the /backend/... part
        backend_index = original.index("/backend/")
        # mash 'em together to get the path that'll be copied back to the repo after the test run
        return f"{snapshot_hack_dir}{original[backend_index:]}"
```
After the merge of #19264 there is a new generate-snapshots goal. Backends interested in supporting snapshot testing should hook into it to give end users a way to generate snapshots.
This issue has been open for over one year without activity and is not labeled as a bug. It has been labeled as stale to invite any further updates. If you can confirm whether the issue is still applicable to the latest version of Pants, appears in other contexts, or its priority has changed, please let us know. Please feel free to close this issue if it is no longer relevant.