signac
signac copied to clipboard
Refactor path function handling
This PR switched from a bug fix to a refactor. See #666 for the bug fix only.
Original Title: Don't generate views with underspecified path provided by user
Signac has unexpected behavior when generating a view if the user specifies a custom path that doesn't uniquely specifiy jobs.
I would expect signac to link to all jobs matching the description in the view folder.
What @Nipuli-Gunaratne and I found was that signac just picks one job.
Description
We check that the mapping of source-> link is 1-1 in other parts of import_export.py but don't check user's provided path function.
Two posssible solutions I see are:
- Generate an error and exit. Suggest the fix in the error message.
- Try to fix the problem by adding jobs ids to the path.
In this draft PR, I find the places in the code for adding the error checking code in two places it might fit: import_export.py::_make_path_function and in linked_view.py::create_linked_view.
I think it works best in _make_path_function.
Motivation and Context
If you make this test project
#init.py
import signac
project = signac.init_project('Test-view')
jobs = [dict(a=1,b=1),
dict(a=1,b=2),
dict(a=2,b=1),
dict(a=2,b=2)
]
for j in jobs:
job = project.open_job(j)
job.init()
print(j, job.id)
and generate a view with a user-specified custom path that does not uniquely identify jobs
signac view test_error "a/{a}"
Signac just picks one of the jobs to link.
a=1 gets job 8aacdb17187e6acf2b175d4aa08d7213 (b=2) and not 386b19932c82f3f9749dd6611e846293 (b=1)
a=2 gets job 5e4d14d82c320bafb2f1286fe486d1f8 (b=1) and not d48f81ad571306570e2eb9fe7920cd3c (b=2)
Fix that we'd suggest to users OR try to do automatically:
Remake the path specification as "a/{a}/id/{id}"
Types of Changes
- [ ] Documentation update
- [x] Bug fix
- [ ] New feature
- [ ] Breaking change1
1The change breaks (or has the potential to break) existing functionality.
Checklist:
- [x] I am familiar with the Contributing Guidelines.
- [x] I agree with the terms of the Contributor Agreement.
- [x] My name is on the list of contributors.
- [ ] My code follows the code style guideline of this project.
- [ ] The changes introduced by this pull request are covered by existing or newly introduced tests.
If necessary:
- [ ] I have updated the API documentation as part of the package doc-strings.
- [ ] I have created a separate pull request to update the framework documentation on signac-docs and linked it here.
- [ ] I have updated the changelog and added all related issue and pull request numbers for future reference (if applicable). See example below.
Codecov Report
Merging #642 (0cda86d) into master (44f59ee) will decrease coverage by
0.07%. The diff coverage is75.90%.
:exclamation: Current head 0cda86d differs from pull request most recent head 0837127. Consider uploading reports for the commit 0837127 to get more accurate results
@@ Coverage Diff @@
## master #642 +/- ##
==========================================
- Coverage 78.42% 78.34% -0.08%
==========================================
Files 65 65
Lines 7119 7090 -29
Branches 1564 1435 -129
==========================================
- Hits 5583 5555 -28
- Misses 1226 1227 +1
+ Partials 310 308 -2
| Impacted Files | Coverage Δ | |
|---|---|---|
| signac/contrib/utility.py | 58.99% <74.02%> (+7.14%) |
:arrow_up: |
| signac/__main__.py | 71.89% <100.00%> (ø) |
|
| signac/contrib/import_export.py | 79.60% <100.00%> (+1.29%) |
:arrow_up: |
| signac/contrib/linked_view.py | 84.02% <100.00%> (+0.11%) |
:arrow_up: |
| signac/contrib/filesystems.py | 51.00% <0.00%> (-3.21%) |
:arrow_down: |
| signac/common/connection.py | 38.27% <0.00%> (-0.76%) |
:arrow_down: |
| signac/common/crypt.py | 63.26% <0.00%> (-0.74%) |
:arrow_down: |
| signac/core/json.py | 86.20% <0.00%> (-0.46%) |
:arrow_down: |
| signac/common/host.py | 40.00% <0.00%> (-0.43%) |
:arrow_down: |
| signac/core/jsondict.py | 46.99% <0.00%> (-0.29%) |
:arrow_down: |
| ... and 13 more |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact),ø = not affected,? = missing dataPowered by Codecov. Last update 44f59ee...0837127. Read the comment docs.
I think we should raise an error here generally, and also remove
incomplete_linksin favor oflinks
I added the error in import_export.py but as I was trying to use the inverted dictionary for links, I realized that the existing _update_view function needs the link as the key.
I think it will be least disruptive to leave the old code as is in create_linked_view
How shall I test this?
I changed back to RuntimeError mostly so I don't have to change the existing tests that were looking for that error.
Should I add a test that can check for where the error is caught?
It looks like there is also a duplicate check for export: https://github.com/glotzerlab/signac/blob/60646fdf29e5aa28680a6462337ab5a0488bf6c6/signac/contrib/import_export.py#L328-L333
@cbkerr Can you unify these so that we only have one place in views/exports that is used for duplicate checking? The logic should be only implemented once and called in both export/view as a helper function.
Should I add a test that can check for where the error is caught?
Yes, we add tests for all error cases that we can catch. You can model it on this test: https://github.com/glotzerlab/signac/blob/4a4bfde3bcdfb31dc370116190092582e0ee7248/tests/test_project.py#L1131-L1138
It looks like there is also a duplicate check for export:
https://github.com/glotzerlab/signac/blob/60646fdf29e5aa28680a6462337ab5a0488bf6c6/signac/contrib/import_export.py#L328-L333
@cbkerr Can you unify these so that we only have one place in views/exports that is used for duplicate checking? The logic should be only implemented once and called in both export/view as a helper function.
Yes, I structured my initial check based on this one. However, I don't think we can remove the RunTime error from _export_jobs in case it is passed a callable path function that would result in the error.
Do you mean making a new function _check_path_function(jobs, path_function) kind of like we have for the existing function _check_directory_structure_validity? (and calling this function in both places)
@cbkerr just making sure, this PR is just waiting on you for edits right? Or do you need more feedback from me right now? Feel free to ping me if I'm leaving you hanging.
I implemented the helper function, but to provide the user with the specified path in the error message, it seems hacky to also pass the "path" argument into the helper function. I do think it would be nice to have that info.
Would appreciate your look again @vyasr. I use your suggestion of the simple check first.
I implemented the helper function, but to provide the user with the specified path in the error message, it seems hacky to also pass the "path" argument into the helper function. I do think it would be nice to have that info.
I don't think it's too hacky, but I see your point. I think it's slightly awkward because we're calling a validation function that raises an exception rather than returning a boolean flag that can be used to raise an exception at the call site. However, since you're reusing this function twice I don't think you want to reproduce the exception in both places, so I don't mind it the way it is. Since you are providing the path spec, though, one thing that might be an improvement on the current error message is to actually show the full path spec that you're recommending as an alternative. Specifically, replace {os.path.join('id', '{job.id}')} with {os.path.join(path_spec, 'id', '{job.id}')} or so. What do you think?
Would appreciate your look again @vyasr. I use your suggestion of the simple check first.
Yup the check looks good to me now.
I added a test in test_create_linked_view_homogeneous_schema_nested_provide_partial_path and confirmed that without the changes in this PR, the test fails, i.e. the code fails to generate the error.
I'm having trouble adding a test to test_shell even following the same format as the others that are looking for generated error messages.
My problem is that the output of err = self.call ... is an empty string. Any pointers on this?
I agree that the import_export code needs refactoring. I'm going to split this PR into one for the bug fix for the release and another for the refactor. @bdice was okay with this in a chat offline
Given the large number of reviews, I am removing myself.
I'm following @b-butler and removing myself.
@cbkerr what's the plan here?
I've extracted enough to be a bug fix into #666. It should be easier to review without moving lots of code around. I'll rename this PR or make an issue to focus on the refactor.
@cbkerr awesome. If you want to merge master into this branch and fix conflicts, I would be fine with repurposing this PR for the refactor. The conversation here is illuminating and worth preserving next to the final set of changes IMO.
@cbkerr do you need help with next steps on this PR?
I'll get back to this after APS in mid March.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
@cbkerr do you still plan on getting back to this?
@cbkerr checking in here again, do you still plan to tackle this? Since the bug fix was merged in #666 and this is just a refactor, can the changes be retargeted onto next for version 2.0 instead of 1.8?
@cbkerr do you still think this is work worth doing, and just not something that you'll get around to, or is this something we can skip altogether? If you think it's worthwhile, could you write up an issue describing what should be done in case someone wants to pick this up in the future?