signac icon indicating copy to clipboard operation
signac copied to clipboard

Refactor path function handling

Open cbkerr opened this issue 4 years ago • 22 comments

This PR switched from a bug fix to a refactor. See #666 for the bug fix only.

Original Title: Don't generate views with underspecified path provided by user

Signac has unexpected behavior when generating a view if the user specifies a custom path that doesn't uniquely specifiy jobs.

I would expect signac to link to all jobs matching the description in the view folder.

What @Nipuli-Gunaratne and I found was that signac just picks one job.

Description

We check that the mapping of source-> link is 1-1 in other parts of import_export.py but don't check user's provided path function.

Two posssible solutions I see are:

  1. Generate an error and exit. Suggest the fix in the error message.
  2. Try to fix the problem by adding jobs ids to the path.

In this draft PR, I find the places in the code for adding the error checking code in two places it might fit: import_export.py::_make_path_function and in linked_view.py::create_linked_view.

I think it works best in _make_path_function.

Motivation and Context

If you make this test project

#init.py
import signac

project = signac.init_project('Test-view')

jobs = [dict(a=1,b=1),
        dict(a=1,b=2),
        dict(a=2,b=1),
        dict(a=2,b=2)
        ]


for j in jobs:
    job = project.open_job(j)
    job.init()
    print(j, job.id)

and generate a view with a user-specified custom path that does not uniquely identify jobs

signac view test_error "a/{a}"

Signac just picks one of the jobs to link. a=1 gets job 8aacdb17187e6acf2b175d4aa08d7213 (b=2) and not 386b19932c82f3f9749dd6611e846293 (b=1)
a=2 gets job 5e4d14d82c320bafb2f1286fe486d1f8 (b=1) and not d48f81ad571306570e2eb9fe7920cd3c (b=2)

Fix that we'd suggest to users OR try to do automatically:

Remake the path specification as "a/{a}/id/{id}"

Types of Changes

  • [ ] Documentation update
  • [x] Bug fix
  • [ ] New feature
  • [ ] Breaking change1

1The change breaks (or has the potential to break) existing functionality.

Checklist:

If necessary:

  • [ ] I have updated the API documentation as part of the package doc-strings.
  • [ ] I have created a separate pull request to update the framework documentation on signac-docs and linked it here.
  • [ ] I have updated the changelog and added all related issue and pull request numbers for future reference (if applicable). See example below.

cbkerr avatar Nov 03 '21 16:11 cbkerr

Codecov Report

Merging #642 (0cda86d) into master (44f59ee) will decrease coverage by 0.07%. The diff coverage is 75.90%.

:exclamation: Current head 0cda86d differs from pull request most recent head 0837127. Consider uploading reports for the commit 0837127 to get more accurate results Impacted file tree graph

@@            Coverage Diff             @@
##           master     #642      +/-   ##
==========================================
- Coverage   78.42%   78.34%   -0.08%     
==========================================
  Files          65       65              
  Lines        7119     7090      -29     
  Branches     1564     1435     -129     
==========================================
- Hits         5583     5555      -28     
- Misses       1226     1227       +1     
+ Partials      310      308       -2     
Impacted Files Coverage Δ
signac/contrib/utility.py 58.99% <74.02%> (+7.14%) :arrow_up:
signac/__main__.py 71.89% <100.00%> (ø)
signac/contrib/import_export.py 79.60% <100.00%> (+1.29%) :arrow_up:
signac/contrib/linked_view.py 84.02% <100.00%> (+0.11%) :arrow_up:
signac/contrib/filesystems.py 51.00% <0.00%> (-3.21%) :arrow_down:
signac/common/connection.py 38.27% <0.00%> (-0.76%) :arrow_down:
signac/common/crypt.py 63.26% <0.00%> (-0.74%) :arrow_down:
signac/core/json.py 86.20% <0.00%> (-0.46%) :arrow_down:
signac/common/host.py 40.00% <0.00%> (-0.43%) :arrow_down:
signac/core/jsondict.py 46.99% <0.00%> (-0.29%) :arrow_down:
... and 13 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 44f59ee...0837127. Read the comment docs.

codecov[bot] avatar Nov 03 '21 16:11 codecov[bot]

I think we should raise an error here generally, and also remove incomplete_links in favor of links

I added the error in import_export.py but as I was trying to use the inverted dictionary for links, I realized that the existing _update_view function needs the link as the key.

I think it will be least disruptive to leave the old code as is in create_linked_view

cbkerr avatar Nov 03 '21 20:11 cbkerr

How shall I test this?

cbkerr avatar Nov 04 '21 18:11 cbkerr

I changed back to RuntimeError mostly so I don't have to change the existing tests that were looking for that error.

Should I add a test that can check for where the error is caught?

cbkerr avatar Nov 18 '21 00:11 cbkerr

It looks like there is also a duplicate check for export: https://github.com/glotzerlab/signac/blob/60646fdf29e5aa28680a6462337ab5a0488bf6c6/signac/contrib/import_export.py#L328-L333

@cbkerr Can you unify these so that we only have one place in views/exports that is used for duplicate checking? The logic should be only implemented once and called in both export/view as a helper function.

bdice avatar Nov 18 '21 14:11 bdice

Should I add a test that can check for where the error is caught?

Yes, we add tests for all error cases that we can catch. You can model it on this test: https://github.com/glotzerlab/signac/blob/4a4bfde3bcdfb31dc370116190092582e0ee7248/tests/test_project.py#L1131-L1138

bdice avatar Nov 18 '21 14:11 bdice

It looks like there is also a duplicate check for export:

https://github.com/glotzerlab/signac/blob/60646fdf29e5aa28680a6462337ab5a0488bf6c6/signac/contrib/import_export.py#L328-L333

@cbkerr Can you unify these so that we only have one place in views/exports that is used for duplicate checking? The logic should be only implemented once and called in both export/view as a helper function.

Yes, I structured my initial check based on this one. However, I don't think we can remove the RunTime error from _export_jobs in case it is passed a callable path function that would result in the error.

Do you mean making a new function _check_path_function(jobs, path_function) kind of like we have for the existing function _check_directory_structure_validity? (and calling this function in both places)

cbkerr avatar Nov 18 '21 16:11 cbkerr

@cbkerr just making sure, this PR is just waiting on you for edits right? Or do you need more feedback from me right now? Feel free to ping me if I'm leaving you hanging.

vyasr avatar Nov 23 '21 18:11 vyasr

I implemented the helper function, but to provide the user with the specified path in the error message, it seems hacky to also pass the "path" argument into the helper function. I do think it would be nice to have that info.

cbkerr avatar Nov 23 '21 19:11 cbkerr

Would appreciate your look again @vyasr. I use your suggestion of the simple check first.

cbkerr avatar Nov 30 '21 22:11 cbkerr

I implemented the helper function, but to provide the user with the specified path in the error message, it seems hacky to also pass the "path" argument into the helper function. I do think it would be nice to have that info.

I don't think it's too hacky, but I see your point. I think it's slightly awkward because we're calling a validation function that raises an exception rather than returning a boolean flag that can be used to raise an exception at the call site. However, since you're reusing this function twice I don't think you want to reproduce the exception in both places, so I don't mind it the way it is. Since you are providing the path spec, though, one thing that might be an improvement on the current error message is to actually show the full path spec that you're recommending as an alternative. Specifically, replace {os.path.join('id', '{job.id}')} with {os.path.join(path_spec, 'id', '{job.id}')} or so. What do you think?

Would appreciate your look again @vyasr. I use your suggestion of the simple check first.

Yup the check looks good to me now.

vyasr avatar Dec 01 '21 19:12 vyasr

I added a test in test_create_linked_view_homogeneous_schema_nested_provide_partial_path and confirmed that without the changes in this PR, the test fails, i.e. the code fails to generate the error.

I'm having trouble adding a test to test_shell even following the same format as the others that are looking for generated error messages. My problem is that the output of err = self.call ... is an empty string. Any pointers on this?

cbkerr avatar Dec 07 '21 20:12 cbkerr

I agree that the import_export code needs refactoring. I'm going to split this PR into one for the bug fix for the release and another for the refactor. @bdice was okay with this in a chat offline

cbkerr avatar Dec 14 '21 21:12 cbkerr

Given the large number of reviews, I am removing myself.

b-butler avatar Jan 03 '22 15:01 b-butler

I'm following @b-butler and removing myself.

lyrivera avatar Jan 04 '22 15:01 lyrivera

@cbkerr what's the plan here?

vyasr avatar Jan 18 '22 20:01 vyasr

I've extracted enough to be a bug fix into #666. It should be easier to review without moving lots of code around. I'll rename this PR or make an issue to focus on the refactor.

cbkerr avatar Jan 21 '22 20:01 cbkerr

@cbkerr awesome. If you want to merge master into this branch and fix conflicts, I would be fine with repurposing this PR for the refactor. The conversation here is illuminating and worth preserving next to the final set of changes IMO.

vyasr avatar Jan 25 '22 06:01 vyasr

@cbkerr do you need help with next steps on this PR?

vyasr avatar Feb 21 '22 20:02 vyasr

I'll get back to this after APS in mid March.

cbkerr avatar Feb 22 '22 14:02 cbkerr

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jun 12 '22 14:06 stale[bot]

@cbkerr do you still plan on getting back to this?

b-butler avatar Jul 08 '22 15:07 b-butler

@cbkerr checking in here again, do you still plan to tackle this? Since the bug fix was merged in #666 and this is just a refactor, can the changes be retargeted onto next for version 2.0 instead of 1.8?

vyasr avatar Sep 18 '22 16:09 vyasr

@cbkerr do you still think this is work worth doing, and just not something that you'll get around to, or is this something we can skip altogether? If you think it's worthwhile, could you write up an issue describing what should be done in case someone wants to pick this up in the future?

vyasr avatar Sep 19 '22 06:09 vyasr