WireViz icon indicating copy to clipboard operation
WireViz copied to clipboard

Fix Exception Handling in _get_yaml_data_and_path(...)

Open martinrieder opened this issue 1 year ago • 12 comments

Add ValueError and OSError where e.errno is None to Exception Handling of _get_yaml_data_and_path(...)

    # if inp is a long YAML string, Pathlib will raise OSError: [errno.ENAMETOOLONG]
    # (in Windows, it seems ValueError or OSError [errno.EINVAL or None]
    # also might be raised in some cases - depending on the Python version)
    # when trying to expand and resolve it as a path.
    # Catch these errors, but raise any others

master contains #250 from d7d7854bce39881a970b54e3522d9293441c5ce1: Consolidate wireviz.parse() to handle Path, str and Dict as input

https://github.com/wireviz/WireViz/blob/954c4f5f92b1d0531f6af0c318c60786b7906d42/src/wireviz/wireviz.py#L415

release/v0.4.1-rc contains #318 at 6f9007f45d48cc65f3469b362976bb53ac1cd7e0: Use portable OS error codes so program doesn't crash

https://github.com/wireviz/WireViz/blob/ba84c54382b540dfa50dcd5e3592917abdbfb1f8/src/wireviz/wireviz.py#L419

Fixes 80000 character YAML input raising ValueError described in #318

Fixes OSError: _getfinalpathname: path too long for Windows from #391

https://github.com/wireviz/WireViz/pull/246#issuecomment-1206635845

All comments have been addressed inside #251 or are no longer relevant after 251 is applied

Therefore this is a bugfix only.

References:

  • #365
    • #391 #392
    • #344 #346
    • #318
  • #190 #246
  • #250 #251

martinrieder avatar Jun 20 '24 11:06 martinrieder

I can dig a little deeper to find the root cause, because the docs do not mention that errno could be none. This behavior might be specific to Python 3.12, which is not officially supported by WireViz (yet).

        try:
            yaml_path = Path(inp).expanduser().resolve(strict=True)
            # if no FileNotFoundError exception happens, get file contents
            yaml_str = open_file_read(yaml_path).read()
        except (FileNotFoundError, OSError) as e:
            # if inp is a long YAML string, Pathlib will raise OSError: [errno.ENAMETOOLONG]
            # (in Windows, it seems OSError [errno.EINVAL] might be raised in some cases)
            # when trying to expand and resolve it as a path.
            # Catch this error, but raise any others
            from errno import EINVAL, ENAMETOOLONG

            if type(e) is OSError and e.errno not in (EINVAL, ENAMETOOLONG):
                raise e
            # file does not exist; assume inp is a YAML string
            yaml_str = inp
            yaml_path = None

The exception asks for both, FileNotFoundError and OSError. Since one is a subclass of the other, errno should actually map to ENOENT.

PS : I am taking a peek at jaraco/path as possible drop-in replacement, although still proposinc this PR only as a Bugfix!

martinrieder avatar Jun 20 '24 23:06 martinrieder

@RedshiftVelocities, I noticed that a similar issue has been found by @kvid in https://github.com/wireviz/WireViz/pull/318#pullrequestreview-1457016602 a year ago. Could you please check this again?

It seems that behavior changes with different Python versions. The referred one uses ValueError in 3.9, while it is now reported as OSError in 3.12.

martinrieder avatar Jun 21 '24 08:06 martinrieder

@martinrieder wrote:

I noticed that a similar issue has been found by @kvid in #318 (review) a year ago. Could you please check this again?

It seems that behavior changes with different Python versions. The referred one uses ValueError in 3.9, while it is now reported as OSError in 3.12.

I can confirm that I still get ValueError: _getfinalpathname: path too long for Windows when trying WireViz on input YAML with 80000 character dummy string in Python 3.10.4 at Windows 10. It's not a very likely use case, but demonstrates a possible exception when trying to use random strings as a path. This edge case is resolved by including ValueError in the list of exceptions to catch.

Update: Tested again at a different PC with Python 3.9.13 with the current branch of PR #365 (without this PR merged in) with the valid a-80k.yml as input:

WireViz 0.4.1-dev
Input file:   _test\a-80k.yml
Output file:  _test\a-80k.[html|png|svg|tsv]
Traceback (most recent call last):
  File "C:\Repos\WireViz\src\wireviz\wv_cli.py", line 153, in <module>
    wireviz()
  File "C:\Repos\WireViz\.venv\lib\site-packages\click\core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "C:\Repos\WireViz\.venv\lib\site-packages\click\core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "C:\Repos\WireViz\.venv\lib\site-packages\click\core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Repos\WireViz\.venv\lib\site-packages\click\core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "C:\Repos\WireViz\src\wireviz\wv_cli.py", line 141, in wireviz
    wv.parse(
  File "C:\Repos\WireViz\src\wireviz\..\wireviz\wireviz.py", line 90, in parse
    yaml_data, yaml_file = _get_yaml_data_and_path(inp)
  File "C:\Repos\WireViz\src\wireviz\..\wireviz\wireviz.py", line 409, in _get_yaml_data_and_path
    yaml_path = Path(inp).expanduser().resolve(strict=True)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3568.0_x64__qbz5n2kfra8p0\lib\pathlib.py", line 1215, in resolve
    s = self._flavour.resolve(self, strict=strict)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3568.0_x64__qbz5n2kfra8p0\lib\pathlib.py", line 210, in resolve
    return self._ext_to_normal(_getfinalpathname(s))
ValueError: _getfinalpathname: path too long for Windows

kvid avatar Jun 26 '24 19:06 kvid

@kvid what does errno evaluate to in that case? Which version of Pathlib are you running?

BTW: I do not consider running a huge YAML file to be that unusual. If a library of connectors is contained in the templates section, file size can increase quickly.

martinrieder avatar Jun 26 '24 21:06 martinrieder

what does errno evaluate to in that case?

An OSError exception has errno, but ValueError has not.

Which version of Pathlib are you running?

pathlib is a system library included in (and identified by) the Python 3.10.4 installation in my case. System libraries don't have separate version numbers AFAIK.

BTW: I do not consider running a huge YAML file to be that unusual. If a library of connectors is contained in the templates section, file size can increase quickly.

I agree to that, but my 80000 character test input was only a single long string, and that is a bit unusual. However, I just now added my 80000 character as one of the attributes of a connector in a valid WireViz YAML file, and the same ValueError was raised, so you are probably right it also can happen with a huge template section.

I suggest you include ValueError like this:

        except (FileNotFoundError, OSError, ValueError) as e:

kvid avatar Jun 27 '24 00:06 kvid

I suggest you include ValueError like this:

        except (FileNotFoundError, OSError, ValueError) as e:

Additionally, the if clause needs to be adapted, so that the exception is also raised.

My proposal to add some "band-aid" the fix:

  • If inp contains newline characters, treat it as YAML
  • Else pass it to Pathlib and catch its exceptions
  • Raise any unexpected exceptions
  • Fall back to YAML parsing on FileNotFoundError etc.

What do you think about treating all multi-line strings as YAML?

Besides bugfixing, would printing an error message if it is neither an existing file nor valid YAML provide a more user-friendly behavior instead of simply raising an exception?

martinrieder avatar Jun 27 '24 04:06 martinrieder

I suggest you include ValueError like this:

        except (FileNotFoundError, OSError, ValueError) as e:

Additionally, the if clause needs to be adapted, so that the exception is also raised.

No, the if only needs to raise other OSError exceptions than the identified errno values.

My proposal to add some "band-aid" the fix:

  • If inp contains newline characters, treat it as YAML

As I wrote a year ago in point 4 of https://github.com/wireviz/WireViz/pull/318#pullrequestreview-1457016602 that you referred to above, this is exactly what is already implemented in #251.

Catching more exceptions here is just a quick band-aid to avoid most errors until the more thorough fix in #251.

  • Else pass it to Pathlib and catch its exceptions
  • Raise any unexpected exceptions

No expected exceptions to catch after the #251 changes, so they are all raised normally.

  • Fall back to YAML parsing on FileNotFoundError etc.

No special catching of these exceptions are needed.

What do you think about treating all multi-line strings as YAML?

I agree, and #251 is the first to be merged after releasing the collections of quick-fixes.

Besides bugfixing, would printing an error message if it is neither an existing file nor valid YAML provide a more user-friendly behavior instead of simply raising an exception?

See #207

kvid avatar Jun 27 '24 08:06 kvid

Okay then I will keep things simple and it shall look like this:

try:
    yaml_path = Path(inp).expanduser().resolve(strict=True)
    # if no FileNotFoundError exception happens, get file contents
    yaml_str = open_file_read(yaml_path).read()
except (FileNotFoundError, OSError, ValueError) as e:   # adding ValueError per #318 
    # if inp is a long YAML string, Pathlib will raise OSError: [errno.ENAMETOOLONG]
    # (in Windows, it seems OSError [errno.EINVAL] might be raised in some cases)
    # when trying to expand and resolve it as a path.
    # Catch this error, but raise any others
    from errno import EINVAL, ENAMETOOLONG

    if type(e) is OSError and e.errno not in (EINVAL, ENAMETOOLONG, None): # adding None fixes #391
        raise e
    # file does not exist; assume inp is a YAML string
    yaml_str = inp
    yaml_path = None

martinrieder avatar Jun 27 '24 09:06 martinrieder

@formatc1702 seems to prefer not referring to issues or PRs in the source code - see https://github.com/wireviz/WireViz/pull/264#discussion_r1319052618

Referring to them in the body of commit messages is OK and often useful, though.

I therefore suggest removing the two trailing comments including issue references. Instead I suggest expanding the comment block to explain the latest additions like this:

    # if inp is a long YAML string, Pathlib will raise OSError: [errno.ENAMETOOLONG]
    # (in Windows, it seems ValueError or OSError [errno.EINVAL or None]
    # also might be raised in some cases - depending on the Python version)
    # when trying to expand and resolve it as a path.
    # Catch these errors, but raise any others

kvid avatar Jun 28 '24 10:06 kvid

Thanks. I will update the PR accordingly.

I discovered an additional OS-specific bug with expanduser() that arises when the file name contains a tilde ~ character. Should I include a fix for this as well, even though it is a separate issue?

martinrieder avatar Jun 28 '24 15:06 martinrieder

@martinrieder wrote:

Thanks. I will update the PR accordingly.

I discovered an additional OS-specific bug with expanduser() that arises when the file name contains a tilde ~ character. Should I include a fix for this as well, even though it is a separate issue?

If the fix is similar to what is already discussed (catching an extra Exception class or OSError.errno value), then I see no harm in including also that fix in this PR - just update the PR title and description to cover all sub-issues.

Please use the imperative mood in your commit subject lines as recommended in the contribution guidelines and I think the PR title should follow the same recommendation as for commit subject lines, because a squash commit will normally use the PR title as subject line.

kvid avatar Jun 28 '24 18:06 kvid

I'm sorry for being unclear. When I wrote "update the PR title and description to cover all sub-issues", I didn't mean all details should be in the title. Please read all the commit recommendations at the page I linked to above. The subject line should be a short summary, preferably limited to 50 characters and never more than 72. Use the body for details.

kvid avatar Jun 28 '24 21:06 kvid

Thanks for digging into these more obscure issues despite my limited availability. It seems like a very minor code change with multiple positive effects, good job :) I assume it's ready for merge but have requested review just to be sure.

17o2 avatar Jul 01 '24 17:07 17o2

Sorry, I did not get to commit the discussed changes over the weekend. I am doing that now and will keep you updated.

martinrieder avatar Jul 01 '24 18:07 martinrieder

@martinrieder wrote:

[...] I discovered an additional OS-specific bug with expanduser() that arises when the file name contains a tilde ~ character. Should I include a fix for this as well, even though it is a separate issue?

Didn't you include a fix for this? I can't see any more code additions than what we discussed before you mentioned this last issue.

Or does the fix involve huge code changes that is clearly out of scope for this PR?

kvid avatar Jul 01 '24 19:07 kvid

No, I am working on another machine today and had some trouble trying reproduce the issue here. I even reinstalled Python and noticed that the installer asked me to remove MAX_PATH limit. The following link might give a hint on the issue you found in #318, which indicates that it is also depending on System configuration: https://docs.python.org/3.11/using/windows.html#removing-the-max-path-limitation

When a file name starts with ~, the path gets expanded to C:\Users\. I am looking into finding a simple fix for this at the moment:

C:\Users\Martin\Git\WireViz\test>wireviz "~demo01.yml"

WireViz 0.4.1-dev
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\Martin\AppData\Local\Programs\Python\Python312\Scripts\wireviz.exe\__main__.py", line 7, in <module>
  File "C:\Users\Martin\AppData\Local\Programs\Python\Python312\Lib\site-packages\click\core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Martin\AppData\Local\Programs\Python\Python312\Lib\site-packages\click\core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "C:\Users\Martin\AppData\Local\Programs\Python\Python312\Lib\site-packages\click\core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Martin\AppData\Local\Programs\Python\Python312\Lib\site-packages\click\core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Martin\Git\WireViz\src\wireviz\wv_cli.py", line 122, in wireviz
    raise Exception(f"File does not exist:\n{file}")
Exception: File does not exist:
C:\Users\demo01.yml

martinrieder avatar Jul 01 '24 20:07 martinrieder

@kvid as you can see from the output above, the exception is raised in wv_cli.py. It is unrelated to the issues discussed in this PR. Feel free to review and merge.

Regarding this other issue, I was able to trace it back to the argument handling of the Click library, which passes file to the wireviz() function:

@click.argument("file", nargs=-1)

Calling wireviz ".\~demo01.yml" evaluates to '.\\~demo01.yml', while wireviz "~demo01.yml" evaluates to 'C:\\Users\\demo01.yml'

martinrieder avatar Jul 01 '24 20:07 martinrieder

At Linux it's normal to evaluate a path with a leading ~Martin to the home folder of a user named Martin, and it seems Python tries to make this work in a similar fashion at Windows. Se also https://docs.python.org/3/library/os.path.html#os.path.expanduser and https://docs.python.org/3/library/pathlib.html#pathlib.Path.expanduser

If I understand your description correctly, it seems to work as designed. You get an exception because there is no user named demo01.yml at your PC, and hence no home folder named "C:\\Users\\demo01.yml".

Note also that if the leading ~ isn't followed by something that could be a username, the leading ~ is evaluated to the home folder of the current user.

kvid avatar Jul 01 '24 23:07 kvid

@kvid thanks for explaining. It is of course unlikely for users to choose file names like this. Many Windows apps, especially MS Office create these as temporary files. They would typically not carry a .yaml file extension though. I do not intend to create a bug fix for this.

I would still like to criticize the fact that file names are evaluated in two different parts of the code. This could be the source of many more bugs like the one I discovered. Once there is the chance to do more than just bugfixing, this should be reworked.

Anyways, I am not planning to add more commits to this PR. I tested it locally (on Windows) and if the automated tests are also passing (on Linux) then it should be okay to merge this.

martinrieder avatar Jul 02 '24 17:07 martinrieder