flytekit icon indicating copy to clipboard operation
flytekit copied to clipboard

Remove use TypeVar to denote the extension of a Flytefile #2870

Open jerempy opened this issue 3 years ago • 5 comments
trafficstars

TL;DR

Do not use TypeVar to denote the extension of a Flytefile. Pylint was breaking in a couple of places. This fixes it.

Type

  • [x] Bug Fix
  • [ ] Feature
  • [ ] Plugin

Are all requirements met?

  • [ ] Code completed
  • [ ] Smoke tested
  • [ ] Unit tests added
  • [ ] Code documentation added
  • [ ] Any pending items have an associated Issue

Complete description

Ran pylint and got the stackstrace. Fixed it and then ran again without the errors. details below.

typing.TypeVar requires that the name of the var it is assigned to is a match. meaning T = TypeVar("T") is ok but C = TypeVar("T") is not. So when we have a file extension like "tar.gz" there is no way to assign that as a tar.gz variable. So instead we are just switching it to str type.

I left the TypeVar on file extensions in https://github.com/flyteorg/flytekit/blob/master/flytekit/types/file/init.py#L28-L82 since they don't seem to break anything.

I searched the rest of the repo looking for any other regex matches for something like with a period in the file extesnion and didn't find any others.

I also corrected a return type for split_traintest_dataset() since my linter was warning me about it

Tracking Issue

Fixes https://github.com/flyteorg/flyte/issues/2870

Follow-up issue

NA OR https://github.com/flyteorg/flyte/issues/

jerempy avatar Oct 13 '22 21:10 jerempy

@jerempy, thanks for creating the PR! We'll also need to fix the FlyteFile extensions cause they'll fail when you run pylint on a file that contains say, FlyteFile[typing.TypeVar("...")]. You can test this example by including TypeVar for your reference.

@eapolinario suggested using typing.Annotated rather than TypeVar for FlyteFile objects. Would you mind working on it? You may also need to take a look at https://github.com/flyteorg/flytekit/blob/master/flytekit/types/file/file.py file to support the Annotated case. Here's an example where we're using Annotated to accept the user metadata: https://github.com/flyteorg/flytekit/blob/ab9aa65c420afed1ec3b80e0d650fcee3d63852a/flytekit/types/numpy/ndarray.py#L16-L24

samhita-alla avatar Oct 14 '22 06:10 samhita-alla

@samhita-alla thanks for the comment! I tried running pylint on files with FlyteFile[typing.TypeVar("...")] and it wasn't causing breakages.

the original breakage was caused by assigning types to variables with different names - which does cause breakages with pylint. You cannot do VAR = TypeVar("OTHERVAR") pylint would break with that, but this is okay: VAR = TypeVar("VAR"). TypeVar is used 55 times in this repo, and I looked at all of them and they all look good to go.

The specific ones named here in the bug that broke pylint were assigned different names because they had a .(period) in their extenions name, i.e. tar.gz so they are unworkable.

I took a look at typing.Annotated, but per my understanding of it it doesnt seem like it would fir here since it is used to add metadata to types, rather than defining them.

At any rate, I fixed a typehint that broke the tests in CI - should pass CI now 🤞

jerempy avatar Oct 14 '22 13:10 jerempy

Codecov Report

Merging #1236 (bcab554) into master (b1ff43e) will increase coverage by 0.05%. The diff coverage is 61.81%.

@@            Coverage Diff             @@
##           master    #1236      +/-   ##
==========================================
+ Coverage   68.66%   68.71%   +0.05%     
==========================================
  Files         288      286       -2     
  Lines       26359    26364       +5     
  Branches     2490     2493       +3     
==========================================
+ Hits        18100    18117      +17     
+ Misses       7780     7767      -13     
- Partials      479      480       +1     
Impacted Files Coverage Δ
flytekit/types/file/__init__.py 17.94% <26.92%> (+17.94%) :arrow_up:
flytekit/types/directory/types.py 53.27% <33.33%> (-0.89%) :arrow_down:
flytekit/types/file/file.py 60.00% <100.00%> (+0.27%) :arrow_up:
...ests/flytekit/unit/core/test_realworld_examples.py 60.29% <100.00%> (+0.59%) :arrow_up:
tests/flytekit/unit/core/test_type_engine.py 98.46% <100.00%> (+0.03%) :arrow_up:
flytekit/core/python_auto_container.py 49.54% <0.00%> (-0.46%) :arrow_down:
tests/flytekit/unit/core/test_type_delayed.py 96.42% <0.00%> (-0.35%) :arrow_down:
flytekit/core/type_engine.py 58.65% <0.00%> (-0.25%) :arrow_down:
...kit/unit/types/structured_dataset/test_bigquery.py 97.05% <0.00%> (-0.24%) :arrow_down:
tests/flytekit/unit/core/test_interface.py 85.49% <0.00%> (-0.23%) :arrow_down:
... and 7 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

codecov[bot] avatar Oct 16 '22 05:10 codecov[bot]

@eapolinario, would you mind taking a look at this PR?

samhita-alla avatar Oct 16 '22 06:10 samhita-alla

sorry @pingsutw if you could review again 🙏 :I noticed that I accidentally changed the return type in https://github.com/flyteorg/flytekit/blob/98acf1bd7a76f0b6f2d04cbeb87ac17b84b3836c/tests/flytekit/unit/core/test_realworld_examples.py#L87

so i reverted it back to featured_columns and class_columns

jerempy avatar Oct 17 '22 03:10 jerempy

@pingsutw think we could merge this one? October is almost over 😅

jerempy avatar Oct 26 '22 22:10 jerempy

@pingsutw , how did you run mypy?

For example, given this file:

import typing
from flytekit.types.file import FlyteFile

def f_no_extension() -> FlyteFile:
    ...

reveal_type(f_no_extension)

def f_string_extension() -> FlyteFile["ext1"]:
    ...

reveal_type(f_string_extension)

def f_typevar() -> FlyteFile[typing.TypeVar("ext2")]:
    ...

reveal_type(f_typevar)

mypy returns:

❯ mypy example.py
example.py:2: error: Cannot find implementation or library stub for module named "flytekit.types.file"
example.py:2: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
example.py:7: note: Revealed type is "def () -> Any"
example.py:9: error: Name "ext1" is not defined
example.py:12: note: Revealed type is "def () -> Any"
example.py:17: note: Revealed type is "def () -> Any"
Found 2 errors in 1 file (checked 1 source file)

Note how in the case of f_string_extension mypy return an error.

eapolinario avatar Oct 27 '22 05:10 eapolinario

sorry, my bad. I used make lint, and it didn't check those two files.

pingsutw avatar Oct 27 '22 07:10 pingsutw

@jerempy, let's not rush things. I can assign a "hacktoberfest-accepted" label in case the PR doesn't get merged before Oct. 31. :)

Can you take a look at the above comments and do the necessary? Here's Eduardo's comment:

I don't believe we should go ahead with this change. Simply replacing mentions to typevar with raw strings is going to start showing up as errors in mypy.

samhita-alla avatar Oct 27 '22 09:10 samhita-alla

Given the discussion about how using a plain str isn't valid and causes issues with mypy, I think this solution is not truly an improvement over the status quo, where mypy is fine but pylint is unhappy. I think the idea should be to work out how typing.Annotated can be used instead of typing.TypeVar. Also, I am not sure whether there needs to be some deprecation schedule for the old way. If it were to simply stop working with TypeVar, that could be problematic. I'd suggest supporting Annotated as an extra case, and emitting deprecation warnings for uses of TypeVar or str, before eventually dropping support for those.

hajapy avatar Oct 27 '22 13:10 hajapy

@samhita-alla thanks for that! @eapolinario and @pingsutw you guys able to take a new look? I re-did the whole thing for typing.Annotated It looks good to both mypy and pylint

jerempy avatar Oct 27 '22 15:10 jerempy

@jerempy, two more changes to go! :)

samhita-alla avatar Oct 28 '22 04:10 samhita-alla

~~Should the project dependencies change to declare dep on typing_extensions?~~

~~Should we only use this for python <3.9 and use import logic like trying to import Annotated from typing and if that fails fall back to typing_extensions?~~

Edit: Ah, flytekit already requires typing_extensions regardless of python version.

How about docs and or release notes for these changes? It's a user visible change in expectation.

hajapy avatar Oct 28 '22 11:10 hajapy

@eapolinario thanks for feedback! Can you review once more? I updated the import and while I was at it noticed that the flytekit.types.FlyteDirectory - was also using the new annotated type but didn't have the string conversion I addeed to the FlyteFile - so I moved the logic for that into a FileExt static method and then placed in both type files.

jerempy avatar Oct 28 '22 15:10 jerempy

~Should the project dependencies change to declare dep on typing_extensions?~

~Should we only use this for python <3.9 and use import logic like trying to import Annotated from typing and if that fails fall back to typing_extensions?~

Edit: Ah, flytekit already requires typing_extensions regardless of python version.

How about docs and or release notes for these changes? It's a user visible change in expectation.

That's a good call and it's exactly how typing_extensions is supposed to work, i.e. if you're in in python >=3.9 importing Annotated from typing or typing_extensions is ok, whereas if you're using python <3.9 you're going to get an error if you try to use typing.Annotated. In other words, typing_extensions offers a type alias for the same underlying type if it exists.

That said, we're going to make it clear that FlyteFile and FlyteDirectory now can use this new mode of setting extensions in the release notes. Also worth noting that this change is backwards compatible (as long as the extension doesn't change), for example:

@task
def t_return_old_style_flytefile() -> FlyteFile[TypeVar("ext")]:
   ...

@task 
def t_takes_new_flytefile(f: Annotated[FlyteFile, FileExt("ext")]):
   ...

@workflow
def wf():
    f =  t_return_old_style_flytefile()
    t_takes_new_flytefile(f=f)

eapolinario avatar Oct 28 '22 17:10 eapolinario

@jerempy , can you fix the sagemaker test? Same error (Annotated being imported from typing)

eapolinario avatar Oct 28 '22 17:10 eapolinario

@eapolinario woops! missed that one. got it now. should be good to go 😎

jerempy avatar Oct 28 '22 19:10 jerempy

happy halloween! Looks like missing 7% diff coverage, based on lines of code - but when I look at the PR I;d say everything is adequately covered - if it fine to merge anyway?

jerempy avatar Oct 31 '22 14:10 jerempy