data icon indicating copy to clipboard operation
data copied to clipboard

Definition of `IterDataPipe` in `pyi` file breaks inheritance path for static type checking

Open ejguan opened this issue 2 years ago • 3 comments

See comments in https://github.com/pytorch/data/pull/780

@pmeier At list for the first Error, the proper typing should be:

def load(path: pathlib.Path) -> IterDataPipe[Tuple[str, BinaryIO]]:
    if path.is_dir():
        dp: IterDataPipe = FileLister(str(path), recursive=True)
    else:
        dp = IterableWrapper([str(path)])
    return FileOpener(dp, mode="rb")

However, even with the proper typing shown above, the Error is changed to Incompatible types in assignment (expression has type "FileListerIterDataPipe", variable has type "IterDataPipe[Any]"). And, it doesn't explain what causes the second Error. So, I spent a few hours figuring out what is the root cause of the mypy Error. In the generated datapipe.pyi file, a new IterDataPipe class is defined, which overrides the original IterDataPipe from the inheritance graph for all other DataPipe.

All Errors are eliminated when I remove new IterDataPipe definition from datapipe.pyi and import IterDataPipe directly from torch.utils.data.datapipe. And, the reason we define new IterDataPipe in pyi file is attaching all functional APIs to it. We need to do it in a different way by keeping the original IterDataPipe and extending the class with all functional APIs.

cc: @NivekT for python interface file

For this PR, I will revert it because our typing system needs to be fixed generally.

Originally posted by @ejguan in https://github.com/pytorch/data/issues/780#issuecomment-1252595095

ejguan avatar Sep 20 '22 16:09 ejguan

In order to fix this problem, we have to find a way to extend the existing IterDataPipe with all functional APIs rather than re-defining a new IterDataPipe in the interface file.

Edit: metaclass might be a solution.

ejguan avatar Sep 20 '22 16:09 ejguan

In order to fix this problem, we have to find a way to extend the existing IterDataPipe with all functional APIs rather than re-defining a new IterDataPipe in the interface file.

Edit: metaclass might be a solution.

Does that metaclass have the exist in Core?

NivekT avatar Sep 20 '22 18:09 NivekT

Does that metaclass have the exist in Core?

What do you mean? I assume metaclass is able to directly attach those functional API to the class. But, I have never tested it.

I might take a look at the problem after all distribtuted work is done

ejguan avatar Sep 20 '22 18:09 ejguan