ray
ray copied to clipboard
[Core] ray.remote raises ValueError when used on torch IterableDataset
Why are these changes needed?
-
If a class has
__class_getitems__
which should return aGenericAlias
object, theinspect.signature(...)
will throw an exception said thatValueError: no signature found for builtin type <class 'types.GenericAlias'>
.- https://docs.python.org/3/reference/datamodel.html#class-getitem-versus-getitem
class_getitem() should return a GenericAlias object if it is properly defined.
- Note that
__class_getitems__
is not necessary to return aGenericAlias
.
- https://docs.python.org/3/reference/datamodel.html#class-getitem-versus-getitem
-
https://docs.python.org/3.10/library/inspect.html#inspect.Signature
Note Some callables may not be introspectable in certain implementations of Python. For example, in CPython, some built-in functions defined in C provide no metadata about their arguments.
Why can this PR avoid the ValueError?
from types import GenericAlias
@ray.remote
class MyClass:
__class_getitem__ = classmethod(GenericAlias)
# This would allow MyClass[int] to behave similarly to specifying a generic type
myClass = MyClass.remote()
obj_ref = myClass.__class_getitem__.remote(int)
print(ray.get(obj_ref))
-
Without this PR, the type of the
__class_getitem__
ismethod
.-
_signature_from_callable
-
isinstance(obj, types.MethodType)
-> True (code)-
_get_signature_of
(i.e. partial_signature_from_callable
)-
isinstance(obj, types.MethodType)
-> False (code) (type(obj)
->types.GenericAlias
) -
unwrap
(code) - No
obj.__signature__
orobj._partialmethod
(code, code) -
isfunction(obj) or _signature_is_functionlike(obj)
=> False (code) -
_signature_is_builtin(obj)
=> False (code) -
isinstance(obj, functools.partial)
=> False (code) -
isinstance(obj, type)
=> True (code)- There is no
__call__
or__new__
or__init__
. - Throw an
ValueError
. (code)
- There is no
-
-
-
-
-
With this PR, the type of the
__class_getitem__
ismethod-wrapper
.-
_signature_from_callable
-
isinstance(obj, types.MethodType)
-> False (code) -
unwrap
(code) =>type(obj)
is stillmethod-wrapper
. - No
obj.__signature__
orobj._partialmethod
(code, code) -
isfunction(obj) or _signature_is_functionlike(obj)
=> False (code) -
_signature_is_builtin(obj)
=> True (code) becausemethod-wrapper
is one of _NonUserDefinedCallables.
-
-
Why method = method.__init__
?
Honestly, I am not 100% sure whether it will cause some potential issues or not, but
- https://github.com/ets-labs/python-dependency-injector/commit/8cc2c1188bc92f0eeaaa212641ae659723009f26 has been merged for 3 years and has never been reverted, indicating that this solution works well.
- The manual test below works well.
- If any issues arise, users can define
__getitem__
themselves as a workaround.
Useful links
- https://github.com/ets-labs/python-dependency-injector/issues/398
- https://github.com/ets-labs/python-dependency-injector/commit/8cc2c1188bc92f0eeaaa212641ae659723009f26
- https://github.com/ets-labs/python-dependency-injector/issues/362
- https://github.com/ets-labs/python-dependency-injector/commit/4991c5d4b0d6230a395ac90ba675502296c50619
Related issue number
Closes #42914
Checks
- [ ] I've signed off every commit(by using the -s flag, i.e.,
git commit -s
) in this PR. - [ ] I've run
scripts/format.sh
to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a
method in Tune, I've added it in
doc/source/tune/api/
under the corresponding.rst
file.
- [ ] I've added any new APIs to the API Reference. For example, if I added a
method in Tune, I've added it in
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
- [x] Unit tests
- [ ] Release tests
- [ ] This PR is not tested :(
from torch.utils.data import IterableDataset
import ray
@ray.remote
class SyncDataCollector(IterableDataset):
def __iter__(self):
return iter(range(3, 10))
ray.init()
sdc = SyncDataCollector.remote()
obj_ref = sdc.__iter__.remote()
it = ray.get(obj_ref)
for i in it:
print(i)
# 3 4 5 6 7 8 9
-
IterableDataset
inherits fromIterable
, which defines__class_getitem__ = classmethod(GenericAlias)
. (code)
IIUC, this is a python 3.9 bug. Do you know if it's fixed in 3.10+?
No, Python 3.10 still has the issue. I am not sure whether to call it a bug or a limitation. The official documentation for inspect.signature
already acknowledges that "Note Some callables may not be introspectable in certain implementations of Python. For example, in CPython, some built-in functions defined in C provide no metadata about their arguments".
The premerge fails on test_classmethod_genericalias
and test_placement_group_4.py
. I tried to reproduce locally with Python 3.8 and Python 3.10, but I can't.
-
Python 3.10
-
Python 3.8
Python 3.9.0 still can't reproduce the issue.
I installed the Python 3.9 artifact on the Buildkite CI, and then replaced the Ray Python scripts with my local ones. I still can't reproduce the CI error.
I used https://github.com/ray-project/ray/pull/43117/commits/5186b34951fb317beea59db55997e408d1184e76 to verify that the CI has actually entered the following if
statement as expected. The repro CI currently doesn't support pre-merge. I have already spent a lot of time trying to reproduce this strange issue. I will reach out to the CI team to discuss how to reproduce it.
if GenericAlias and any(
(
method is GenericAlias,
getattr(method, "__func__", None) is GenericAlias,
)
):
method = method.__init__
Is this basically a bug from inspect module?
Yes. See https://github.com/ray-project/ray/pull/43117#issuecomment-1947459842 for more details.
Do you know what's the timeline for the fix in Python level? (or is it fixed in the latest versions)?
No, I don't think there is a timeline for that. It has lasted for more than 3 years.
Also this method = method.init is only applied to class_getitme right? Other methods are not affected
I believe so because I can still get 3 4 5 6 7 8 9
in the following example.
from torch.utils.data import IterableDataset
import ray
@ray.remote
class SyncDataCollector(IterableDataset):
def __iter__(self):
return iter(range(3, 10))
ray.init()
sdc = SyncDataCollector.remote()
obj_ref = sdc.__iter__.remote()
it = ray.get(obj_ref)
for i in it:
print(i)
# 3 4 5 6 7 8 9
Btw, I am very comfortable merging this PR. I feel a bit hacky that we are monkey patching the method. The CI failure also makes me feel like this is not a good direction.
Is this reasonable we just ask for the workaround (creating an instance inside an actor instead of directly adding .remote)? Wdyt @kevin85421 ? The issue was also originally p2
Is this reasonable we just ask for the workaround (creating an instance inside an actor instead of directly adding .remote)?
Do you mean something like below? If so, it makes sense to me.
@ray.remote
class SyncDataCollector:
def __init__(self):
self.xxxx = xxxxx # something inherited from IterableDataset.
def __iter__(self):
# call self.xxxx
yes. I prefer to ask users to workaround than adding dogpatches unless the issue is widely exists (and critical). We've actually suffered from some dogpatches we made in the past multiple times and had to revert at the end. And the fact that Python doesn't try fix this issue also makes me a little concern (there could be a reason why it is not fixed in their layer like there's an anti pattern) though for this part, we can do a bit more sesearch.