modin icon indicating copy to clipboard operation
modin copied to clipboard

FEAT-#7139: use ray-core+grpcio instead of ray-default

Open anmyachev opened this issue 1 year ago • 2 comments
trafficstars

What do these changes do?

These changes are aimed at reducing the number of dependencies and thereby simplifying the installation of Modin with the Ray engine in the user environment. For example, we might not install dependencies for the Ray dashboard. To do this, we could use either package ray[client], but it appears only from version Ray>=2.6.0 (there is no desire to increase the minimally supported version of Ray so much for now), or switch to using the basic package itself (ray), but this will not work for fairly new versions of Ray, so how they don't include grpcio dependency.

As a transitional option, we can use the basic package with an explicit indication of grpcio package . This option will work, in this pull request, I tried it with different versions of Ray. However, there is a chance that grpcio package of some version will be installed, with which Ray has problems. Ray can explicitly exclude any versions in such cases (note: the exceptions are not made for the basic package, but for ray[client] package, so they cannot be reused on our side), but using the proposed approach, we most likely will not know about this until the problem arises for users.

If we decide to switch to such a transitional option, then I believe we should also consider switching to ray[client] package for Modin 1.0 release.

For example, dependencies for ray[default]==2.9.3:

        "default": [
            # If adding dependencies necessary to launch the dashboard api server,
            # please add it to dashboard/optional_deps.py as well.
            "aiohttp >= 3.7",
            "aiohttp_cors",
            "colorful",
            "py-spy >= 0.2.0",
            "requests",
            "gpustat >= 1.0.0",  # for windows
            "grpcio >= 1.32.0; python_version < '3.10'",  # noqa:E501
            "grpcio >= 1.42.0; python_version >= '3.10'",  # noqa:E501
            "opencensus",
            "pydantic!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,<3",
            "prometheus_client >= 0.7.1",
            "smart_open",
            "virtualenv >=20.0.24, !=20.21.1",  # For pip runtime env.
        ],
  • [x] first commit message and PR title follow format outlined here

    NOTE: If you edit the PR title to match this format, you need to add another commit (even if it's empty) or amend your last commit for the CI job that checks the PR title to pick up the new PR title.

  • [ ] passes flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
  • [ ] passes black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
  • [ ] signed commit with git commit -s
  • [ ] Resolves #7139
  • [ ] tests added and passing
  • [ ] module layout described at docs/development/architecture.rst is up-to-date

anmyachev avatar Feb 21 '24 20:02 anmyachev

@YarShev @sfc-gh-dpetersohn @dchigarev thoughts?

anmyachev avatar Apr 02 '24 11:04 anmyachev

switch to using the basic package itself (ray), but this will not work for fairly new versions of Ray, so how they don't include grpcio dependency.

Why did they exclude grpcio? If we install ray-core, it won't work out of the box, right? Doesn't this look like an issue for Ray?

As a transitional option, we can use the basic package with an explicit indication of grpcio package . This option will work, in this pull request, I tried it with different versions of Ray. However, there is a chance that grpcio package of some version will be installed, with which Ray has problems. Ray can explicitly exclude any versions in such cases (note: the exceptions are not made for the basic package, but for ray[client] package, so they cannot be reused on our side), but using the proposed approach, we most likely will not know about this until the problem arises for users.

If Ray limits grpcio to install specific versions, it seems there shouln't be a problem when installing pip install grpcio ray-core. The pip resolver should handle a case with incompatible versions of grpcio for Ray.

If we decide to switch to such a transitional option, then I believe we should also consider switching to ray[client] package for Modin 1.0 release.

Why?

YarShev avatar Apr 24 '24 13:04 YarShev

Why did they exclude grpcio? If we install ray-core, it won't work out of the box, right? Doesn't this look like an issue for Ray?

I don’t know for sure, but this is a conscious decision to transfer this package to another target: ray[client].

If Ray limits grpcio to install specific versions, it seems there shouldn't be a problem when installing pip install grpcio ray-core. The pip resolver should handle a case with incompatible versions of grpcio for Ray.

The default target does not contain version pins for grpcio dependency (only ray[client] or ray[default]).

If we decide to switch to such a transitional option, then I believe we should also consider switching to ray[client] package for Modin 1.0 release.

Why?

I consider this a more reliable solution, since this is not our direct dependency, but a transitive one from Ray.

anmyachev avatar Apr 29 '24 13:04 anmyachev

Hi @rkooo567! I would like to be more confident that we are going in the right direction towards reducing Ray dependencies installed by default. Do I understand correctly that in a situation where core functionality and a cluster launcher are needed, it will be enough to install ray[client] instead of ray[default]? (since the dashboard from 'default' target is not needed)

anmyachev avatar Apr 29 '24 22:04 anmyachev

I think there's no package that only includes cluster launcher. ray[client] is not for the cluster launcher (it is for ray client mode https://docs.ray.io/en/master/cluster/running-applications/job-submission/ray-client.html, which is not recommended to use)

rkooo567 avatar Apr 29 '24 23:04 rkooo567

I think there's no package that only includes cluster launcher. ray[client] is not for the cluster launcher (it is for ray client mode https://docs.ray.io/en/master/cluster/running-applications/job-submission/ray-client.html, which is not recommended to use)

@rkooo567 thanks! Now it’s more clear how to deal with this.

anmyachev avatar Apr 30 '24 14:04 anmyachev

It looks like we can safely remove ClientObjectRef from Modin sources. It was introduced in https://github.com/modin-project/modin/pull/2851. Ray fixed the problem a long time ago and defined the type like this: class ClientObjectRef(raylet.ObjectRef):

anmyachev avatar Apr 30 '24 20:04 anmyachev