Cannot download defaultVersion.jar in Ray
Describe the bug
When submitting a job to Ray via geaflow-console, the defaultVersion.jar cannot be downloaded.
There are two issues:
- URL parsing in Ray's packaging.py excludes query parameters:
  The download URL provided by GeaFlow has the format:
  https://127.0.0.1:8443/api/tasks/%s/files?path=/tmp/geaflow/files/versions/defaultVersion/defaultVersion.jar.zip
  However, Ray's default implementation in packaging.py strips the query part when generating the cache key, resulting in a key like:
  https_127_0_0_1_8443_api_tasks_%s_files
  In contrast, GeaFlow's RayRuntime generates a key that includes the query parameters (with special characters replaced), resulting in:
  https_127_0_0_1_8443_api_tasks_%s_files?path=_tmp_geaflow_files_versions_defaultVersion_defaultVersion_jar
  This inconsistency causes a mismatch in runtime_env, which prevents the cached files from being reused.
- Authentication issue:
  Accessing the console's download endpoint requires a valid geaflow-token header, but this token is not passed to Ray during the file download, so the request is rejected as unauthorized and the download fails.
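To make the key mismatch in the first issue concrete, here is a minimal Python sketch of the two key-derivation styles. It is not copied from Ray's packaging.py or GeaFlow's RayRuntime: the function names and the set of replaced characters are assumptions chosen so that the Ray-style key matches the one quoted above. The GeaFlow key quoted above has no `_zip` suffix, so the real implementation evidently handles the extension differently; this sketch does not reproduce that detail.

```python
from urllib.parse import urlsplit

# Characters replaced with "_" in this sketch (assumed set, chosen so the output
# matches the Ray-style key quoted in the issue; not taken from either codebase).
_SPECIAL_CHARS = "/:.@+ "


def _sanitize(text: str) -> str:
    for ch in _SPECIAL_CHARS:
        text = text.replace(ch, "_")
    return text


def ray_style_key(url: str) -> str:
    """Hypothetical: key built from scheme, host and path only; the query is dropped."""
    parts = urlsplit(url)
    return _sanitize(f"{parts.scheme}_{parts.netloc}{parts.path}")


def geaflow_style_key(url: str) -> str:
    """Hypothetical: same prefix, but the sanitized query string is kept."""
    key = ray_style_key(url)
    query = urlsplit(url).query
    return f"{key}?{_sanitize(query)}" if query else key


url = ("https://127.0.0.1:8443/api/tasks/%s/files"
       "?path=/tmp/geaflow/files/versions/defaultVersion/defaultVersion.jar.zip")
print(ray_style_key(url))      # → https_127_0_0_1_8443_api_tasks_%s_files
print(geaflow_style_key(url))  # → https_127_0_0_1_8443_api_tasks_%s_files?path=_tmp_geaflow_files_versions_defaultVersion_defaultVersion_jar_zip
print(ray_style_key(url) == geaflow_style_key(url))  # → False
```

Because the two sides disagree on whether the query belongs in the key, whatever one side records in runtime_env cannot be found under the key the other side computes.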
Expected behavior
The defaultVersion.jar should be downloaded successfully and the job should run normally.
Additional context
The above issue has not been verified with HDFS or OSS storage; they may have the same problem.
Thank you for your question. Currently, Ray only supports downloading the engine jar from OSS (the fileStorage config needs to be oss). Downloading from Local storage (in the console) might not work for Ray. We also invite you to take a look at this issue and help fix it together.
@652053395 After some investigation, I think we can solve the current problems with two complementary changes:
- Provide authorized download URLs:
  - Local mode: append the GeaFlow token as a query parameter, e.g.
    https://repo.example.com/jars/my-lib.jar?geaflow-token=<token>
  - OSS / S3: generate a short-lived pre-signed URL.
- The current implementation insists on HTTPS and fails silently otherwise. As a workaround, we can skip the java_jars mechanism entirely and download the artifacts ourselves in the container's entry-point script with
  curl -L -o … <signed-url>
  and then place the files in the expected local directory before the JVM starts.
It's been verified in the local environment, so if you think the solution is feasible, I can submit a PR.
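The local-mode proposal above amounts to a small URL rewrite. A minimal Python sketch, assuming the console is changed to also accept the token from the query string (the issue describes it checking a geaflow-token header); `with_geaflow_token` is a hypothetical helper name, and existing parameters such as the console's `?path=...` are kept.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_geaflow_token(url: str, token: str) -> str:
    """Hypothetical helper: return url with a geaflow-token query parameter appended.

    Assumes the console reads the token from the query string as well as the header.
    """
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    params.append(("geaflow-token", token))
    # urlencode percent-encodes every value, so a token containing "&" or "="
    # does not break the query string.
    return urlunsplit(parts._replace(query=urlencode(params)))


print(with_geaflow_token("https://repo.example.com/jars/my-lib.jar", "abc123"))
# → https://repo.example.com/jars/my-lib.jar?geaflow-token=abc123
```

One trade-off to weigh: query strings are commonly written to proxy and access logs, which is part of why short-lived pre-signed URLs are attractive for OSS / S3.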
@nylqd Thank you for your solution. We think it is feasible. One question: how can the curl -L -o … command be set dynamically in the container's entry-point script?
https://github.com/apache/geaflow/blob/6526b69bbce536d4e918ec9fe6f6c32b9f05cb20/geaflow-console/app/core/service/src/main/java/org/apache/geaflow/console/core/service/runtime/RayRuntime.java#L154-L177
@652053395 Currently, the buildrequest method dynamically generates a download link and specifies a classpath in the entrypoint. There, we can dynamically obtain a pre-signed URL and concatenate the download command into the entrypoint.
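A minimal Python sketch of that concatenation: it builds a shell entrypoint that downloads each jar with curl and then starts the JVM with those jars on the classpath. `build_entrypoint`, the lib directory, the main class and the example URL are hypothetical; the real change would live in RayRuntime's buildrequest method (Java). It also assumes plain jars, so an archive such as defaultVersion.jar.zip would need an extra unzip step.

```python
import shlex
from typing import Dict, List


def build_entrypoint(jar_urls: Dict[str, str], lib_dir: str,
                     main_class: str, args: List[str]) -> str:
    """Hypothetical helper: download jars with curl, then start the JVM.

    jar_urls maps a local file name to a pre-signed or token-bearing URL.
    """
    steps = [f"mkdir -p {shlex.quote(lib_dir)}"]
    classpath = []
    for file_name, url in jar_urls.items():
        target = f"{lib_dir}/{file_name}"
        # -L follows redirects (as in the proposal); -f makes curl exit non-zero on
        # HTTP errors instead of saving the error body; -sS hides progress but keeps errors.
        # shlex.quote protects URLs containing "?" and "&" from the shell.
        steps.append(f"curl -fsSL -o {shlex.quote(target)} {shlex.quote(url)}")
        classpath.append(target)
    java_cmd = ["java", "-cp", ":".join(classpath), main_class, *args]
    steps.append(" ".join(shlex.quote(part) for part in java_cmd))
    # "&&" stops at the first failed step, so the JVM never starts with a missing jar.
    return " && ".join(steps)


print(build_entrypoint(
    {"geaflow.jar": "https://bucket.example.com/geaflow.jar?Expires=1700000000&Signature=abc"},
    "/tmp/geaflow/lib",
    "com.example.Main",  # hypothetical main class
    [],
))
```

Adding `-f` addresses the "fails silently" concern: without it, curl exits successfully on an HTTP 401 and saves the error page as the jar.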