Cannot download defaultVersion.jar in Ray

Open nylqd opened this issue 4 months ago • 4 comments

Describe the bug
When submitting a job to Ray via geaflow-console, the defaultVersion.jar cannot be downloaded.

There are two issues:

  1. URL parsing in Ray's packaging.py excludes query parameters:
    The download URL provided by GeaFlow is in the format:
    https://127.0.0.1:8443/api/tasks/%s/files?path=/tmp/geaflow/files/versions/defaultVersion/defaultVersion.jar.zip
    However, Ray's default implementation in packaging.py strips the query part when generating the cache key, resulting in a key like:
    https_127_0_0_1_8443_api_tasks_%s_files
    In contrast, GeaFlow's RayRuntime generates a key that includes the query parameters (with special characters replaced), resulting in:
    https_127_0_0_1_8443_api_tasks_%s_files?path=_tmp_geaflow_files_versions_defaultVersion_defaultVersion_jar
    This inconsistency causes a mismatch in runtime_env, preventing proper reuse of cached files.

  2. Authentication issue:
    Accessing the console's download endpoint requires a valid geaflow-token header, but this token is not passed to Ray during the file download process, resulting in unauthorized access and download failure.
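
To make the first issue concrete, here is a minimal sketch of the two key derivations. It is not Ray's or GeaFlow's actual code: `sanitize`, `key_without_query`, and `key_with_query` are hypothetical helpers that only approximate the behavior described above (the real implementations may treat other characters differently), but they reproduce the two keys quoted in this report.

```python
import re
from urllib.parse import urlsplit

URL = ("https://127.0.0.1:8443/api/tasks/%s/files"
       "?path=/tmp/geaflow/files/versions/defaultVersion/defaultVersion.jar.zip")


def sanitize(text):
    # Hypothetical rule: replace '/', ':' and '.' with '_'.
    return re.sub(r"[/:.]", "_", text)


def key_without_query(url):
    # Approximates the Ray-side key: scheme + host + path, query dropped.
    parts = urlsplit(url)
    return sanitize(f"{parts.scheme}_{parts.netloc}{parts.path}")


def key_with_query(url):
    # Approximates the GeaFlow-side key: same base, plus the sanitized query
    # with the trailing ".zip" removed.
    query = urlsplit(url).query
    if query.endswith(".zip"):
        query = query[: -len(".zip")]
    return f"{key_without_query(url)}?{sanitize(query)}"


print(key_without_query(URL))
print(key_with_query(URL))
```

The two printed keys differ only in the query-derived suffix, which is the mismatch described in item 1.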

Expected behavior
The defaultVersion.jar should be downloaded successfully and the job should run normally.

Additional context

The issue above has not been verified with HDFS or OSS storage; those backends may have the same problem.

nylqd avatar Aug 14 '25 08:08 nylqd

Thank you for your question. Currently, Ray can only download the engine jar from OSS (the fileStorage config needs to be oss). Downloading from local storage (in the console) might not work for Ray. You are also welcome to look into this issue and help fix it together with us.

652053395 avatar Aug 14 '25 11:08 652053395

@652053395 After some investigation I think we can solve the current problems with two complementary changes:

  1. Provide authorized download URLs

    • Local mode: append the GeaFlow token as a query parameter, e.g. https://repo.example.com/jars/my-lib.jar?geaflow-token=<token>
    • OSS / S3: generate a short-lived pre-signed URL
  2. The current implementation insists on HTTPS and fails silently otherwise. As a workaround we can skip the java_jars mechanism entirely and download the artifacts ourselves in the container’s entry-point script with curl -L -o … <signed-url>, then place the files in the expected local directory before the JVM starts.
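
As a rough illustration of the local-mode idea in item 1, the sketch below appends a token to a download URL while keeping any query string that is already there (such as path=...). The helper name `with_token` and the token value are hypothetical, and whether the console accepts the token as a query parameter is part of the proposal, not existing behavior.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_token(url, token, param="geaflow-token"):
    """Return url with the token added as a query parameter.

    Existing query parameters are preserved; '/' is left unescaped so that
    a value like path=/tmp/... stays readable.
    """
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    pairs.append((param, token))
    query = urlencode(pairs, safe="/")
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))


print(with_token("https://repo.example.com/jars/my-lib.jar", "<token>"))
```

A token placed in the URL tends to end up in logs and job specs, so a short-lived token or pre-signed URL is the safer variant of this approach.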

This has been verified in a local environment, so if you think the solution is feasible, I can submit a PR.

nylqd avatar Aug 29 '25 01:08 nylqd

@nylqd Thank you for the proposed solution. We think it is feasible. One question: how can the curl -L -o … command be set dynamically in the container’s entry-point script?

652053395 avatar Aug 29 '25 07:08 652053395

https://github.com/apache/geaflow/blob/6526b69bbce536d4e918ec9fe6f6c32b9f05cb20/geaflow-console/app/core/service/src/main/java/org/apache/geaflow/console/core/service/runtime/RayRuntime.java#L154-L177

@652053395 Currently, the buildrequest method dynamically generates a download link and specifies a classpath in the entrypoint. We can obtain a pre-signed URL dynamically at that point and concatenate it into the entrypoint.
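
A minimal sketch of that concatenation, written in Python for brevity rather than as a patch to RayRuntime.java. The function `build_entrypoint`, the example URL, the local jar path, and the Java command are all hypothetical; the `-f` flag is an addition here so that curl exits non-zero on HTTP errors instead of saving an error page as the jar.

```python
import shlex


def build_entrypoint(signed_url, local_jar, java_command):
    """Prefix the job's command with a download step.

    shlex.quote protects the URL, whose '?' and '&' would otherwise be
    interpreted by the shell.
    """
    download = f"curl -fL -o {shlex.quote(local_jar)} {shlex.quote(signed_url)}"
    # '&&' ensures the JVM only starts if the download succeeded.
    return f"{download} && {java_command}"


print(build_entrypoint(
    "https://oss.example.com/v.jar?sig=abc&exp=1",   # hypothetical pre-signed URL
    "/opt/geaflow/defaultVersion.jar",               # hypothetical local path
    "java -cp /opt/geaflow/defaultVersion.jar com.example.Main",  # hypothetical
))
```

Since the signed URL is generated when the request is built, its lifetime needs to cover the delay between job submission and container start.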

nylqd avatar Aug 29 '25 07:08 nylqd