
operation_id should be optional in get_job

Open faucct opened this issue 6 months ago • 10 comments

In our YTsaurus installations, users don't have direct network access to the exec nodes. This makes it harder to expose the web UIs of services running in Vanilla operations, such as Spark.

To address this, we've developed a proxy that accepts HTTP traffic, checks YT authentication and authorization, and forwards the traffic to the exec nodes.

The proxy needs to know where to route the traffic: which job, and which of the ports that job requested via port_count.

Right now this is done by prefixing the URL path with the exec node address and the physical port, though the prefix could carry operation_id + job_id instead of the exec node address.
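For illustration, here is a minimal sketch of how a proxy might split such a prefixed path into a routing target. The `/<node_address>/<port>/<rest>` layout, the `Route` type, and `parse_prefixed_path` are assumptions made up for this example; the actual proxy's URL scheme is not shown in this thread.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class Route(NamedTuple):
    node_address: str   # exec node to forward to
    port: int           # physical port on that exec node
    upstream_path: str  # path (and query) passed on to the job


def parse_prefixed_path(url: str) -> Route:
    """Split a hypothetical '/<node_address>/<port>/<rest>' path into a Route."""
    parts = urlsplit(url)
    # For '/node/27001/a/b' this yields ['', 'node', '27001', 'a/b'].
    segments = parts.path.split("/", 3)
    if len(segments) < 3 or not segments[1] or not segments[2].isdigit():
        raise ValueError(f"no routing prefix in path: {parts.path!r}")
    port = int(segments[2])
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    # Everything after the prefix is forwarded unchanged.
    rest = "/" + (segments[3] if len(segments) > 3 else "")
    if parts.query:
        rest += "?" + parts.query
    return Route(segments[1], port, rest)
```

Because the routing data lives in the path, every proxied service is served from the same host, which is what the isolation concern in this description is about.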

But prefixing paths is not entirely safe, because all proxied services then share one host and browser isolation suffers (CORS, cookies, probably something else). I also initially thought that it would be impossible to adapt the Spark application inside the job to URLs with prefixed paths, but this turned out not to be a problem after all.

A better solution would be to use wildcard hosts, something like *.proxy.yt.... This way those hosts would be better isolated from each other.

The problem is that a wildcard matches a single DNS label, and a label is at most 63 characters long, so it can't fit both operation_id and job_id. With plain HTTP we could use multiple wildcard levels, but with HTTPS we can't get a certificate for such a host, since a wildcard certificate covers only one label. So if operation_id could be omitted from those requests, the wildcard label could hold just job_id and the port ID. It is still possible to put only operation_id in the wildcard for isolation and keep the rest needed for resolving (job_id, port) in the path prefix, though a single job_id would be more convenient to use.
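The length argument can be checked with a short sketch. It assumes that IDs are rendered as four hyphen-separated hex groups of at most 8 characters each (35 characters at most) and that fields within the label are joined by a single hyphen. The `<job_id>-<port_index>` label layout, the base domain, and both helper functions are hypothetical and exist only for this example.

```python
MAX_DNS_LABEL = 63  # maximum length of one DNS label
MAX_GUID_LEN = 35   # assumption: 4 hex groups of up to 8 chars + 3 hyphens
MAX_PORT_LEN = 5    # "65535"


def label_fits(*field_lengths: int, sep_len: int = 1) -> bool:
    """Check whether fields joined by separators fit into one DNS label."""
    total = sum(field_lengths) + sep_len * (len(field_lengths) - 1)
    return total <= MAX_DNS_LABEL


def parse_job_host(host: str, base_domain: str) -> tuple[str, int]:
    """Parse a hypothetical '<job_id>-<port_index>.<base_domain>' host."""
    suffix = "." + base_domain
    if not host.endswith(suffix):
        raise ValueError(f"host {host!r} is not under {base_domain!r}")
    label = host[: -len(suffix)]
    # A wildcard certificate covers exactly one label, so reject nested labels.
    if "." in label or len(label) > MAX_DNS_LABEL:
        raise ValueError(f"invalid routing label: {label!r}")
    # job_id itself contains hyphens, so split on the last one only.
    job_id, _, port_index = label.rpartition("-")
    if not job_id or not port_index.isdigit():
        raise ValueError(f"invalid routing label: {label!r}")
    return job_id, int(port_index)


# operation_id + job_id + port: 35 + 35 + 5 + 2 separators = 77 characters.
print(label_fits(MAX_GUID_LEN, MAX_GUID_LEN, MAX_PORT_LEN))  # False
# job_id + port: 35 + 5 + 1 separator = 41 characters.
print(label_fits(MAX_GUID_LEN, MAX_PORT_LEN))  # True
```

Under these assumptions, a label carrying only job_id and a port leaves more than 20 characters of headroom, while adding operation_id overshoots the limit by 14.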

So this pull request adds a way to resolve running jobs by their ID without the operation_id, so that the proxy can forward traffic to them. That's why I only care about running jobs.

We can live without this feature, so if it is too hard to resolve completed jobs, then I won't pursue it.

faucct • Jun 23 '25 12:06

23.06.2025, 12:36:30 PR autocheck started.
23.06.2025, 20:01:28 Integration tests are started.
24.06.2025, 03:10:22 Tests finished.
24.06.2025, 04:16:45 PR autocheck finished.
Statuses: Strawberry controller: success, CMake build: success, Ya-make build: success, Tests: cancelled

github-actions[bot] • Jun 23 '25 12:06

In progress

renadeen • Jul 01 '25 09:07

> Can you give a more detailed description of the problem? I didn't get how optional op id will help

I have detailed the motivation in the description.

faucct • Jul 03 '25 08:07

After some discussions internally, we decided that we won't be pursuing this for the time being. I will close this PR and we can reopen it when/if needed.

achulkov2 • Jul 15 '25 21:07

18.07.2025, 09:03:34 PR autocheck started.
18.07.2025, 16:17:12 Integration tests are started.
18.07.2025, 19:07:43 Tests finished.

Total

Total  Failed  Ok    Skipped  Not launched
2633   2       2410  221      0

ci-viewer/16366649725/size_s (returncode 10)

Total  Failed  Ok    Skipped  Not launched
2633   2       2410  221      0

18.07.2025, 19:07:53 PR autocheck finished.
Statuses: Strawberry controller: success, CMake build: success, Ya-make build: success, Tests: success

github-actions[bot] • Jul 18 '25 09:07

Once again, we can live without this feature, so this is not urgent. Still, I have decided to finish it, so please review it if you have some time and are open to such a change.

faucct • Jul 18 '25 12:07

I'll review this PR soon.

bystrov-serg • Jul 18 '25 12:07

21.08.2025, 12:27:45 PR autocheck started.
21.08.2025, 17:35:38 PR autocheck finished.
Statuses: Strawberry controller: success, CMake build: success, Ya-make build: failure, Tests: skipped

github-actions[bot] • Aug 21 '25 12:08

04.09.2025, 09:56:23 PR autocheck started.
04.09.2025, 09:57:13 PR autocheck finished.
Statuses: Strawberry controller: skipped, CMake build: skipped, Ya-make build: skipped, Tests: skipped

github-actions[bot] • Sep 04 '25 09:09

04.09.2025, 09:57:47 PR autocheck started.
04.09.2025, 17:42:44 Tests finished.
04.09.2025, 20:09:06 PR autocheck finished.
Statuses: Strawberry controller: success, CMake build: success, Ya-make build: success, Tests: failure

github-actions[bot] • Sep 04 '25 09:09