operation_id should be optional in get_job
In our YTsaurus installations, users don't have direct network access to the exec nodes. This makes it hard to expose the web UIs of services running in Vanilla operations, such as Spark.
To solve this, we've developed a proxy that accepts HTTP traffic, checks YT authentication and authorization, and forwards the traffic to the exec nodes.
The proxy needs to know where to route the traffic: which job, and which of the ports allocated to it via port_count.
Right now this is done by prefixing the URL path with the exec node address and the physical port, though the prefix could carry operation_id + job_id instead of the exec node address.
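To illustrate the current scheme, here is a minimal sketch of splitting such a prefixed path into a routing target and the path that is forwarded to the service. The layout `/<exec node address>/<port>/<rest>`, the `Route` type and the `parse_prefixed_path` name are assumptions made for this example, not the proxy's actual format.

```python
from typing import NamedTuple


class Route(NamedTuple):
    node_address: str  # exec node to forward to
    port: int          # physical port of the job on that node
    rest: str          # path passed on to the service


def parse_prefixed_path(path: str) -> Route:
    # Hypothetical layout: /<exec node address>/<port>/<rest of the path>
    parts = path.lstrip("/").split("/", 2)
    if len(parts) < 2 or not parts[0]:
        raise ValueError(f"no routing prefix in path: {path!r}")
    node_address, port_text = parts[0], parts[1]
    if not (port_text.isascii() and port_text.isdigit()):
        raise ValueError(f"bad port in path: {path!r}")
    port = int(port_text)
    if not 0 < port < 65536:
        raise ValueError(f"port out of range in path: {path!r}")
    # Everything after the prefix is forwarded unchanged.
    rest = "/" + (parts[2] if len(parts) == 3 else "")
    return Route(node_address, port, rest)
```

With this layout, every service is served from the same host and differs only by path, which is what the isolation concern in the next paragraph is about.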
But prefixing paths is not entirely safe: all services end up on the same host, so browser isolation (CORS, cookies, probably something else) doesn't separate them. I also initially thought it would be impossible to adapt the Spark application inside the job to URLs with prefixed paths, but that turned out not to be a problem after all.
A better solution would be to use wildcard hosts, something like *.proxy.yt.... This way the services would be better isolated from each other.
The problem is that a single DNS label, which is what the wildcard matches, is limited to 63 characters, so it can't fit both operation_id and job_id. If the traffic were plain HTTP, we could use multiple wildcard levels, but with HTTPS we can't issue a TLS certificate for such a host.
So, if operation_id could be omitted from those requests, the wildcard label would only need to fit job_id and the port ID.
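The length argument can be checked with a few lines of arithmetic. This sketch assumes the IDs are written in the usual textual GUID form (four hexadecimal parts of up to 8 characters joined by hyphens, so at most 35 characters), that the port is written as a decimal number of up to 5 digits, and that the parts are joined by a one-character separator. The `label_fits` helper is hypothetical.

```python
MAX_DNS_LABEL_LENGTH = 63  # limit for a single DNS label
MAX_GUID_LENGTH = 35       # assumed: 8 + 1 + 8 + 1 + 8 + 1 + 8
MAX_PORT_LENGTH = 5        # assumed: decimal port, at most "65535"


def label_fits(*part_lengths: int, separator_length: int = 1) -> bool:
    """Check whether parts joined by a separator fit into one DNS label."""
    separators = separator_length * max(len(part_lengths) - 1, 0)
    return sum(part_lengths) + separators <= MAX_DNS_LABEL_LENGTH


# operation_id + job_id: 35 + 1 + 35 = 71 characters, too long
both_ids_fit = label_fits(MAX_GUID_LENGTH, MAX_GUID_LENGTH)
# job_id + port: 35 + 1 + 5 = 41 characters, fits
job_and_port_fit = label_fits(MAX_GUID_LENGTH, MAX_PORT_LENGTH)
```

Under these assumptions the two IDs alone already exceed the limit, while a single job_id plus a port leaves room to spare.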
It is still possible to put only operation_id in the wildcard label for isolation and keep the rest needed for resolving (job_id, port) in the path prefix, though a single job_id would be more convenient to use.
So, this pull request adds a way to resolve running jobs by their ID without operation_id, so that the proxy can forward traffic to them.
That's why I only care about running jobs.
We can live without this feature, so if it is too hard to resolve completed jobs, then I won't pursue it.
23.06.2025, 12:36:30 PR autocheck started. Watch workflow progress here.
23.06.2025, 20:01:28 Integration tests are started.
24.06.2025, 03:10:22 Tests finished.
24.06.2025, 04:16:45 PR autocheck finished.
Statuses:
Strawberry controller: success
CMake build: success
Ya-make build: success
Tests: cancelled
In progress
Can you give a more detailed description of the problem? I didn't get how an optional operation_id would help.
I have detailed the motivation in the description.
After some discussions internally, we decided that we won't be pursuing this for the time being. I will close this PR and we can reopen it when/if needed.
18.07.2025, 09:03:34 PR autocheck started. Watch workflow progress here.
18.07.2025, 16:17:12 Integration tests are started.
18.07.2025, 19:07:43 Tests finished.
Total
| Total | Failed | Ok | Skipped | Not launched |
|---|---|---|---|---|
| 2633 | 2 | 2410 | 221 | 0 |
ci-viewer/16366649725/size_s (returncode 10)
| Total | Failed | Ok | Skipped | Not launched |
|---|---|---|---|---|
| 2633 | 2 | 2410 | 221 | 0 |
Failed suites
18.07.2025, 19:07:53 PR autocheck finished.
Statuses:
Strawberry controller: success
CMake build: success
Ya-make build: success
Tests: success
Once again, we can live without this feature, so this is not urgent, but I have decided to finish it. Please review it if you have some time and are open to such a change.
I'll review this PR soon.
21.08.2025, 12:27:45 PR autocheck started. Watch workflow progress here.
21.08.2025, 17:35:38 PR autocheck finished.
Statuses:
Strawberry controller: success
CMake build: success
Ya-make build: failure
Tests: skipped
04.09.2025, 09:56:23 PR autocheck started. Watch workflow progress here.
04.09.2025, 09:57:13 PR autocheck finished.
Statuses:
Strawberry controller: skipped
CMake build: skipped
Ya-make build: skipped
Tests: skipped
04.09.2025, 09:57:47 PR autocheck started. Watch workflow progress here.
04.09.2025, 17:42:44 Tests finished.
04.09.2025, 20:09:06 PR autocheck finished.
Statuses:
Strawberry controller: success
CMake build: success
Ya-make build: success
Tests: failure