taskcluster
Auth service sends `401` when headers are too large
At the moment, generic worker reports this error:
WORKER EXCEPTION due to response code 401 from Queue when uploading artifact &main.RedirectArtifact{BaseArtifact:(*main.BaseArtifact)(0xc00007abd0), URL:"https://community-websocktunnel.services.mozilla.com/x", ContentType:"text/plain; charset=utf-8"} with CreateArtifact payload {"contentType":"text/plain; charset=utf-8","expires":"2022-09-05T13:08:57.507Z","storageType":"reference","url":"https://community-websocktunnel.services.mozilla.com/x"} - HTTP response body: {
  "code": "AuthenticationFailed",
  "message": "Bad Request: Header length too long\n\n---\n\n* method: createArtifact\n* errorCode: AuthenticationFailed\n* statusCode: 401\n* time: 2022-09-05T11:53:57.560Z",
  "requestInfo": {
    "method": "createArtifact",
    "params": {
      "0": "public/logs/live.log",
      "taskId": "UudWpCqCRKeVO9WTRa4bsg",
      "runId": "0",
      "name": "public/logs/live.log"
    },
    "payload": {
      "contentType": "text/plain; charset=utf-8",
      "expires": "2022-09-05T13:08:57.507Z",
      "storageType": "reference",
      "url": "https://community-websocktunnel.services.mozilla.com/."
    },
    "time": "2022-09-05T11:53:57.560Z"
  }
}
This 401 code is misleading; it would probably be better to return another 4xx status, such as 431.
This comes directly from the hawk library: https://github.com/mozilla/hawk/blob/main/lib/utils.js#L135
Note, this happens when task.scopes hits a certain size, since the scopes are encoded into the Authorization header of the queue.createArtifact HTTP request made by the worker that claims the task.
Note that this is a generic issue for any API request from a temporary client whose scopes consume a lot of characters. I'm not sure what size limitation hawk places on the Authorization header, or whether there is an underlying HTTP limit on header length, but currently a temporary client can have an unbounded list of scopes, both in the number of scopes and in the length of each individual scope name.
From memory, I think the scopes are encoded in an ext field that is included as part of the Authorization header. I don't know if there is a hawk-friendly means to place this in a separate header of unbounded length, or whether there may be an option to lift the size restriction that we're currently seeing.
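To get a rough feel for how quickly the header grows, here is a minimal sketch. It assumes the scopes travel inside a certificate that is JSON-serialized and base64-encoded into the ext value. The certificate layout, the estimateExtLength helper, and the example scope names are all illustrative assumptions, not the actual client code.

```javascript
// Hypothetical helper: estimates the length of the ext value for a temporary
// client, ASSUMING ext = base64(JSON({certificate: {..., scopes}})).
// The certificate fields below are placeholders for illustration only.
function estimateExtLength(scopes) {
  const ext = {
    certificate: {
      version: 1,
      scopes,
      start: 0,
      expiry: 0,
      seed: 'x'.repeat(44),
      signature: 'x'.repeat(44),
    },
  };
  return Buffer.from(JSON.stringify(ext)).toString('base64').length;
}

// Base64 inflates the JSON by about 4/3, so every scope costs roughly
// 4/3 of its JSON-encoded length in the final header.
const few = estimateExtLength(['queue:create-task:low:proj/worker']);
const many = estimateExtLength(
  Array.from({length: 200}, (_, i) => `queue:route:index.project.example.v2.branch-${i}`),
);
console.log({few, many});
```

Under these assumptions, a couple of hundred moderately long scopes already produce a value of several kilobytes, before the rest of the Authorization header is counted.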
To be explicit: this isn't really a worker issue, or a queue.createArtifact issue. It is a general issue with our authorization system in taskcluster, since it looks like we allow unbounded lists of scopes to be granted to temporary clients, yet our underlying auth mechanism has bounded-length inputs.
Yeah, this is a longstanding issue :(
My understanding is that various HTTP things (servers, proxies, browsers, etc.) place various arbitrary limits on header sizes, so I don't think trying to increase the bound is a good strategy.
JWTs would suffer from the same issue. That might be a place to start investigating: is there any "escape hatch" mechanism defined for large JWTs?
I wonder if we can create a "disposable" role in TC, replace all those scopes with assume:temp-role-xx, and then clean up the role after some time.
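As a rough illustration of that idea, the sketch below only computes what such a substitution could look like. It does not call any Taskcluster API, and toDisposableRole, the suffix argument, and the cleanupAt bookkeeping field are hypothetical names invented for this example.

```javascript
// Hypothetical sketch: collapse a long scope list into one assume:<role> scope.
// It returns the role that would need to exist, the short scope list the
// temporary client would carry instead, and when the role could be removed.
function toDisposableRole(scopes, suffix, cleanupAfterMs, now = Date.now()) {
  const roleId = `temp-role-${suffix}`;
  return {
    role: {roleId, scopes: [...scopes]},
    clientScopes: [`assume:${roleId}`],
    // Not a real role property; just a note for whatever does the cleanup.
    cleanupAt: new Date(now + cleanupAfterMs).toISOString(),
  };
}

const plan = toDisposableRole(
  ['queue:create-task:low:proj/a', 'queue:route:index.proj.a.*'],
  'xx',
  60 * 60 * 1000,
);
console.log(plan.clientScopes);
```

The header then carries a single short scope regardless of how many scopes the role expands to, at the cost of creating and later deleting one role per scope set.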
You could also create a client with the necessary scopes -- but doing one for each task may be a lot of client rows!
True... maybe the best thing to do here is to push the responsibility onto the user doing this. The API should just clearly return a non-401 error when this happens, and inform the user that if they need so many scopes, they should create a role themselves.
If we do this, I think the size-check lives in taskcluster-lib-api (rather than hawk, of course) and status code 431 indeed is probably a more accurate reflection of the problem. Although I think the error message is more important than the response status code, as the root cause can only be determined by a user if they understand that taskcluster scopes are encoded into the Authorization header, whereas taskcluster-lib-api could explain this explicitly.
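A minimal sketch of what such a check could look like, run before the request reaches hawk. The function name, the RequestHeaderTooLarge error code, and the 4096 limit are placeholders chosen for illustration; none of them are taken from taskcluster-lib-api or hawk.

```javascript
// Placeholder limit, NOT taken from hawk or taskcluster-lib-api.
const ASSUMED_AUTH_HEADER_LIMIT = 4096;

// Hypothetical pre-authentication check. Returns null when the header is
// small enough, otherwise a description of the error response to send.
function checkAuthorizationHeaderSize(headers, limit = ASSUMED_AUTH_HEADER_LIMIT) {
  const value = headers.authorization || '';
  const size = Buffer.byteLength(value, 'utf8');
  if (size <= limit) {
    return null; // continue with the normal hawk validation
  }
  return {
    statusCode: 431, // Request Header Fields Too Large (RFC 6585)
    code: 'RequestHeaderTooLarge', // hypothetical error code
    message:
      `Authorization header is ${size} bytes, over the ${limit} byte limit. ` +
      'Taskcluster scopes for temporary clients are encoded into this header, ' +
      'so a long scope list can cause this.',
  };
}

console.log(checkAuthorizationHeaderSize({authorization: 'a'.repeat(5000)}));
```

Doing the check ahead of hawk is what lets the response explain the cause in Taskcluster terms, rather than surfacing the generic message from the library.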
In a way, keeping the 401 status code might not be a bad thing: there may be tools interpreting response codes, and the more codes they have to handle, the harder and more complex their task will be. Ultimately, providing too many scopes causes a failure to authenticate the request, so a 401 may in many ways also be appropriate. I think the biggest problem is that the error says Bad Request: Header length too long rather than something more descriptive like:
Bad Request: Temporary client {name of client if named} scopes list takes up too much space.
Size: {size}, Limit: {limit}.
Consider creating a role with scopes:
{scope list}
and granting the single scope assume:<role_name> to temporary client {name of client if named}.
where the {...} variables above are replaced with runtime values from the request. Since the error message suggests the user chooses a role name, <role_name> in the error message is literal text, to indicate to the user to use the role name that they chose.
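For illustration, the template above could be rendered along these lines. This is only a sketch: renderScopesTooLargeMessage and its parameter names are hypothetical, and the values passed in the usage example are made up.

```javascript
// Hypothetical renderer for the proposed message. clientName may be empty
// for unnamed temporary clients; <role_name> is deliberately left literal.
function renderScopesTooLargeMessage({clientName, size, limit, scopes}) {
  const suffix = clientName ? ` ${clientName}` : '';
  return [
    `Bad Request: Temporary client${suffix} scopes list takes up too much space.`,
    `Size: ${size}, Limit: ${limit}.`,
    'Consider creating a role with scopes:',
    ...scopes,
    `and granting the single scope assume:<role_name> to temporary client${suffix}.`,
  ].join('\n');
}

const message = renderScopesTooLargeMessage({
  clientName: 'example/temp-client',
  size: 5000,
  limit: 4096,
  scopes: ['queue:create-task:low:proj/a', 'queue:route:index.proj.a.*'],
});
console.log(message);
```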