shields icon indicating copy to clipboard operation
shields copied to clipboard

Docker Version badge producing "invalid response data" error

Open jauderho opened this issue 2 years ago • 17 comments

Are you experiencing an issue with...

shields.io

🐞 Description

Recently, I've been experimenting with signing container images using cosign and SBOM generation.

Here's the workflow to doing so: https://github.com/jauderho/dockerfiles/blob/099655baac390f47854f151eab3c1da3f0396aee/.github/workflows/age.yml

This results in additional artifacts being created and uploaded to Docker Hub. See https://hub.docker.com/r/jauderho/age/tags

🔗 Link to the badge

Here's the link to the badge in question: https://github.com/jauderho/dockerfiles/blob/main/age/README.md

Version

💡 Possible Solution

I believe that these artifacts are causing parsing issues resulting in the invalid response data seen above instead of correctly displaying the version.

Therefore, it would be great if shields.io could ignore artifacts of the form sha256-SHA256SUM.[att|sig] . Much thanks.

More background information here:

  • https://github.com/docker/roadmap/issues/269#issuecomment-1026417937
  • https://github.com/jauderho/dockerfiles/discussions/149

jauderho avatar Feb 08 '22 22:02 jauderho

Thanks for letting us know and including some background and hypothesis.

However, this error is indicative of the raw response we get from Dockerhub not matching the expected hub in terms of present/missing fields, not anything related to parsing/processing of tag values. I've updated the issue title to that effect to clarify the underlying problem vs. a proposed solution.

I haven't had a chance to actually go through the API response which can start to get fairly gnarly in repos with high quantities of tags, but the issue is going to be somewhere around here:

https://github.com/badges/shields/blob/e3854098d629e9c390196df0c0fd2fd5795b6bca/services/docker/docker-version.service.js#L12-L25

where one of those required fields aren't present for your repo target. we need to try to figure out why/if that's an intentional and expected pattern from the API side. if not, then that's an issue that should be taken upstream to the dockerhub folks, but if so, then we'll need to relax that schema a bit and then if necessary handle the error scenario internally

calebcartwright avatar Feb 08 '22 23:02 calebcartwright

@calebcartwright , thanks for the quick feedback.

I do not know much about Javascript but I'll take a stab at things. Looks like the URL is constructed here: https://github.com/badges/shields/blob/e3854098d629e9c390196df0c0fd2fd5795b6bca/services/docker/docker-version.service.js#L84

Based on that construction, I can now pull json data from two images:

  • https://registry.hub.docker.com/v2/repositories/jauderho/age/tags (with sig and SBOM artifacts)
  • https://registry.hub.docker.com/v2/repositories/jauderho/rustybgp/tags (works fine)

After diffing the json, I think I know what the problem is. architecture is a required field and the .sig/.att artifacts unsurprisingly have no defined architecture nor os.

See https://gist.github.com/jauderho/9c5451ef361c8265bbb3f1db0914f058#file-sig-json-L13 vs. https://gist.github.com/jauderho/0258a4bab184c77bc694e297b56754f2#file-normal-json-L13

I think you would want to keep the architecture requirement for normal use cases but maybe you can use a .without() to exclude name fields that match an example pattern of sha256-a2b7f0e3a43e7daa1e13204cdf093cf03f5042be26a451817cca561b345725ec.sig

I have not used Joi so will need some help constructing an exclusion pattern.

jauderho avatar Feb 09 '22 00:02 jauderho

I'm not entirely sure I follow how you're going about signing images, but more broadly I'm surprised that the API call is returning an "extra" object that ostensibly represents an actual tag. I'd be more curious to find out whether this behavior is intentional, or potential a bug on the Dockerhub API side.

calebcartwright avatar Feb 12 '22 20:02 calebcartwright

At a high level, until the Docker GitHub action actually gains the ability to sign images, right now we have to sign the image as a separate step.

So it's roughly:

  • Build and push container image to container registry
  • Capture the image digest as part of the output
  • Separate step to sign the image digest
  • Push resulting .sig to container registry
  • Do something similar for SBOM generation

Per @dlorenc, this is going to be the behavior until changes to the OCI spec and registries can be made (ETA unknown). To be clear, this affects more than Docker Hub.

See the following examples from GHCR and GitLab:

  • GitHub - https://github.com/jauderho/sandbox/pkgs/container/age-sandbox
  • GitLab - https://gitlab.com/jauderho/sandbox/container_registry/2714238

Therefore, if we want to store signatures and attestations in the registries in the near future, the shields badges will not work as things currently stand and hence the invalid response data until those are filtered/ignored.

jauderho avatar Feb 12 '22 23:02 jauderho

I wasn't aware of these efforts before but I get the gist. Seems there's a big push these days to get signature and bills for just about everything, so I'm not surprised in the least that this is occuring in the container image space too.

I also appreciate you sharing the extra context around the breadth, though I suspect the nature of the other registries will be largely orthogonal in our specific context. For reasons too complicated to delve into here, we don't integrate with those others and are ultimately focused on the request/response relationship between us and the Dockerhub API.

Given the observed behavior, and mentions of things like needing spec updates, I'll go out on a relatively safe limb and assume that the Docker tooling (ostensibly both client and server side) doesn't yet natively support the association of these artifacts to their target image tag (at least not well). This will change at some point, but for now, folks are strictly using auxiliary tooling which is working around those native gaps by distributing the associated artifacts in a way that from a registry & API POV looks like separate tags.

Our Docker badges are among our most popular, and we deal with a fairly high number of requests that in turn process responses from Dockerhub that are on the larger side compared to many other servicse/badges. As such, before we take the plunge of incorporating some type of regex based processing on every element in the collections in those responses, there's a few things I'd love to get some clarity on, or at least pointers to relevant upstream issues so that we can better track things/make sure we drop any extra processing as soon as it becomes unnecessary:

  • Is my summary above a reasonably, if oversimplified, recap?
  • Assuming there are different auxiliary tools being used for generating+validating these extra artifacts, are they all following the same paradigm in terms of how the artifacts ultimately end up represented in the Dockerhub API response structure? (would like to avoid having multiple bespoke solutions to account for tool variance)
  • Presumably Docker are aware of the current state of affairs and the tactical solution that's being utilized today, have they given any kind of :+1: anywhere to indicate their approval/acceptance of this tactical solution?
    • Similarly, any tracking issues/public pronouncement from the Docker folks relative to their plans with the API (e.g. are they planning on doing any filtering themselves serverside, any additional query parameters to control said hypothetical filtering, etc.)
  • What, if any, issues exist in relevant upstream Docker+friends repositories would you say are worth tracking?

My hope is that there's enough of a Docker+community collaboration on this topic that Docker are thinking about ways to sort this, even tactically, on the server side such that each individual client doesn't have to separately reinvent the same wheel.

calebcartwright avatar Feb 15 '22 18:02 calebcartwright

Thinking about this as more of a "black box" (rather than getting stuck into the implementation details), it seems like what we've discovered here is that in practice, sometimes this API returns objects that have an empty architecture property (and the reason for that appears to be valid, if a bit unexpected). Is that a fair summary?

If so, I think we can somewhat simplify the issue of how we fix the badge by saying: that means we need to allow architecture to be empty (i.e: Joi.string().allow('')) without getting too bogged down in the reasons for it (although I don't want to derail an interesting conversation on that topic). This would be in line with our stated processes for input validation: https://github.com/badges/shields/blob/master/doc/input-validation.md and how we fix validation problems when they occur.

Forgive me if there's some detail I am missing here.

chris48s avatar Feb 15 '22 21:02 chris48s

If so, I think we can somewhat simplify the issue of how we fix the badge by saying: that means we need to allow architecture to be empty (i.e: Joi.string().allow(''))

Realize this is the fastest path to market, so to speak, but the reason we're seeing this is because the platform's being used in unexpected ways (you'd always expect an arch for image). Would personally like to get more info to make sure we don't end up chasing a moving target, or that we don't go down our own quick fix route only to have it be obviated by upstream changes we're similarly not aware of.

If i were to attempt to analogize with some hyperbole, it would be like requesting the versions of an npm package on npmjs and amongst the collection alongside typical version objects seeing something very unexpected like an xml or binary reference. Yes we can, and likely will, need to work around it, but it's unexpected enough that it warrants spending a little more time on the "why" IMO

calebcartwright avatar Feb 15 '22 21:02 calebcartwright

As I was suggesting above, I would not necessarily recommend a blanket allow for architecture to be empty since there's a valid reason for making sure there is generally an architecture available, rather I'm suggesting that if name matches a specific pattern of sha256-*.* to then permit the architecture to be empty.

This preserves the prior behavior and checks when allowing for the fact that signatures and attestations are going to more of a thing going forward.

I'd be happy to track this for upstream changes of direction and inform here if that does happen. But based on what I'm seeing upstream, changes to the registries might take a while so this might be the best imperfect solution for now.

jauderho avatar Feb 15 '22 22:02 jauderho

I'd be happy to track this for upstream changes of direction and inform here if that does happen. But based on what I'm seeing upstream, changes to the registries might take a while so this might be the best imperfect solution for now.

I appreciate the offer, but that's not the ask and I don't think it would be very efficient anyway. It'll be a lot easier to have the relevant touch points directly navigable from here, so simply sharing some links to upstream issues would be super helpful if you're up for it!

calebcartwright avatar Feb 15 '22 23:02 calebcartwright

@calebcartwright Just checking in. Is this something that can/will be fixed or is the plan to see what the registries do?

I'd like to start signing all of the images that I build but am not keen on having the badges showing "invalid response" once they are signed. Just trying to figure out if I should start planning to do this now or wait 6 months. Thanks.

jauderho avatar Mar 03 '22 20:03 jauderho

@jauderho as per last comments, we are waiting on you to share relevant upstream issues

calebcartwright avatar Mar 03 '22 21:03 calebcartwright

FWIW, I'm chasing down "invalid" in DockerHub pulls right now, where running the server locally works fine. I found this issue in search results and: current HEAD works fine, I can look at http://localhost:8080/docker/pulls/jauderho/age.svg and see 5.5k pulls. So I don't think that cosign is the source of the issues here.

philpennock avatar May 12 '22 16:05 philpennock

FWIW, I'm chasing down "invalid" in DockerHub pulls right now, where running the server locally works fine. I found this issue in search results and: current HEAD works fine, I can look at http://localhost:8080/docker/pulls/jauderho/age.svg and see 5.5k pulls. So I don't think that cosign is the source of the issues here.

Thank you for looking at this @philpennock! However, I think it's important to note that the issue reported and discussed here is the docker version badge, not the docker pulls badge. While those are both docker-related badges, the use different endpoints, different response schema validation, and different processing logic.

calebcartwright avatar May 12 '22 17:05 calebcartwright

Oops, sorry, I didn't read the README closely enough. Indeed http://localhost:8080/docker/v/jauderho/age/latest reproduces. I'll shut up on this issue, sorry for the noise.

(FWIW: +1 for wanting something not broken by cosign: at present, our cosign images are private, but sooner or later we'll be rolling out to public too)

philpennock avatar May 12 '22 17:05 philpennock

No worries! The above next steps still stand if anyone's able to help fill in the gaps.

I also want to reiterate that I view those previously outlined questions/considerations as directly relevant to how we proceed, and not just an exercise in scholastic pedantry. Our version badge, and its ability to identify the correct value, are highly coupled to the arch and digests attributes.

I'm happy to be proven wrong, but I'm not convinced that simply relaxing our schema would just work without some additional code changes. At a minimum I'd like to know that the approach here is the approach being taken for signing and that it won't just be one of many bespoke workarounds people implement in the absence of something more native.

calebcartwright avatar May 12 '22 17:05 calebcartwright

So the basic problem is that "the approach used for signing" is unknowable: the API response returned by Docker for the HTTP call used is a projection of the manifest contents, not the direct manifest contents. So we can't do things like look at the mediaType field to dispatch sanely. It boils down to "how will Docker expose completely legitimate manifest contents". We don't know what decisions Docker will make going forward. Ideally they'd just stop returning the elements with these MIME types for this custom endpoint.

There are competing standards for code-signing, but the cosign approach is gaining significant momentum IMO, in part because it is able to work with agnostic registries such as DockerHub - instead of requiring code changes - because cosign works within the OCI schema system. (Also keyless signing doesn't hurt). The approach needed by the badges code isn't contingent upon the choices of the standard, but the choices of DockerHub.

For code changes, this is what I used to get this repo working for npm run badge -- /docker/v/jauderho/age/latest and serving under npm start; I am not a JavaScript developer and am not familiar with most of the tooling used here, so didn't go into the tests, I just went for logical consistency of "what does it mean if the architecture is gone" ... well we can't assume that the first element of the array is the one we want.

diff --git a/services/docker/docker-version.service.js b/services/docker/docker-version.service.js
index 00bb00e98..88f39f99a 100644
--- a/services/docker/docker-version.service.js
+++ b/services/docker/docker-version.service.js
@@ -17,7 +17,7 @@ const buildSchema = Joi.object({
       images: Joi.array().items(
         Joi.object({
           digest: Joi.string(),
-          architecture: Joi.string().required(),
+          architecture: Joi.string().allow('').required(),
         })
       ),
     })
@@ -92,7 +92,7 @@ export default class DockerVersion extends BaseJsonService {
     let version
 
     if (!tag && sort === 'date') {
-      version = data.results[0].name
+      version = data.results.find(function (el) { return el.architecture != "" }).name
       if (version !== 'latest') {
         return { version }
       }

philpennock avatar May 12 '22 17:05 philpennock

Thanks for continuing to investigate @philpennock, but as you alluded to, there's more that'll be needed than the minimal diff to get one passing badge.

Part of the argument for not trivially dropping the architecture requirement was mentioned above because if so then it opens up a range of other scenarios that have to be accounted for (including some bordering on the pathological). E.g. what if all of the tags have a falsy arch? what if the latest tag (explicitly requested, implicitly via date-driven sort, etc.) is one of the artifact "tags"? what about a semver-bounded tag range with a semver-driven sort? etc.

Again, it's not that we can't do anything on our end. These are all answerable questions/solvable problems, some may be handled already (albeit with subpar errors) and/or just need tests. However, I continue to have objections to a "just do it anyway" strategy though given the outstanding dearth of information that's been shared (beyond a couple helpful but still anecdotal reads) on the relevant upstream factors.

The approach needed by the badges code isn't contingent upon the choices of the standard, but the choices of DockerHub.

Yes, it's true that our DockerHub badges are based on the contracts of the DockerHub APIs as I'd noted earlier in the thread. However, the chain of finger pointing is likely to continue in the interim wherein the DH folks will want to see what's going to happen with the standard before making significant changes to their API surface or internal API implementation.

calebcartwright avatar May 12 '22 20:05 calebcartwright