lifecycle icon indicating copy to clipboard operation
lifecycle copied to clipboard

Metadata stored as integer is incorrectly restored as a float

Open schneems opened this issue 3 years ago • 6 comments

Summary

When writing metadata as an integer such as:

[metadata]
version = 1

It will be retrieved from cache as a float incorrectly:

[metadata]
version = 1.0

Example: https://github.com/schneems/cnb-metadata-integer-bug


CNB Metadata integer bug

Expected

If you run this buildpack twice you should see this output on the second run:

Existing layer TOML contents:
[metadata]
  version = 1

Actual

It currently instead outputs this:

Existing layer TOML contents:
[metadata]
  version = 1.0

Note that this is a float instead of an integer

Reproduce

Clone the code:

git clone https://github.com/schneems/cnb-metadata-integer-bug
cd cnb-metadata-integer-bug

Build once to write metadata:

$ pack build sample --buildpack ./ --path ./
20: Pulling from heroku/buildpacks
Digest: sha256:c49ebf7c333d099ee765a1841acea6d5e5aa0a81a3375e96d7af41c4abc53344
Status: Image is up to date for heroku/buildpacks:20
20-cnb: Pulling from heroku/heroku
Digest: sha256:cb6e1902c1bd7c10e0e0d81d1a6c67a070e7820326b3cd22b11d0847b155ca5a
Status: Image is up to date for heroku/heroku:20-cnb
Previous image with name "sample" not found
===> DETECTING
sample-counter 0.1.0
===> RESTORING
===> BUILDING
===> EXPORTING
Adding layer 'sample-counter:test'
Adding 1/1 app layer(s)
Adding layer 'launcher'
Adding layer 'config'
Adding label 'io.buildpacks.lifecycle.metadata'
Adding label 'io.buildpacks.build.metadata'
Adding label 'io.buildpacks.project.metadata'
no default process type
Saving sample...
*** Images (12e0340bd3e0):
      sample
Adding cache layer 'sample-counter:test'
Successfully built image sample

Build it a gain to see what is in the prior metadata:

$ pack build sample --buildpack ./ --path ./
20: Pulling from heroku/buildpacks
Digest: sha256:c49ebf7c333d099ee765a1841acea6d5e5aa0a81a3375e96d7af41c4abc53344
Status: Image is up to date for heroku/buildpacks:20
20-cnb: Pulling from heroku/heroku
Digest: sha256:cb6e1902c1bd7c10e0e0d81d1a6c67a070e7820326b3cd22b11d0847b155ca5a
Status: Image is up to date for heroku/heroku:20-cnb
===> DETECTING
sample-counter 0.1.0
===> RESTORING
Restoring metadata for "sample-counter:test" from app image
Restoring data for "sample-counter:test" from cache
===> BUILDING
Existing layer TOML contents:
[metadata]
  version = 1.0
===> EXPORTING
Reusing layer 'sample-counter:test'
Reusing 1/1 app layer(s)
Reusing layer 'launcher'
Reusing layer 'config'
Adding label 'io.buildpacks.lifecycle.metadata'
Adding label 'io.buildpacks.build.metadata'
Adding label 'io.buildpacks.project.metadata'
no default process type
Saving sample...
*** Images (12e0340bd3e0):
      sample
Reusing cache layer 'sample-counter:test'
Successfully built image sample

Note the lines:

Existing layer TOML contents:
[metadata]
  version = 1.0

You can see in bin/build we are writing an integer, but reading a float.

Problem with strongly typed languages

For dynamic languages, there might be little difference between 1 and 1.0 but in projects like Rust with libcnb.rs it causes bigger problems as deserializing a float and an integer are two distinctly different things so the deserialization library believes the type signature has changed and it will throw out the old cache contents.

While this was originally discovered while working with buildpacks written in Rust with libcnb.rs we reproduced the issue using bash to get a minimum possible reproduction.


Context

lifecycle version
$ pack inspect-builder | grep Lifecycle -A7
Lifecycle:
  Version: 0.14.1
  Buildpack APIs:
    Deprecated: (none)
    Supported: 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8
  Platform APIs:
    Deprecated: (none)
    Supported: 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9
platform version(s)
$ pack report
Pack:
  Version:  0.27.0+git-f4f5be1.build-3382
  OS/Arch:  darwin/amd64

Default Lifecycle Version:  0.14.1

Supported Platform APIs:  0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9

Config:
  default-builder-image = "[REDACTED]"

  [[trusted-builders]]
    name = "[REDACTED]"
$ docker info
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc., v0.8.2)
  compose: Docker Compose (Docker Inc., v2.6.0)
  sbom: View the packaged-based Software Bill Of Materials (SBOM) for an image (Anchore Inc., 0.6.0)
  scan: Docker Scan (Docker Inc., v0.17.0)

Server:
 Containers: 119
  Running: 0
  Paused: 0
  Stopped: 119
 Images: 73
 Server Version: 20.10.16
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 212e8b6fa2f44b9c21b2798135fc6fb7c53efc16
 runc version: v1.1.1-0-g52de29d
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
  cgroupns
 Kernel Version: 5.10.104-linuxkit
 Operating System: Docker Desktop
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 7.773GiB
 Name: docker-desktop
 ID: GLWW:AKWH:SWDV:WU5C:DP5Y:BMKJ:A3MK:HMAP:Z7XK:7G2F:GPRI:UXMY
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 HTTP Proxy: http.docker.internal:3128
 HTTPS Proxy: http.docker.internal:3128
 No Proxy: hubproxy.docker.internal
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Live Restore Enabled: false
anything else?

schneems avatar Jul 19 '22 19:07 schneems

Also to add that float calculations can get lossy so if you're doing something like incrementing a counter in metadata and then trying to compare that count it might be that 1.0 != 1.0 due to float-calculation/mantissa issues.

schneems avatar Jul 21 '22 16:07 schneems

@schneems thanks for reporting this. I did a bit of digging and this looks to be a side-effect of the fact that layer metadata is decoded into a struct field of type interface{}. It looks like our TOML parser (github.com/BurntSushi/toml) helpfully decodes values as an int64 if there is no decimal in the number, so the lifecycle when reading the output file gets the correct value. But then to preserve the information across builds, the lifecycle marshals and then unmarshals the data as JSON (setting and reading the io.buildpacks.lifecycle.metadata label), and encoding/json unmarshals the value as a float64. You can see this all in action here: https://go.dev/play/p/8APQ4q0ngbl

The behavior of encoding/json appears to be "by design" according to the thread here: https://forum.golangbridge.org/t/type-problem-in-json-conversion/19420

I'm not sure the lifecycle would be able to work around this without resorting to a custom json decoder, which I'd be very hesitant to do. Is it possible to work around this in the buildpack?

natalieparellano avatar Jul 21 '22 16:07 natalieparellano

Thanks for the excellent investigation, @natalieparellano. That all makes sense that saving as JSON and then deserializing would give us floats as per spec. JSON does not have integers (which is mentioned in your link). Now that we know the underlying issue, we need to figure out how to move forward.

Right now, it looks like we need to either update lifecycle somehow or update the spec to explicitly state that it uses a modified TOML format that doesn't support any numeric type other than Float. There might also be other incompatibilities that we've just not come across.

The other option you've mentioned is to re-write a custom JSON parser in Go which certainly sounds like a lot of work. In cases like these trying to brainstorm all possible solutions (no matter how good or bad) might yield something workable.

Brainstorm

Do nothing (i.e., change the spec and workaround the issue in individual buildpacks)

Doing nothing is always an option. In this case, we lose the ability to represent anything that TOML can but JSON can't. We would need to update the spec to clarify that this behavior is intentional. We reference TOML a lot, so maybe a description of "modified TOML" at the top and then replacing all instances of "TOML" with "modified TOML". Then the problem is pushed back to individual buildpacks which need to remember this difference and workaround it where needed.

We would need to go through the TOML spec and check to see if there's any other types that cannot be correctly serialized/deserialized https://toml.io/en/. On the main page it supports:

  • basic strings
  • multi-line basic strings
  • literal strings
  • multi-line literal strings
  • integers
  • hexadecimal with prefix 0x
  • octal with prefix 0o
  • binary with prefix 0b
  • fractional
  • exponent
  • fractional + exponent
  • numbers with separators
  • infinity (positive and negative)
  • not a number (neutral, positive, and negative)
  • offset datetime
  • local datetime
  • local date
  • local time

There's more at https://toml.io/en/v1.0.0

I think this process of updating the spec will take a good bit of work. My preference would be to find a way forward in lifecycle that lets us support all of TOML and not have to touch the spec.

Re-write a custom JSON parser in Go

You mentioned this being an option. It sounds complicated and possibly error-prone. I think we might have the opposite problem with this as well. If someone is using a language like Ruby and was previously expecting a return value to be a float, but it becomes an integer, that could be an issue. For example:

irb(main):003:0> 2.0/22
=> 0.09090909090909091
irb(main):002:0> 2/22
=> 0

Encode the metadata body as a string as well

A co-worker had this idea. If we know that serializing and deserializing to a different format is lossy, maybe we could avoid the lossy by encoding the raw TOML as a string and storing that instead. Something like:

{"metadata": "version = 1\nfoo=\"bar\""}

Target a different marshal format

Could you store the value as TOML instead of JSON? Or perhaps a different marshal format?

I know Ruby has its own marshal format. Maybe Golang has one as well? The one issue I've seen using Ruby's Marshal class is that internal representation can change between versions. While rare, it does happen, so the stability of the marshal format would be a factor.

Maybe marshal can be avoided?

I'm a little fuzzy on the exact implementation details of why marshal is needed. This idea could be wildly off base, but perhaps if metadata is stored and accessed somewhere else, such as on a file on the original layer, it could give us another option.

Something else

There are always more options if we think about them. Maybe something I said sparked an idea, or maybe another reader has something to add/try.

Feedback

Do you have any thoughts or ideas to add to the brainstorming? You've got much more experience and insight into the implementation side of things.

schneems avatar Jul 21 '22 18:07 schneems

A co-worker pointed out that with a spec update, it would also be best to detect those problem cases and explicitly error on them. Raising something like this for each incompatible type:

Error: Incompatible TOML type Integer used

Cloud Native Buildpacks do not support the full TOML spec. For more information, see <link>

Essentially, a spec update would also ideally be tightly coupled to an update in lifecycle.

schneems avatar Jul 21 '22 18:07 schneems

(For anyone else that didn't see this on Slack)

Discussion on Slack: https://cloud-native.slack.com/archives/C0333LG1E68/p1658434045913159

...which concluded that we'll likely want an RFC for this: https://cloud-native.slack.com/archives/C0333LG1E68/p1658501272403839?thread_ts=1658434382.039529&cid=C0333LG1E68

edmorley avatar Jul 25 '22 15:07 edmorley

To recap what's talked about there. As I understand it, the metadata is serialized to JSON and returned as a label in the platform spec https://github.com/buildpacks/spec/blob/edac0f7214646018d6fb78267b1fad6c3d23b5b0/platform.md#iobuildpackslifecyclemetadata-json.

This decision to store it in labels can cause issues beyond the lossy-float issue, as really large labels can crash K8 nodes. It was discussed to move storing this data in the label and into a file instead, such as <layers>/config/<buildpack ID>/<layer>.toml. An example of a similar move can be found in this RFC for SBOM https://github.com/buildpacks/rfcs/blob/main/text/0095-sbom.md.

We need an RFC to move forward as changing this will be a breaking change for the platform spec, but it seems worth doing (in my opinion).

schneems avatar Jul 25 '22 17:07 schneems