lifecycle
lifecycle copied to clipboard
Metadata stored as integer is incorrectly restored as a float
Summary
When writing metadata as an integer such as:
[metadata]
version = 1
It will be retrieved from cache as a float incorrectly:
[metadata]
version = 1.0
Example: https://github.com/schneems/cnb-metadata-integer-bug
CNB Metadata integer bug
Expected
If you run this buildpack twice you should see this output on the second run:
Existing layer TOML contents:
[metadata]
version = 1
Actual
It currently instead outputs this:
Existing layer TOML contents:
[metadata]
version = 1.0
Note that this is a float instead of an integer
Reproduce
Clone the code:
git clone https://github.com/schneems/cnb-metadata-integer-bug
cd cnb-metadata-integer-bug
Build once to write metadata:
$ pack build sample --buildpack ./ --path ./
20: Pulling from heroku/buildpacks
Digest: sha256:c49ebf7c333d099ee765a1841acea6d5e5aa0a81a3375e96d7af41c4abc53344
Status: Image is up to date for heroku/buildpacks:20
20-cnb: Pulling from heroku/heroku
Digest: sha256:cb6e1902c1bd7c10e0e0d81d1a6c67a070e7820326b3cd22b11d0847b155ca5a
Status: Image is up to date for heroku/heroku:20-cnb
Previous image with name "sample" not found
===> DETECTING
sample-counter 0.1.0
===> RESTORING
===> BUILDING
===> EXPORTING
Adding layer 'sample-counter:test'
Adding 1/1 app layer(s)
Adding layer 'launcher'
Adding layer 'config'
Adding label 'io.buildpacks.lifecycle.metadata'
Adding label 'io.buildpacks.build.metadata'
Adding label 'io.buildpacks.project.metadata'
no default process type
Saving sample...
*** Images (12e0340bd3e0):
sample
Adding cache layer 'sample-counter:test'
Successfully built image sample
Build it a gain to see what is in the prior metadata:
$ pack build sample --buildpack ./ --path ./
20: Pulling from heroku/buildpacks
Digest: sha256:c49ebf7c333d099ee765a1841acea6d5e5aa0a81a3375e96d7af41c4abc53344
Status: Image is up to date for heroku/buildpacks:20
20-cnb: Pulling from heroku/heroku
Digest: sha256:cb6e1902c1bd7c10e0e0d81d1a6c67a070e7820326b3cd22b11d0847b155ca5a
Status: Image is up to date for heroku/heroku:20-cnb
===> DETECTING
sample-counter 0.1.0
===> RESTORING
Restoring metadata for "sample-counter:test" from app image
Restoring data for "sample-counter:test" from cache
===> BUILDING
Existing layer TOML contents:
[metadata]
version = 1.0
===> EXPORTING
Reusing layer 'sample-counter:test'
Reusing 1/1 app layer(s)
Reusing layer 'launcher'
Reusing layer 'config'
Adding label 'io.buildpacks.lifecycle.metadata'
Adding label 'io.buildpacks.build.metadata'
Adding label 'io.buildpacks.project.metadata'
no default process type
Saving sample...
*** Images (12e0340bd3e0):
sample
Reusing cache layer 'sample-counter:test'
Successfully built image sample
Note the lines:
Existing layer TOML contents:
[metadata]
version = 1.0
You can see in bin/build we are writing an integer, but reading a float.
Problem with strongly typed languages
For dynamic languages, there might be little difference between 1 and 1.0 but in projects like Rust with libcnb.rs it causes bigger problems as deserializing a float and an integer are two distinctly different things so the deserialization library believes the type signature has changed and it will throw out the old cache contents.
While this was originally discovered while working with buildpacks written in Rust with libcnb.rs we reproduced the issue using bash to get a minimum possible reproduction.
Context
lifecycle version
$ pack inspect-builder | grep Lifecycle -A7
Lifecycle:
Version: 0.14.1
Buildpack APIs:
Deprecated: (none)
Supported: 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8
Platform APIs:
Deprecated: (none)
Supported: 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9
platform version(s)
$ pack report
Pack:
Version: 0.27.0+git-f4f5be1.build-3382
OS/Arch: darwin/amd64
Default Lifecycle Version: 0.14.1
Supported Platform APIs: 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9
Config:
default-builder-image = "[REDACTED]"
[[trusted-builders]]
name = "[REDACTED]"
$ docker info
Client:
Context: default
Debug Mode: false
Plugins:
buildx: Docker Buildx (Docker Inc., v0.8.2)
compose: Docker Compose (Docker Inc., v2.6.0)
sbom: View the packaged-based Software Bill Of Materials (SBOM) for an image (Anchore Inc., 0.6.0)
scan: Docker Scan (Docker Inc., v0.17.0)
Server:
Containers: 119
Running: 0
Paused: 0
Stopped: 119
Images: 73
Server Version: 20.10.16
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 2
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 212e8b6fa2f44b9c21b2798135fc6fb7c53efc16
runc version: v1.1.1-0-g52de29d
init version: de40ad0
Security Options:
seccomp
Profile: default
cgroupns
Kernel Version: 5.10.104-linuxkit
Operating System: Docker Desktop
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 7.773GiB
Name: docker-desktop
ID: GLWW:AKWH:SWDV:WU5C:DP5Y:BMKJ:A3MK:HMAP:Z7XK:7G2F:GPRI:UXMY
Docker Root Dir: /var/lib/docker
Debug Mode: false
HTTP Proxy: http.docker.internal:3128
HTTPS Proxy: http.docker.internal:3128
No Proxy: hubproxy.docker.internal
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Live Restore Enabled: false
anything else?
Also to add that float calculations can get lossy so if you're doing something like incrementing a counter in metadata and then trying to compare that count it might be that 1.0 != 1.0 due to float-calculation/mantissa issues.
@schneems thanks for reporting this. I did a bit of digging and this looks to be a side-effect of the fact that layer metadata is decoded into a struct field of type interface{}. It looks like our TOML parser (github.com/BurntSushi/toml) helpfully decodes values as an int64 if there is no decimal in the number, so the lifecycle when reading the output file gets the correct value. But then to preserve the information across builds, the lifecycle marshals and then unmarshals the data as JSON (setting and reading the io.buildpacks.lifecycle.metadata label), and encoding/json unmarshals the value as a float64. You can see this all in action here: https://go.dev/play/p/8APQ4q0ngbl
The behavior of encoding/json appears to be "by design" according to the thread here: https://forum.golangbridge.org/t/type-problem-in-json-conversion/19420
I'm not sure the lifecycle would be able to work around this without resorting to a custom json decoder, which I'd be very hesitant to do. Is it possible to work around this in the buildpack?
Thanks for the excellent investigation, @natalieparellano. That all makes sense that saving as JSON and then deserializing would give us floats as per spec. JSON does not have integers (which is mentioned in your link). Now that we know the underlying issue, we need to figure out how to move forward.
Right now, it looks like we need to either update lifecycle somehow or update the spec to explicitly state that it uses a modified TOML format that doesn't support any numeric type other than Float. There might also be other incompatibilities that we've just not come across.
The other option you've mentioned is to re-write a custom JSON parser in Go which certainly sounds like a lot of work. In cases like these trying to brainstorm all possible solutions (no matter how good or bad) might yield something workable.
Brainstorm
Do nothing (i.e., change the spec and workaround the issue in individual buildpacks)
Doing nothing is always an option. In this case, we lose the ability to represent anything that TOML can but JSON can't. We would need to update the spec to clarify that this behavior is intentional. We reference TOML a lot, so maybe a description of "modified TOML" at the top and then replacing all instances of "TOML" with "modified TOML". Then the problem is pushed back to individual buildpacks which need to remember this difference and workaround it where needed.
We would need to go through the TOML spec and check to see if there's any other types that cannot be correctly serialized/deserialized https://toml.io/en/. On the main page it supports:
- basic strings
- multi-line basic strings
- literal strings
- multi-line literal strings
- integers
- hexadecimal with prefix
0x - octal with prefix
0o - binary with prefix
0b - fractional
- exponent
- fractional + exponent
- numbers with separators
- infinity (positive and negative)
- not a number (neutral, positive, and negative)
- offset datetime
- local datetime
- local date
- local time
There's more at https://toml.io/en/v1.0.0
I think this process of updating the spec will take a good bit of work. My preference would be to find a way forward in lifecycle that lets us support all of TOML and not have to touch the spec.
Re-write a custom JSON parser in Go
You mentioned this being an option. It sounds complicated and possibly error-prone. I think we might have the opposite problem with this as well. If someone is using a language like Ruby and was previously expecting a return value to be a float, but it becomes an integer, that could be an issue. For example:
irb(main):003:0> 2.0/22
=> 0.09090909090909091
irb(main):002:0> 2/22
=> 0
Encode the metadata body as a string as well
A co-worker had this idea. If we know that serializing and deserializing to a different format is lossy, maybe we could avoid the lossy by encoding the raw TOML as a string and storing that instead. Something like:
{"metadata": "version = 1\nfoo=\"bar\""}
Target a different marshal format
Could you store the value as TOML instead of JSON? Or perhaps a different marshal format?
I know Ruby has its own marshal format. Maybe Golang has one as well? The one issue I've seen using Ruby's Marshal class is that internal representation can change between versions. While rare, it does happen, so the stability of the marshal format would be a factor.
Maybe marshal can be avoided?
I'm a little fuzzy on the exact implementation details of why marshal is needed. This idea could be wildly off base, but perhaps if metadata is stored and accessed somewhere else, such as on a file on the original layer, it could give us another option.
Something else
There are always more options if we think about them. Maybe something I said sparked an idea, or maybe another reader has something to add/try.
Feedback
Do you have any thoughts or ideas to add to the brainstorming? You've got much more experience and insight into the implementation side of things.
A co-worker pointed out that with a spec update, it would also be best to detect those problem cases and explicitly error on them. Raising something like this for each incompatible type:
Error: Incompatible TOML type Integer used
Cloud Native Buildpacks do not support the full TOML spec. For more information, see <link>
Essentially, a spec update would also ideally be tightly coupled to an update in lifecycle.
(For anyone else that didn't see this on Slack)
Discussion on Slack: https://cloud-native.slack.com/archives/C0333LG1E68/p1658434045913159
...which concluded that we'll likely want an RFC for this: https://cloud-native.slack.com/archives/C0333LG1E68/p1658501272403839?thread_ts=1658434382.039529&cid=C0333LG1E68
To recap what's talked about there. As I understand it, the metadata is serialized to JSON and returned as a label in the platform spec https://github.com/buildpacks/spec/blob/edac0f7214646018d6fb78267b1fad6c3d23b5b0/platform.md#iobuildpackslifecyclemetadata-json.
This decision to store it in labels can cause issues beyond the lossy-float issue, as really large labels can crash K8 nodes. It was discussed to move storing this data in the label and into a file instead, such as <layers>/config/<buildpack ID>/<layer>.toml. An example of a similar move can be found in this RFC for SBOM https://github.com/buildpacks/rfcs/blob/main/text/0095-sbom.md.
We need an RFC to move forward as changing this will be a breaking change for the platform spec, but it seems worth doing (in my opinion).