YAML library used

Open felixfontein opened this issue 1 year ago • 4 comments

(Created from #1437 so we can add it to the discussion milestone.)

We're using go-yaml.v3 in SOPS. Unfortunately this library doesn't seem to be actively maintained anymore; the last commit is from May 2022, and there are quite a few bug reports and bugfix PRs that haven't been looked at / haven't progressed, some of them for years. (I got one myself, https://github.com/go-yaml/yaml/pull/690, open since January 2021, last maintainer reaction in May 2021. This is blocking a bugfix on sops's side: https://github.com/getsops/sops/issues/936#issuecomment-917198987)

Two issues have been created in the past in the repository asking whether it's still maintained, and the (single) maintainer always responded that it still is:

  • https://github.com/go-yaml/yaml/issues/788
  • https://github.com/go-yaml/yaml/issues/776

Other projects have actually gone on to fork go-yaml locally, like kubernetes-sigs:

  • https://github.com/kubernetes-sigs/yaml/issues/72
  • https://github.com/kubernetes-sigs/yaml/pull/76

Maybe we should also consider switching to that fork? Or is anyone aware of other forks of go-yaml.v3, or even other actively maintained YAML libraries for Go?

felixfontein avatar Sep 14 '24 16:09 felixfontein

There's another YAML library for Go, https://github.com/goccy/go-yaml, which seems to be actively maintained. Maybe we should consider migrating to that one? There seems to be no other actively maintained YAML library; the kubernetes-sigs fork only receives bugfixes the Kubernetes project needs, only gets new features if the original go-yaml/yaml gets new features (which looks quite unlikely right now), and is not meant to be used by any non-Kubernetes projects.

felixfontein avatar Nov 30 '24 16:11 felixfontein

I did some first experiments. Plain unmarshaling is very simple: for example, the only change needed for config/config.go to use that library is

--- a/config/config.go
+++ b/config/config.go
@@ -19,7 +19,7 @@ import (
        "github.com/getsops/sops/v3/kms"
        "github.com/getsops/sops/v3/pgp"
        "github.com/getsops/sops/v3/publish"
-       "gopkg.in/yaml.v3"
+       "github.com/goccy/go-yaml"
 )
 
 type fileSystem interface {

The error messages also look a lot nicer.

Transforming YAML to SOPS' TreeBranches and vice versa is obviously more work. I've started doing some experiments on how to parse YAML here: https://gist.github.com/felixfontein/f53e704961c3b241810e061083017a5e

So far it looks pretty good. There are two things so far that I found that go-yaml.v3 can handle, and goccy/go-yaml can't:

  • extracting comments from flow-style YAML doesn't work yet (https://github.com/goccy/go-yaml/issues/608)
  • there's no support for dates and timestamps (now that we have #1759...). I also noticed that it parses large integers or floats as strings, similar to how go-yaml.v3 does (which causes all kinds of problems). I've created an issue for that: https://github.com/goccy/go-yaml/issues/661

(Also the node structure includes references to the token streams, which I guess would make it possible to implement #1755 for YAML as well, at least from the parsing side - I haven't looked at the emitting side so far.)

felixfontein avatar Feb 16 '25 10:02 felixfontein

Update: I now know how to distinguish between strings and other formats; see https://github.com/goccy/go-yaml/issues/661#issuecomment-2661410461. I've updated https://gist.github.com/felixfontein/f53e704961c3b241810e061083017a5e to print more information.

felixfontein avatar Feb 16 '25 13:02 felixfontein

With goccy/go-yaml, we could also round-trip the representation of values, for example encrypting and decrypting 0x1_5 as written instead of converting it to 21 (as happens now).
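To illustrate the idea of round-tripping a representation, here is a minimal sketch using only the Go standard library. It is not SOPS code and does not use goccy/go-yaml: the `Scalar` type and the `ParseIntScalar`, `EmitValue`, and `EmitRepr` helpers are hypothetical names made up for this example, and `strconv.ParseInt` with base 0 (Go's integer literal syntax) stands in for a YAML parser.

```go
package main

import (
	"fmt"
	"strconv"
)

// Scalar is a hypothetical holder that keeps the literal exactly as it was
// written next to the parsed value.
type Scalar struct {
	Repr  string // original text, e.g. "0x1_5"
	Value int64  // parsed value, e.g. 21
}

// ParseIntScalar parses an integer literal. With base 0, strconv accepts
// prefixes such as 0x and underscores between digits (Go literal syntax,
// used here as a stand-in for a YAML parser).
func ParseIntScalar(repr string) (Scalar, error) {
	v, err := strconv.ParseInt(repr, 0, 64)
	if err != nil {
		return Scalar{}, err
	}
	return Scalar{Repr: repr, Value: v}, nil
}

// EmitValue writes only the parsed value, so the original spelling is lost.
func EmitValue(s Scalar) string { return strconv.FormatInt(s.Value, 10) }

// EmitRepr writes the stored representation, so the spelling survives.
func EmitRepr(s Scalar) string { return s.Repr }

func main() {
	s, err := ParseIntScalar("0x1_5")
	if err != nil {
		panic(err)
	}
	fmt.Println(EmitValue(s)) // 21
	fmt.Println(EmitRepr(s))  // 0x1_5
}
```

The point of the sketch is only that the emitter needs the original text; how that text would be stored in an encrypted file is a separate question.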

Something similar could also be done for JSON. (For decoding, one can call d.UseNumber() to ensure that numbers are returned as json.Number, which is basically a string. For encoding, we convert values to strings with json.Marshal(), so that's also easy to handle.)
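As a rough sketch of the JSON side, the following uses `encoding/json` from the standard library to show what `UseNumber()` changes. The `RoundTripJSON` helper and the sample document are made up for this example; this is not how SOPS' JSON store is written.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// RoundTripJSON decodes a JSON object and encodes it again. With useNumber
// set, numbers are decoded as json.Number (a string type) and are written
// back with their original digits; otherwise they pass through float64.
func RoundTripJSON(doc string, useNumber bool) (string, error) {
	d := json.NewDecoder(strings.NewReader(doc))
	if useNumber {
		d.UseNumber()
	}
	var data map[string]interface{}
	if err := d.Decode(&data); err != nil {
		return "", err
	}
	out, err := json.Marshal(data)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	doc := `{"big": 12345678901234567890123, "ratio": 1.50}`

	kept, err := RoundTripJSON(doc, true)
	if err != nil {
		panic(err)
	}
	fmt.Println(kept) // {"big":12345678901234567890123,"ratio":1.50}

	lossy, err := RoundTripJSON(doc, false)
	if err != nil {
		panic(err)
	}
	fmt.Println(lossy) // digits and trailing zero are not preserved
}
```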

This would also make it easier to serialize arbitrary data to INI or DotEnv, since we already have a string representation we can simply write into there. (Obviously it won't be roundtrip safe, since loading the INI/DotEnv file will give you a string back, not an integer/float/timestamp/date/...)
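The following sketch shows why a stored string representation makes DotEnv output trivial, and why it is not round-trip safe. `WriteDotEnv` and `ReadDotEnv` are hypothetical helpers for a deliberately simplified `KEY=value` format (no quoting, escaping, or comments); they are not SOPS' DotEnv store.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// WriteDotEnv writes KEY=value lines, sorted by key, from values that
// already carry their string representation.
func WriteDotEnv(values map[string]string) string {
	keys := make([]string, 0, len(values))
	for k := range values {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var b strings.Builder
	for _, k := range keys {
		b.WriteString(k + "=" + values[k] + "\n")
	}
	return b.String()
}

// ReadDotEnv parses KEY=value lines. Every value comes back as a plain
// string: the file does not record whether it was an integer, float, or date.
func ReadDotEnv(text string) map[string]string {
	out := make(map[string]string)
	for _, line := range strings.Split(text, "\n") {
		parts := strings.SplitN(line, "=", 2)
		if len(parts) != 2 {
			continue // skip blank or malformed lines
		}
		out[parts[0]] = parts[1]
	}
	return out
}

func main() {
	text := WriteDotEnv(map[string]string{"PORT": "0x1_5", "NAME": "demo"})
	fmt.Print(text)
	fmt.Println(ReadDotEnv(text)["PORT"]) // 0x1_5, but only as a string
}
```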

Round-tripping representations like this would unfortunately break compatibility with older SOPS versions, in the sense that files written with a version supporting it could not be decrypted with an older version.

felixfontein avatar Feb 16 '25 13:02 felixfontein

  • go-yaml.v3 is now officially deprecated: https://github.com/go-yaml/yaml/commit/944c86a7d29391925ed6ac33bee98a0516f1287a
  • there's a new maintained fork of it: https://github.com/yaml/go-yaml/
  • and we have a PR that switches usage to it: #1934 :tada:

I guess the next step would be switching to its v4 branch, since v3 (which we currently use) is basically frozen.

felixfontein avatar Sep 08 '25 19:09 felixfontein