github-workflows-kt icon indicating copy to clipboard operation
github-workflows-kt copied to clipboard

Improve and centralize YAML serialization

Open krzema12 opened this issue 2 years ago • 10 comments

Currently the library contains logic for serializing to YAML in multiple places. Besides mixing layers (business logic vs serialization), this is error-prone because e.g. fixing an issue in one place may lead to not fixing it in other relevant contexts.

Here's a list of problems that appeared because of incorrect YAML serialization:

  • #36
  • #70
  • #89 (confusion during development how to make the indent correct)
  • #369
  • #370

krzema12 avatar Feb 14 '22 07:02 krzema12

We should consider going back to kotlinx.serialization + kaml or equivalent, for the sake of delegating the problem of YAML synthesis. It's getting more and more tedious to generate YAML by hand. FYI @Vampire

krzema12 avatar Jun 18 '22 14:06 krzema12

After thinking a little, kaml doesn't seem to be the optimal choice for now. Here's why:

  • it includes the type field to handle polymorphism which we don't actually need, but GitHub doesn't like it in the YAML
  • for custom arguments, I don't see a way to have e.g. some data class serialized, with a map of String to Any added as extra properties

That's why I'm considering reimplementing the domain-to-YAML layer to use snakeyaml. It's used under the hood in kaml, and it's already used in serializing custom arguments. The idea is to have the whole domain object converted to maps, lists and primitive types, and have it serialized by snakeyaml. Thanks to this, the problem of creating a valid YAML will be delegated to the library, and this DSL library will get much simpler. The side-effect will be that YAML formatting will change a little.

FYI @jmfayard @Vampire - I'm open to having a discussion about the new approach to YAML emitting. It's quite a big effort, so while working on it, I'd like to have one or two acks that the idea makes sense :smile:

krzema12 avatar Jul 31 '22 21:07 krzema12

it includes the type field to handle polymorphism which we don't actually need, but GitHub doesn't like it in the YAML

Well, that's because polymorphism is not what you need or want here. Instead you want contextual serialization / deserialization. The kotlinx.serialization DAOs for GitHub actions an GitHub workflows I showed you are fully working to parse any possible GitHub action file and any possible GitHub workflow file including any possible documented variation.

And it also fully supports producing the according YAML that GitHub likes without any superfluous keys or values. Including using the minimal form possible, like having a scalar value like

on: push

a list like

on:
    - push
    - fork

or a map like a list like

on:
    push:
        branches: main
    fork:

Only drawback of my current version is, that they are still for kotlinx.serialization 0.x, but it would probably not be too hard to port them over to the latest versions.

for custom arguments, I don't see a way to have e.g. some data class serialized, with a map of String to Any added as extra properties

Not fully sure what you mean. If you can build the DAO classes for the serialization, the serialization should work. And if the DAO classes support all valid cases, it should probably work just fine.

That's why I'm considering reimplementing the domain-to-YAML layer to use snakeyaml. It's used under the hood in kaml, and it's already used in serializing custom arguments. The idea is to have the whole domain object converted to maps, lists and primitive types, and have it serialized by snakeyaml. Thanks to this, the problem of creating a valid YAML will be delegated to the library, and this DSL library will get much simpler.

Well, if you prefer to reinvent the wheel, feel free to do so. Any solution that does not mean you do manual text concatenation should be better than the current situation. :-) My offer to refactor out my serializers into a stand-alone lib you can depend on and use is still standing.

Vampire avatar Jul 31 '22 21:07 Vampire

Thanks! FTR, here are the DAO classes with custom serializers: https://github.com/Vampire/setup-wsl/tree/master/buildSrc/src/main/kotlin/net/kautler/dao I have to finally find time to wrap my head around it :smile:

Regarding supporting custom arguments, I'd like to make sure that once the client calls e. g. workflow DSL builder with _customArguments = mapOf("new-workflow-argument" to 123), the library can emit this argument as any regular workflow's YAML attribute. With simple data classes without custom serializers, I don't see a way to achieve it.

The next step would be to prepare a PoC using kotlinx.serialization approach that covers this case. @Vampire if you would find a moment to prepare such example, it would be a great piece of help to me as it would bring us closer to the decision which way to go.

Of course we can always say that if the DAOs are mature enough, such feature is not needed. However I'd like to check out this case anyway, since I want to avoid deciding on removing library's feature prematurely. Even with most mature DAOs, GitHub can still add new attributes.

krzema12 avatar Aug 01 '22 06:08 krzema12

I did a quick and dirty version of Vampire's custom serializer in the script-generator. Might make sense to migrate them too, but first the library.

jmfayard avatar Aug 01 '22 06:08 jmfayard

@jmfayard feel free to share it on a branch.


Regarding the idea of using snakeyaml, I prepared such PoC depicting what I mean by being able to define _customArguments:

package it.krzeminski.githubactions.yaml

import org.yaml.snakeyaml.Yaml

// Domain class
data class Workflow(
    val knownAttribute1: String,
    val knownAttribute2: Map<String, List<String>>,
    val attributeToOverwrite: String,
    val _customArguments: Map<String, Any>,
)

fun main() {
    val myWorkflow = Workflow(
        knownAttribute1 = "hello",
        knownAttribute2 = mapOf(
            "foo" to listOf("bar", "baz"),
            "yet" to listOf("another", "key"),
        ),
        attributeToOverwrite = "overwrite-me",
        _customArguments = mapOf(
            "unknown-attribute1" to 123,
            "unknown-attribute2" to listOf("foo", "bar", "baz"),
            "attribute-to-overwrite" to "overwritten!",
        ),
    )

    val workflowAsYaml = myWorkflow.toYaml()
    println(workflowAsYaml)

    // Expected:
    //
    // known-attribute-1: hello
    // known-attribute-2:
    //   foo: [bar, baz]
    //   yet: [another, key]
    // attribute-to-overwrite: overwritten!
    // unknown-attribute1: 123
    // unknown-attribute2: [foo, bar, baz]
}

private fun Workflow.toYaml(): String {
    val yamlable = mapOf(
        "known-attribute-1" to knownAttribute1,
        "known-attribute-2" to knownAttribute2,
        "attribute-to-overwrite" to attributeToOverwrite,
    ) + _customArguments
    return Yaml().dump(yamlable)
}

It would be cool to see a piece of code using kotlinx.serialization+kaml approach that would let us achieve the same.

krzema12 avatar Aug 01 '22 21:08 krzema12

https://github.com/sksamuel/hoplite

Hoplite also supports YAML, perhaps it's a viable approach

LeoColman avatar Aug 03 '22 17:08 LeoColman

@LeoColman

Do you have capacity to prepare a short PoC like mine?

krzema12 avatar Aug 03 '22 17:08 krzema12

I think you can do a Workflow -> JsonElement conversion, then a mapping from JsonElement -> JsonElement using the values from _customArgument, finally a JsonElement -> Kaml

jmfayard avatar Aug 03 '22 20:08 jmfayard

@jmfayard JsonElement - typo? I don't quite get it, PoC please 🙏 😄 I'm also curious how it would differ from the snakeyaml-like approach.

krzema12 avatar Aug 04 '22 05:08 krzema12

sorry but I'm doing too much things at the moment

jmfayard avatar Aug 06 '22 10:08 jmfayard

I'm gonna go the snakeyaml approach for now since

  • it fulfills the current API's needs
  • is simple to understand
  • requires no extra dependencies
  • will address several issues with the lib
  • formatting of the output is in line with kotlinx.serialization (kx.s uses snakeyaml under the hood), so if we ever decide to go this way, formatting changes in the output YAML should be none or minimal
  • gives fine-grained control on e.g. where to displays nulls explicitly, and where to display no property instead

Later on, we can always iteratively improve on it further.

krzema12 avatar Aug 06 '22 10:08 krzema12

Working on https://github.com/krzema12/github-actions-kotlin-dsl/issues/87 first, to better catch regressions that may result from changing the YAML emitting method.

krzema12 avatar Aug 07 '22 19:08 krzema12