jackson-dataformats-text

YAML: Allow handling of custom tags

Open nictas opened this issue 4 years ago • 4 comments

We're trying to migrate from SnakeYAML to jackson-dataformat-yaml, but there's one thing that's currently stopping us. We parse YAML documents and allow our users to mark fields as sensitive, so that they're handled in a safe way. For example:

modules:
  - name: foo
    parameters:
      username: someone
      password: !sensitive Abcd1234

Note that the value of the password field is marked with a custom YAML tag - !sensitive.

We handle this custom tag with the following code:

import org.yaml.snakeyaml.constructor.SafeConstructor;
import org.yaml.snakeyaml.nodes.Tag;

public class YamlTaggedObjectsConstructor extends SafeConstructor {

    private static final String SENSITIVE_TAG = "!sensitive";

    public YamlTaggedObjectsConstructor() {
        // Route every node tagged !sensitive to our custom construct.
        this.yamlConstructors.put(new Tag(SENSITIVE_TAG), new SecureConstruct());
    }
}

Where SecureConstruct is our custom implementation of org.yaml.snakeyaml.constructor.AbstractConstruct that knows how to parse values marked with !sensitive.
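
To make the idea concrete, here is a minimal sketch of the kind of value such a construct could return. SensitiveValue is a hypothetical name used only for illustration; it is not part of SnakeYAML or Jackson, and the real SecureConstruct may work differently. The sketch assumes that "handled in a safe way" means at least that the secret does not leak through toString().

```java
import java.util.Objects;

/** Hypothetical wrapper for a value that was tagged !sensitive (illustration only). */
final class SensitiveValue {

    private final String raw;

    SensitiveValue(String raw) {
        this.raw = Objects.requireNonNull(raw, "raw");
    }

    /** Explicit accessor, so that reading the secret is always a deliberate call. */
    String reveal() {
        return raw;
    }

    /** Masked, so that logging or string concatenation does not leak the secret. */
    @Override
    public String toString() {
        return "********";
    }
}
```

With a wrapper like this, an expression such as "password=" + value yields the masked form, and only an explicit reveal() call returns the original text.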

Is there any chance you plan to add support for deserializers for custom tags or are they something you consider out-of-scope?

nictas avatar Aug 06 '20 10:08 nictas

I think that should be in-scope, ideally, similar to how anchors can be accessed via YAMLParser (although for those, general-purpose getObjectId() from JsonParser is used).

In theory there is already JsonParser.getTypeId(), and that is wired to expose some tags, but it looks like it may do too much pre-processing to work. So maybe adding getRawTag() (or whatever name makes sense) to YAMLParser would make sense? Being an API addition, that needs to go in the next minor version, 2.12.0, and I would be happy to get a PR if you or anyone else is interested in contributing.

I realize that this would be just part of the challenge, as custom deserializers would need to use it. But at least it would make this possible.

There may be, come to think of it now, other challenges wrt buffering (if content needs to be read in different order, for Creator parameters for example). So alternatively having a setting to expose ALL tags via getTypeId() might be better -- object and type ids ("native" ones) are retained by TokenBuffer.
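
For anyone who wants to check what getTypeId() currently reports for a custom tag, a small probe along these lines may help. It is only a sketch: TypeIdProbe and scalarTypeIds are hypothetical names, it assumes jackson-dataformat-yaml 2.x is on the classpath, and the reported value (including whether the leading "!" is kept, or whether anything is reported at all for scalars) may differ between versions, so no expected output is given.

```java
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.dataformat.yaml.YAMLFactory;
import com.fasterxml.jackson.dataformat.yaml.YAMLParser;

import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (illustration only): records what getTypeId() reports per field. */
final class TypeIdProbe {

    /** Maps each scalar field name to the type id reported for its value (possibly null). */
    static Map<String, String> scalarTypeIds(String yaml) throws IOException {
        Map<String, String> result = new LinkedHashMap<>();
        try (YAMLParser parser = new YAMLFactory().createParser(yaml)) {
            JsonToken token;
            while ((token = parser.nextToken()) != null) {
                // Only look at scalar values that belong to a named field.
                if (token.isScalarValue() && parser.getCurrentName() != null) {
                    result.put(parser.getCurrentName(), parser.getTypeId());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String yaml = "username: someone\npassword: !sensitive Abcd1234\n";
        scalarTypeIds(yaml).forEach((field, typeId) ->
                System.out.println(field + " -> " + typeId));
    }
}
```

This reads tags at the streaming level only. As noted above, making them visible to databinding and custom deserializers is a separate problem.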

cowtowncoder avatar Aug 19 '20 21:08 cowtowncoder

I believe that getRawTag() is a good workaround, even though it doesn't cover the buffering use cases. I would love to try it out!

buremba avatar Apr 13 '21 14:04 buremba

Hey, any update on this use case? Is this feature planned? This functionality opens up some very interesting possibilities: https://stackoverflow.com/questions/68782030/build-custom-snakeyaml-constructor-to-deserialize-yaml-file-in-a-modular-way

awattez avatar Feb 28 '23 09:02 awattez

@awattez This would require external contribution; my time is too limited to work on this unfortunately.

Scope of work sounds pretty extensive, FWIW; but perhaps one could first expose means to add custom tags on output (serialization), then some hooks on deserialization. But I don't really have a solid idea of how this could be done end-to-end; the challenge is not so much getting/putting tags via snakeyaml but rather how to make them work for databinding, and what to expose them as -- most existing constructs are for things JSON has either natively, or by some sort of configuration (Type and Object Ids).

cowtowncoder avatar Feb 28 '23 18:02 cowtowncoder