Better YAML Front Matter parsing
Is your feature request related to a problem? Please describe.
I'm building a project whose core task is parsing the YAML front matter in Markdown documents and mapping it to known DTOs. I ran into multiple issues with the existing front matter extension for commonmark:
- it supports only a small subset of YAML, unsuitable for my needs,
- it produces a data structure that breaks Jackson when trying to map it to a DTO. The main issue is that a `key: value` pair is mapped to a list with a single value. It also does not support other data types.
Describe the solution you'd like
I think it would work much better if YAML parsing were delegated to an external library. The extension could simply let users "bring their own parser" by implementing an interface. In the case of Jackson, this translates to calling a single method on ObjectMapper.
Pseudo-code that illustrates the idea:
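// YamlParserProvider is the proposed interface; it does not exist in commonmark today.
public class JacksonWrapper implements YamlParserProvider {
    // Must be a YAML-capable mapper, e.g. one built with YAMLFactory from jackson-dataformat-yaml.
    private final ObjectMapper objectMapper;

    public JacksonWrapper(ObjectMapper objectMapper) {
        this.objectMapper = objectMapper;
    }

    @Override
    public Map<String, Object> parseYaml(String input) {
        try {
            return objectMapper.readValue(input, new TypeReference<Map<String, Object>>() {});
        } catch (JsonProcessingException e) {
            throw new IllegalArgumentException("Invalid YAML front matter", e);
        }
    }
}
// ...
// Proposed usage: a YamlFrontMatterExtension constructor that accepts a parser provider.
var parser = Parser.builder()
        .extensions(List.of(new YamlFrontMatterExtension(new JacksonWrapper(objectMapper))))
        .build();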
This way, commonmark does not need a dependency on any particular YAML parser.
Returning Map<String, Object> is IMHO good enough, because I can later use Jackson to map it to the proper DTO in my logic.
Describe alternatives you've considered
For now, I have written a simple custom extension of my own. It still lacks proper error handling, but the approach seems to work. I would be happy if such an extension were part of Commonmark (less boilerplate code to maintain).
I briefly considered contributing to Commonmark to improve the parser in the existing extension, but I quickly realized that this is a complex task.
Yeah, I agree we should expose the raw YAML somehow. How about just exposing a String which is the whole front matter content?
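A minimal sketch of what "exposing a String" could look like, written as standalone Java rather than against commonmark's block parser API. The class `RawFrontMatter` and its `extract` method are hypothetical names used only for illustration. The sketch assumes that the front matter starts on the very first line with `---` and ends at the next line that is exactly `---`.

```java
import java.util.Optional;

// Hypothetical helper, not part of commonmark: returns the raw text between
// the opening and closing "---" lines so that any YAML library can parse it.
final class RawFrontMatter {

    static Optional<String> extract(String markdown) {
        String[] lines = markdown.split("\\R", -1);
        // Assumption: front matter must begin on the first line of the document.
        if (!lines[0].equals("---")) {
            return Optional.empty();
        }
        StringBuilder yaml = new StringBuilder();
        for (int i = 1; i < lines.length; i++) {
            // Assumption: only an exact "---" at column 0 closes the block.
            if (lines[i].equals("---")) {
                return Optional.of(yaml.toString());
            }
            yaml.append(lines[i]).append('\n');
        }
        // No closing delimiter: treat the document as having no front matter.
        return Optional.empty();
    }

    public static void main(String[] args) {
        String doc = "---\ntitle: Hello\ntags: [a, b]\n---\n# Heading\n";
        System.out.println(extract(doc).orElse("<none>"));
    }
}
```

The returned string could then be handed to whichever YAML parser the caller prefers, which keeps commonmark free of a parser dependency.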
That would also work. The only challenge I see is literal blocks that may contain a `---` sequence as the first characters of a line. You would only be able to run the YAML parser after the entire Markdown document has been processed, so either you accept this trade-off or you implement some minimal support just for literal blocks in the YAML Front Matter extension.
But for me this is a corner case, and I could probably live with it.
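The corner case can be illustrated with a small standalone sketch. This is not commonmark code: `LiteralBlockDemo` and `findClosingLine` are hypothetical names, and the sample document is made up. It assumes front matter where a `|` literal block contains an indented `---` line. Whether this causes trouble depends on how strictly the closing delimiter is matched: a check that ignores leading whitespace stops inside the literal block, while an exact match at column 0 does not.

```java
import java.util.List;

// Hypothetical demo: shows how the choice of delimiter check decides where
// the front matter is considered to end.
final class LiteralBlockDemo {

    static final List<String> DOC = List.of(
            "---",
            "title: Example",
            "notes: |",
            "  first line of the literal block",
            "  ---",
            "  last line of the literal block",
            "---",
            "# Heading");

    // Returns the index of the first line (after the opening one) that is
    // treated as the closing delimiter, or -1 if there is none.
    static int findClosingLine(List<String> lines, boolean ignoreIndent) {
        for (int i = 1; i < lines.size(); i++) {
            String candidate = ignoreIndent ? lines.get(i).trim() : lines.get(i);
            if (candidate.equals("---")) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Lenient check: stops at the "---" inside the literal block (index 4).
        System.out.println(findClosingLine(DOC, true));
        // Strict check: finds the real closing delimiter (index 6).
        System.out.println(findClosingLine(DOC, false));
    }
}
```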
Can you explain a bit more what you mean? What's a literal block?