obsidian-linter
obsidian-linter copied to clipboard
FR: `Escape YAML Special Characters` rule should convert invalid YAML into valid one
Extracted from discussion for #615
The entire Escape YAML Special Characters has to be rewritten and minimize manual string manipulation and use YAML AST as much as possible. We should use techniques like roundtrip parsing
The case when the string already has both single and double quotes are not processed by the previous implementation. There is even a test case that claims that it produces an invalid YAML
@mnaoumov , what is round-trip parsing? Also, how do you parse invalid YAML into an AST?
I am not the most familiar with how these can be done without issue. Sure I can try to parse the error message and the try to escape values that way, but that often fails and would require a try catch which I don't believe is good practice for this kind of thing.
@mnaoumov , what is round-trip parsing? Also, how do you parse invalid YAML into an AST?
round-trip parsing
is the kind of parsing which preserves whitespaces, comments etc, in the way that you if parse the string and then convert back to string, you will get an identical string back. As far as I see js-yaml
doesn't provide such feature
Converting invalid YAML to AST might be a challenging task. I guess it should work similarly like how the compilers deal with the syntactically incorrect code
I have no problem with with having a library do this, but I do not plan on writing this logic myself as it seems to me like it would be like writing my own xml or html parser which is generally considered bad practice.
The following package may do round-trip parsing: https://github.com/eemeli/yaml.
Some more information on this:
To use the round trip parser we would need to use use parseDocument
. I believe there is also a way to parse a string directly.
Once you haver parsed out the YAML, you get back an object. In that object, you want to access the contents
property. It has an array of values, so finding an existing key is a bit trickier (I think there is a good way to do this, but I am not entirely sure what it is off the top of my head).
You should be able to pull back the entirety of a value for an key by checking the value portion for it under its index. It has a range
property that has the start and end values (the end seems to be duplicated).
@pjkaufman I have an idea that we should stop handcrafting YAML manually, as Obsidian provides a native way to deal with it
It has however some minor flaws but I think overall we should stop dealing with YAML as strings and use either native processFrontMatter
or underlying js-yaml library
That is not testable @mnaoumov . I am a big proponent of being able to test logic that the Linter uses where possible. processFrontMatter
is not testable unless you are familiar with a way that would work. I tend to avoid relying on a function in an API staying the same when it comes to testing rules (there are some exceptions).
yaml makes more sense to me on the basis that it allows for round-trip parsing. We could get rid of a lot of the regex and just use the returned object to check for keys, sort things, and the like. The main problem is going to be porting over or getting rid of old functionality. Another benefit is it has an AST that can be manipulated and used for creating a new version of the YAML.
The docs for it are here: https://eemeli.org/yaml/#yaml
Any thoughts on swapping to yaml as opposed to using js-yaml. @mnaoumov ? Or would we just remove comments from the YAML frontmatter?
Edit: I am trying to get rid of direct grabbing of values for YAML keys and their values via regex, but it is necessary for grabbing the disabled keys and removing hashtags from the tag keys in the YAML.
As a note, js-yaml
does not allow for specifying and standardizing array formats without always generating new YAML instead of only doing so when needed.
@pjkaufman You claim that yaml has a round-trip parser. I don't see it in its documentation. I think we need to test it or look for another library that does round-trip parsing.
If you want me to be in charge of refactoring all YAML-related code to the object model, I am more than happy to be assigned this task to
@mnaoumov , round trip parsing is done via the AST using parseDocument
(see this comment).
It handles position data and comments.
Also, if you feel comfortable refactoring the logic around YAML to not use regex to pull back values, I am fine with passing that to you @mnaoumov .
@pjkaufman I also suggest move all linter related settings under one common parent
@pjkaufman I also suggest move all linter related settings under one common parent
I don't believe I understand what you mean by this. Could you elaborate? Is this in reference to tabs, types of rules, or something else?
@pjkaufman Sorry, I didn't make myself clear
I mean the frontmatter linter settings
---
linter-yaml-title-alias: title1
disabled rules: [capitalize-headings]
---
change to
---
obsidian-linter:
- yaml-title-alias: title1
- disabled-rules: [capitalize-headings]
---
@pjkaufman Sorry, I didn't make myself clear
I mean the frontmatter linter settings
--- linter-yaml-title-alias: title1 disabled rules: [capitalize-headings] ---
change to
--- obsidian-linter: - yaml-title-alias: title1 - disabled-rules: [capitalize-headings] ---
I am fine with something like that. We would just need a way of migrating old values to their new location.