obsidian-linter icon indicating copy to clipboard operation
obsidian-linter copied to clipboard

FR: `Escape YAML Special Characters` rule should convert invalid YAML into valid one

Open mnaoumov opened this issue 2 years ago • 17 comments

Extracted from discussion for #615

The entire Escape YAML Special Characters has to be rewritten and minimize manual string manipulation and use YAML AST as much as possible. We should use techniques like roundtrip parsing

The case when the string already has both single and double quotes are not processed by the previous implementation. There is even a test case that claims that it produces an invalid YAML

mnaoumov avatar Feb 02 '23 16:02 mnaoumov

@mnaoumov , what is round-trip parsing? Also, how do you parse invalid YAML into an AST?

I am not the most familiar with how these can be done without issue. Sure I can try to parse the error message and the try to escape values that way, but that often fails and would require a try catch which I don't believe is good practice for this kind of thing.

pjkaufman avatar Feb 03 '23 12:02 pjkaufman

@mnaoumov , what is round-trip parsing? Also, how do you parse invalid YAML into an AST?

round-trip parsing is the kind of parsing which preserves whitespaces, comments etc, in the way that you if parse the string and then convert back to string, you will get an identical string back. As far as I see js-yaml doesn't provide such feature

Converting invalid YAML to AST might be a challenging task. I guess it should work similarly like how the compilers deal with the syntactically incorrect code

mnaoumov avatar Feb 03 '23 12:02 mnaoumov

I have no problem with with having a library do this, but I do not plan on writing this logic myself as it seems to me like it would be like writing my own xml or html parser which is generally considered bad practice.

pjkaufman avatar May 17 '23 08:05 pjkaufman

The following package may do round-trip parsing: https://github.com/eemeli/yaml.

pjkaufman avatar May 17 '23 09:05 pjkaufman

Some more information on this:

To use the round trip parser we would need to use use parseDocument. I believe there is also a way to parse a string directly.

Once you haver parsed out the YAML, you get back an object. In that object, you want to access the contents property. It has an array of values, so finding an existing key is a bit trickier (I think there is a good way to do this, but I am not entirely sure what it is off the top of my head).

You should be able to pull back the entirety of a value for an key by checking the value portion for it under its index. It has a range property that has the start and end values (the end seems to be duplicated).

pjkaufman avatar Aug 02 '23 20:08 pjkaufman

@pjkaufman I have an idea that we should stop handcrafting YAML manually, as Obsidian provides a native way to deal with it

processFrontMatter

It has however some minor flaws but I think overall we should stop dealing with YAML as strings and use either native processFrontMatter or underlying js-yaml library

mnaoumov avatar Aug 02 '23 20:08 mnaoumov

That is not testable @mnaoumov . I am a big proponent of being able to test logic that the Linter uses where possible. processFrontMatter is not testable unless you are familiar with a way that would work. I tend to avoid relying on a function in an API staying the same when it comes to testing rules (there are some exceptions).

yaml makes more sense to me on the basis that it allows for round-trip parsing. We could get rid of a lot of the regex and just use the returned object to check for keys, sort things, and the like. The main problem is going to be porting over or getting rid of old functionality. Another benefit is it has an AST that can be manipulated and used for creating a new version of the YAML.

pjkaufman avatar Aug 02 '23 20:08 pjkaufman

The docs for it are here: https://eemeli.org/yaml/#yaml

pjkaufman avatar Aug 02 '23 20:08 pjkaufman

Any thoughts on swapping to yaml as opposed to using js-yaml. @mnaoumov ? Or would we just remove comments from the YAML frontmatter?

Edit: I am trying to get rid of direct grabbing of values for YAML keys and their values via regex, but it is necessary for grabbing the disabled keys and removing hashtags from the tag keys in the YAML.

pjkaufman avatar Aug 02 '23 20:08 pjkaufman

As a note, js-yaml does not allow for specifying and standardizing array formats without always generating new YAML instead of only doing so when needed.

pjkaufman avatar Aug 02 '23 20:08 pjkaufman

@pjkaufman You claim that yaml has a round-trip parser. I don't see it in its documentation. I think we need to test it or look for another library that does round-trip parsing.

If you want me to be in charge of refactoring all YAML-related code to the object model, I am more than happy to be assigned this task to

mnaoumov avatar Aug 02 '23 21:08 mnaoumov

@mnaoumov , round trip parsing is done via the AST using parseDocument (see this comment).

pjkaufman avatar Aug 02 '23 21:08 pjkaufman

It handles position data and comments.

Also, if you feel comfortable refactoring the logic around YAML to not use regex to pull back values, I am fine with passing that to you @mnaoumov .

pjkaufman avatar Aug 02 '23 21:08 pjkaufman

@pjkaufman I also suggest move all linter related settings under one common parent

mnaoumov avatar Aug 03 '23 01:08 mnaoumov

@pjkaufman I also suggest move all linter related settings under one common parent

I don't believe I understand what you mean by this. Could you elaborate? Is this in reference to tabs, types of rules, or something else?

pjkaufman avatar Aug 03 '23 01:08 pjkaufman

@pjkaufman Sorry, I didn't make myself clear

I mean the frontmatter linter settings

---
linter-yaml-title-alias: title1
disabled rules: [capitalize-headings]
---

change to

---
obsidian-linter:
  - yaml-title-alias: title1
  - disabled-rules: [capitalize-headings]
---

mnaoumov avatar Aug 03 '23 05:08 mnaoumov

@pjkaufman Sorry, I didn't make myself clear

I mean the frontmatter linter settings

---
linter-yaml-title-alias: title1
disabled rules: [capitalize-headings]
---

change to

---
obsidian-linter:
  - yaml-title-alias: title1
  - disabled-rules: [capitalize-headings]
---

I am fine with something like that. We would just need a way of migrating old values to their new location.

pjkaufman avatar Aug 03 '23 09:08 pjkaufman