Keep the original letter-casing of keywords during the parsing phase

Open Delapouite opened this issue 2 years ago • 3 comments

Hi

The Org syntax allows both UPPERCASE and lowercase keywords. Example:

#+PROPERTIES
…
#+END

versus

#+properties
…
#+end

Currently, the parser forces UPPERCASE output:

https://github.com/rasendubi/uniorg/blob/a74a80bedb41cdc3190bec1feb09f5b19c3c63ed/packages/uniorg-parse/src/parser.ts#L1030

I understand that this step can be beneficial, since it homogenizes keywords for everything further down the processing pipeline.

But when many Org documents have been authored in the lowercase style, a pipeline that reads Org files → parses them → does stuff → stringifies → overwrites the files turns this cosmetic change into a lot of noise, especially in diffs if the Org files are versioned with git, for example.

Do you think we could keep the current behavior by default but add a new option to keep the case as authored in the original doc?

Thanks!

Delapouite avatar Mar 23 '23 12:03 Delapouite

Do you necessarily need to preserve the original spelling?

Would having an option in uniorg-stringify to select uppercase/lowercase spelling work for you?

I'm just worried that allowing any case would complicate processing and plugins. Besides upper- and lowercase, any mix is allowed (#+Title, #+tiTLe), so all processors would have to take that into account.
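
To make the concern concrete: if keys were kept as authored, every consumer would need a case-insensitive comparison along the lines of the sketch below. The `isKeyword` helper is hypothetical (not part of uniorg), and the node shape (`type`, `key`, `value`) is assumed for illustration.

```javascript
// Hypothetical helper, not part of uniorg: matches a keyword node by name
// regardless of how the key was spelled in the source document.
function isKeyword(node, name) {
  return (
    node.type === 'keyword' &&
    node.key.toUpperCase() === name.toUpperCase()
  );
}

// The same keyword spelled three ways, as it might look if case were preserved.
// The node shape here is an assumption made for this sketch.
const spellings = ['TITLE', 'title', 'tiTLe'].map((key) => ({
  type: 'keyword',
  key,
  value: 'My document',
}));

// A naive exact comparison only finds the uppercase spelling.
const exactMatches = spellings.filter((n) => n.key === 'TITLE').length;

// The case-insensitive helper finds all three.
const looseMatches = spellings.filter((n) => isKeyword(n, 'title')).length;

console.log(exactMatches, looseMatches);
```

With normalized keys, the exact comparison is enough, which is the simplification the current behavior buys.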

rasendubi avatar Mar 23 '23 17:03 rasendubi

I was not aware of the mixed-case possibilities, so I think you're right. Being able to choose either uppercase or lowercase should already be a good enough option. Thanks

Delapouite avatar Mar 23 '23 20:03 Delapouite

Just checked, and the current behavior is also consistent with org-element (the reference parser, written in Emacs Lisp).

Given the following org document:

#+test: blah

it produces the following AST:

((section
  (:begin 1 :end 13 :mode first-section :granularity nil)
  (keyword
   (:key "TEST" :value "blah" :mode top-comment :granularity nil))))

The lower-casing can be implemented in two ways: as a unified plugin (that traverses all keywords and lower-cases keys) or as a configuration for uniorg-stringify.

The plugin could go like this:

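import { unified } from 'unified';
import uniorgParse from 'uniorg-parse';
import uniorgStringify from 'uniorg-stringify';
import { visit } from 'unist-util-visit';

unified()
  .use(uniorgParse)
  .use(otherPlugins) // placeholder for the rest of your pipeline
  // This plugin should be added immediately before
  // uniorg-stringify so it does not interfere with other plugins.
  .use(() => (tree) => {
    visit(tree, 'keyword', (keyword) => {
      keyword.key = keyword.key.toLowerCase();
    });
  })
  .use(uniorgStringify);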
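
For trying the idea without installing anything, here is a dependency-free sketch of the same transformation. The `visitType` function is a hypothetical stand-in for `visit` from unist-util-visit, and the hand-built tree only approximates the AST shown above; the exact node shape is an assumption.

```javascript
// Minimal stand-in for unist-util-visit (hypothetical helper):
// calls `fn` on every node whose type equals `type`, depth-first.
function visitType(node, type, fn) {
  if (node.type === type) fn(node);
  for (const child of node.children || []) {
    visitType(child, type, fn);
  }
}

// Hand-built tree for the document `#+test: blah`.
// The node shape is assumed for illustration.
const tree = {
  type: 'org-data',
  children: [
    {
      type: 'section',
      children: [{ type: 'keyword', key: 'TEST', value: 'blah' }],
    },
  ],
};

// Same body as the plugin: lower-case every keyword key in place.
visitType(tree, 'keyword', (keyword) => {
  keyword.key = keyword.key.toLowerCase();
});

const lowered = tree.children[0].children[0].key;
console.log(lowered);
```

Only the key is touched; the value and all other node types are left as they were.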

Adjusting uniorg-stringify is obviously more involved, especially because it currently lacks options handling. That said, implementing handlers as in uniorg-rehype would make it much more powerful, and I'm willing to accept a PR.
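
As a rough idea of what the stringify-side variant might look like, here is a sketch under stated assumptions. Neither `stringifyKeyword` nor the `keywordCase` option exists in uniorg-stringify; both names are invented for illustration, and uppercase stays the default so current behavior is unchanged.

```javascript
// Hypothetical sketch: `stringifyKeyword` and the `keywordCase` option are
// invented names, not part of uniorg-stringify.
function stringifyKeyword(node, options = {}) {
  // Default to uppercase so the existing output does not change.
  const keywordCase = options.keywordCase || 'upper';
  const key =
    keywordCase === 'lower' ? node.key.toLowerCase() : node.key.toUpperCase();
  return `#+${key}: ${node.value}\n`;
}

// Keyword node as the parser emits it (key already upper-cased).
const keyword = { type: 'keyword', key: 'TEST', value: 'blah' };

const upper = stringifyKeyword(keyword);
const lower = stringifyKeyword(keyword, { keywordCase: 'lower' });

console.log(JSON.stringify(upper), JSON.stringify(lower));
```

The design choice here is to leave the AST normalized and apply the casing only at output time, so other plugins never see mixed-case keys.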

rasendubi avatar Mar 24 '23 02:03 rasendubi