Should Org.jl ever throw ERRORs on parsing?

Open schoettl opened this issue 2 years ago • 3 comments

Is Org.jl intended to throw errors in some cases instead of emitting warnings?

I think it would be more useful to always parse successfully and only show warnings for suspicious org code. For example, if Org.jl were used as a renderer at GitHub to render README.org, the file wouldn't render at all if a developer made a slight syntax error. A broken link like [[\\[]], for instance, could be parsed as plain text with only a warning.

I ran some tests with the org fragments that we use in the unit tests of org-parser. They produce a number of ERRORs, if you're interested. Some of the test inputs are malformed org.

#+begin_src sh :results verbatim
  curl https://raw.githubusercontent.com/200ok-ch/org-parser/master/test/org_parser/parser_test.cljc \
    | sed -nr 's/.*parse ("[^"]+").*/\1/p' \
    | awk '{printf "parse(OrgDoc, %s)\n", $0}'
#+end_src

#+RESULTS:
#+begin_example
parse(OrgDoc, "a")
parse(OrgDoc, "ab ")
parse(OrgDoc, "a\n")
parse(OrgDoc, ":a:")
parse(OrgDoc, ":a:b:c:")
parse(OrgDoc, ":az:AZ:09:_@#%:")
parse(OrgDoc, "* hello world")
parse(OrgDoc, "** [#A] hello world")
parse(OrgDoc, "*** hello world :the:end:")
parse(OrgDoc, "**** [#B] hello world :the:end:")
parse(OrgDoc, "* a\nb")
parse(OrgDoc, "* TODO hello world")
parse(OrgDoc, "* TODO COMMENT hello world")
parse(OrgDoc, "***** COMMENT hello world")
parse(OrgDoc, "* hello\n  CLOSED: [2021-05-22 Sat]")
parse(OrgDoc, "-----")
parse(OrgDoc, " --------")
parse(OrgDoc, "#+KEY: VALUE")
parse(OrgDoc, "#")
parse(OrgDoc, "# ")
parse(OrgDoc, "# comment")
parse(OrgDoc, "\t# comment")
parse(OrgDoc, "#comment")
parse(OrgDoc, "#\tcomment")
parse(OrgDoc, "anything\ngoes")
parse(OrgDoc, "#+HEADER: hello world")
parse(OrgDoc, "#+NAME: hello world")
parse(OrgDoc, "#+PLOT: hello world")
parse(OrgDoc, "#+RESULTS: hello world")
parse(OrgDoc, "#+RESULTS[asdf]: hello world")
parse(OrgDoc, "#+CAPTION: hello world")
parse(OrgDoc, "#+CAPTION[qwerty]: hello world")
parse(OrgDoc, "#+TODO: TODO | DONE")
parse(OrgDoc, "#+BEGIN_center params! \n#+end_center")
parse(OrgDoc, "#+BEGIN_QUOTE \ncontent\n#+end_QUOTE ")
parse(OrgDoc, "#+BEGIN_center\nmy\ncontent\n#+end_center")
parse(OrgDoc, "#+BEGIN_one\n#+end_other")
parse(OrgDoc, "src")
parse(OrgDoc, "#+BEGIN_src\n#+END_src")
parse(OrgDoc, "src")
parse(OrgDoc, "#+BEGIN_src\n\n#+END_src")
parse(OrgDoc, "src")
parse(OrgDoc, "#+BEGIN_src\ncontent\n #+END_src")
parse(OrgDoc, "src")
parse(OrgDoc, "#+BEGIN_src\ncontent\n second line \n #+END_src")
parse(OrgDoc, "example")
parse(OrgDoc, "#+BEGIN_example params! \n#+end_example")
parse(OrgDoc, "src")
parse(OrgDoc, "#+BEGIN_src \ncontent\n#+end_src ")
parse(OrgDoc, "export")
parse(OrgDoc, "#+BEGIN_export\nmy\ncontent\n#+end_export")
parse(OrgDoc, "comment")
parse(OrgDoc, "#+BEGIN_comment\n#+end_other")
parse(OrgDoc, "#+BEGIN_CENTER some params")
parse(OrgDoc, "#+END_CENTER")
parse(OrgDoc, "#+BEGIN: na.me pa rams \n#+end:")
parse(OrgDoc, "#+BEGIN: name \ntext\n#+end: ")
parse(OrgDoc, "#+BEGIN: name \n#+end:\n#+end:")
parse(OrgDoc, "#+begin: abc \nmulti\nline\ncontent\n#+end: ")
parse(OrgDoc, ":SOMENAME:")
parse(OrgDoc, ":END:")
parse(OrgDoc, ":PROPERTIES:\n:foo: bar\n:END:")
parse(OrgDoc, ":MYDRAWER:\nany\ntext\n:END:")
parse(OrgDoc, ":PROPERTIES:\n:END:")
parse(OrgDoc, ":PROPERTIES:\n:text+: my value\n:END:")
parse(OrgDoc, ":PROPERTIES:\n:text+: my value\n:PRO: abc\n:END:")
parse(OrgDoc, ":PROPERTIES:\ntext\n:END:")
parse(OrgDoc, "#+BEGIN: SOMENAME some params")
parse(OrgDoc, "#+END:")
parse(OrgDoc, "[fn:some-label] some contents")
parse(OrgDoc, "[fn:123] some contents")
parse(OrgDoc, "[123] some contents")
parse(OrgDoc, "[fn:123]")
parse(OrgDoc, "[fn::some contents]")
parse(OrgDoc, "[fn:some-label:some contents]")
parse(OrgDoc, "[fn:123:some contents]")
parse(OrgDoc, "[fn:some-label:some [contents]")
parse(OrgDoc, "[fn:some-label:some ]contents]")
parse(OrgDoc, "* a simple list item")
parse(OrgDoc, "- a simple list item")
parse(OrgDoc, "+ a simple list item")
parse(OrgDoc, "1. a simple list item")
parse(OrgDoc, "1) a simple list item")
parse(OrgDoc, "a) a simple list item")
parse(OrgDoc, "A) a simple list item")
parse(OrgDoc, "- [X] a simple list item")
parse(OrgDoc, " * a tag :: a simple list item")
parse(OrgDoc, "- [X] a tag :: a simple list item")
parse(OrgDoc, "#+HELLO: hello world")
parse(OrgDoc, ":HELLO:")
parse(OrgDoc, ":HELLO+:")
parse(OrgDoc, ":HELLO: hello world")
parse(OrgDoc, ":HELLO+: hello world")
parse(OrgDoc, "<%%(( <(sexp)().))>")
parse(OrgDoc, "<2020-01-18>")
parse(OrgDoc, "<2020-01-18 Sat>")
parse(OrgDoc, "<2020-01-21 Di>")
parse(OrgDoc, "<2020-01-21 Dönerstag>")
parse(OrgDoc, "<2020-01-18 12:00>")
parse(OrgDoc, "<2020-01-18 Sat 12:00>")
parse(OrgDoc, "<2020-01-18 Sat 12:00:00>")
parse(OrgDoc, "<2020-01-18 +1w>")
parse(OrgDoc, "<2020-01-18 -2d>")
parse(OrgDoc, "<2020-01-18 +1w -2d>")
parse(OrgDoc, "<2020-01-18 -2d +1w>")
parse(OrgDoc, "<2020-01-18 18:00 -2d +1w>")
parse(OrgDoc, "<2020-01-18 18:00-20:00 -2d +1w>")
parse(OrgDoc, "<2020-01-18    18:00    -2d    +1w>")
parse(OrgDoc, "<2020-04-25>--<2020-04-28>")
parse(OrgDoc, "<2020-04-25 08:00>--<2020-04-28 16:00>")
parse(OrgDoc, "[2020-01-18 18:00-20:00 -2d +1w]")
parse(OrgDoc, "<2020-04-25 day wrong>")
parse(OrgDoc, "<2009-10-17 Sat .+2d/4d>")
parse(OrgDoc, "<2009-10-17 Sat 15:30:55>")
parse(OrgDoc, "<2009-10-17 Sat 8:00>")
parse(OrgDoc, "<2020-04-17 F\nri>")
parse(OrgDoc, "<2020-04-17\nFri>")
parse(OrgDoc, "08:00")
parse(OrgDoc, "8:00")
parse(OrgDoc, "08:00:00")
parse(OrgDoc, "8:00AM")
parse(OrgDoc, "08:00pm")
parse(OrgDoc, "[2021-05-22 Sat 23:26-23:46]")
parse(OrgDoc, "[2021-05-22 Sat 23:26]--[2021-05-22 Sat 23:46]")
parse(OrgDoc, "<2021-05-22 Sat 23:26-23:46>")
parse(OrgDoc, "[2021-05-22 Sat 23:26]")
parse(OrgDoc, ": ")
parse(OrgDoc, ":")
parse(OrgDoc, ":  literal text")
parse(OrgDoc, "  : literal text ")
parse(OrgDoc, ":literal text")
parse(OrgDoc, " : foo \n : bar")
parse(OrgDoc, "[[https://example.com]]")
parse(OrgDoc, "[[www.example.com]]")
parse(OrgDoc, "[[https://example.com][description words]]")
parse(OrgDoc, "[[id:abc-123]]")
parse(OrgDoc, "[[#my-custom-id]]")
parse(OrgDoc, "[[*My Header]]")
parse(OrgDoc, "[[A Name]]")
parse(OrgDoc, "[[id:]]")
parse(OrgDoc, "[[id:z]]")
parse(OrgDoc, "[[\\]]")
parse(OrgDoc, "[[\\\\]]")
parse(OrgDoc, "[[\\a]]")
parse(OrgDoc, "[[a[b]]")
parse(OrgDoc, "[[\\[]]")
parse(OrgDoc, "[[\\]]]")
parse(OrgDoc, "file:folder/file.txt")
parse(OrgDoc, "file:~/folder/file.txt")
parse(OrgDoc, "file:~/fol:der/fi:le.txt")
parse(OrgDoc, "./folder/file.txt")
parse(OrgDoc, "/folder/file.txt")
parse(OrgDoc, "./file.org::15")
parse(OrgDoc, "./file.org::foo bar")
parse(OrgDoc, "./file.org::foo::bar")
parse(OrgDoc, "./file.org::*header1: test")
parse(OrgDoc, "./file.org::#custom-id")
parse(OrgDoc, "www.example.com")
parse(OrgDoc, "https://example.com")
parse(OrgDoc, "mailto:[email protected]")
parse(OrgDoc, "zyx:rest-of uri ...")
parse(OrgDoc, "text before <2021-05-22 Sat 00:12> after")
parse(OrgDoc, "text before [[http://example.com]] after")
parse(OrgDoc, "*bold text*")
parse(OrgDoc, "/italic text/")
parse(OrgDoc, "_underlined text_")
parse(OrgDoc, "=verbatim /abc/ text=")
parse(OrgDoc, "~code *abc* text~")
parse(OrgDoc, "+strike-through text+")
parse(OrgDoc, "/italic/ italic/")
parse(OrgDoc, "=verbatim= text=")
parse(OrgDoc, "==")
parse(OrgDoc, "=verbatim =")
parse(OrgDoc, "= verbatim=")
parse(OrgDoc, "=verbatim = text=")
parse(OrgDoc, "===")
parse(OrgDoc, "=a=")
parse(OrgDoc, "<http://example.com/foo?bar=baz&baz=bar>")
parse(OrgDoc, "http://example.com/foo?bar=baz&baz=bar")
parse(OrgDoc, "abc ")
parse(OrgDoc, "\nfoo")
parse(OrgDoc, "a/b")
parse(OrgDoc, "a /b")
parse(OrgDoc, "*bold text*")
parse(OrgDoc, "*bold text* normal text")
parse(OrgDoc, "normal text *bold text*")
parse(OrgDoc, "normal text *bold text* more text")
parse(OrgDoc, "*bold text* text*")
parse(OrgDoc, "/italic / text/")
parse(OrgDoc, "normal text <http://example.com> more text")
parse(OrgDoc, "normal text http://example.com more text")
parse(OrgDoc, "normal text [[http://example.com]] more text")
parse(OrgDoc, "normal text [[http://example.com]][fn::reserved]")
parse(OrgDoc, "text _abc")
parse(OrgDoc, "text_abc")
parse(OrgDoc, "text^123")
parse(OrgDoc, "text^abc_{123}")
parse(OrgDoc, "abc \\\\  ")
parse(OrgDoc, "abc \\\\ xyz")
parse(OrgDoc, "text{{{my_macro5(0,'{abc}')}}}")
parse(OrgDoc, "text<<my target>>")
parse(OrgDoc, "text<<<my target>>>")
parse(OrgDoc, "text\\Alpha-followed")
parse(OrgDoc, "text \\Alpha{}followed")
parse(OrgDoc, "{{{my_macro5()}}}")
parse(OrgDoc, "{{{my_macro5(arg)}}}")
parse(OrgDoc, "{{{my_macro5(x\\,y, (0),'{abc}')}}}")
parse(OrgDoc, "\\Alpha")
parse(OrgDoc, "\\Alpha{}")
parse(OrgDoc, "<<t>>")
parse(OrgDoc, "<< t>>")
parse(OrgDoc, "<<t >>")
parse(OrgDoc, "<< >>")
parse(OrgDoc, "_abc")
parse(OrgDoc, "_123")
parse(OrgDoc, "_1a2b")
parse(OrgDoc, "_*")
parse(OrgDoc, "_.,\\a")
parse(OrgDoc, "_-.,\\a")
parse(OrgDoc, "_+.,\\a")
parse(OrgDoc, "_{.,-123abc!}")
parse(OrgDoc, "text_{{}")
parse(OrgDoc, "+---+\n| x |\n+---+\n")
parse(OrgDoc, " +---+\n | x |\n +---+\n")
parse(OrgDoc, " |--+--|\n | x|x |\n |--+--|\n")
parse(OrgDoc, " |--+--|\n | x|x |\n |--+--|\n #+TBLFM: $4=vmean($2..$3)")
parse(OrgDoc, "CLOCK: [2021-05-22 Sat 23:26]--[2021-05-22 Sat 23:46] =>  0:20")
parse(OrgDoc, " CLOCK: [2021-05-22 Sat 23:26-23:46] =>  0:20 ")
parse(OrgDoc, "CLOCK: [not a timestamp Sat 23:26] =>  0:20 ")
parse(OrgDoc, "%%(nr()<n)-h")
parse(OrgDoc, "  %%(x)")
parse(OrgDoc, "SCHEDULED: [2021-05-22 Sat 23:26]")
parse(OrgDoc, "  DEADLINE: <2021-05-22 Sat> ")
parse(OrgDoc, "SCHEDULED: [2021-05-22 Sat 23:26]  DEADLINE: <2021-05-22 Sat>  CLOSED: [2021-05-21 Fri] ")
#+end_example
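
To find out which of these inputs throw, rather than stopping at the first error, each call can be wrapped in `try`/`catch`. The following is only a minimal sketch: `collect_failures` and `toy_parse` are hypothetical names, and `toy_parse` is a stand-in parser (it rejects targets padded with spaces, purely for illustration) so that the example runs without Org.jl installed.

```julia
# Hypothetical helper (not part of Org.jl): run a parser over many inputs
# and record which ones throw, instead of aborting at the first error.
function collect_failures(parsefn, inputs)
    failures = Pair{String,Any}[]
    for input in inputs
        try
            parsefn(input)
        catch err
            push!(failures, String(input) => err)
        end
    end
    return failures
end

# Stand-in parser for illustration only: it throws on targets such as
# "<< t>>" whose content starts or ends with a space.
function toy_parse(s::AbstractString)
    m = match(r"^<<(.*)>>$", s)
    if m !== nothing && (startswith(m[1], " ") || endswith(m[1], " "))
        throw(ArgumentError("target has leading or trailing space: $(repr(s))"))
    end
    return s
end

failures = collect_failures(toy_parse, ["<<t>>", "<< t>>", "<<t >>", "a/b"])
for (input, err) in failures
    println(repr(input), " => ", typeof(err))
end
```

With Org.jl loaded, passing `s -> parse(OrgDoc, s)` in place of `toy_parse` should list the failing fragments, assuming parse problems are signalled by thrown exceptions.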

schoettl avatar Apr 16 '22 08:04 schoettl

Thanks, no errors should be emitted, so I'll give this a look at some point.

tecosaur avatar Apr 19 '22 14:04 tecosaur

Down to two errors:

  • Regular links with unescaped/unpaired square brackets
  • Radio targets starting/ending with a space

tecosaur avatar Jan 28 '23 16:01 tecosaur

Here is one more error:

parse(OrgDoc, "<2020-01-00>")
parse(OrgDoc, "<2020-00-01>")

(same for inactive timestamps or timestamps with repeaters)

I think it would be best to fall back to parsing such malformed timestamps as plain text and to issue a warning.
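
A rough sketch of that fallback, not Org.jl's actual implementation: the node types `TimestampNode` and `TextNode` and the function `lenient_timestamp` are hypothetical, and only the simplest active timestamp form (`<YYYY-MM-DD>`) is handled. The idea is to validate the date with the `Dates` standard library and, if it is out of range, log a warning and keep the raw text.

```julia
using Dates

# Hypothetical node types for illustration; Org.jl's real types differ.
struct TimestampNode
    date::Date
end

struct TextNode
    text::String
end

# Read an active timestamp such as "<2020-01-18>". If the input does not
# match the pattern, return it as text. If it matches but the date is out
# of range (month 0, day 0, ...), warn and also return it as text.
function lenient_timestamp(s::AbstractString)
    m = match(r"^<(\d{4})-(\d{2})-(\d{2})>$", s)
    m === nothing && return TextNode(String(s))
    y, mo, d = parse(Int, m[1]), parse(Int, m[2]), parse(Int, m[3])
    date = try
        Date(y, mo, d)          # throws ArgumentError for invalid dates
    catch err
        err isa ArgumentError || rethrow()
        nothing
    end
    if date === nothing
        @warn "Malformed timestamp, treating it as plain text" timestamp = s
        return TextNode(String(s))
    end
    return TimestampNode(date)
end

for s in ["<2020-01-18>", "<2020-01-00>", "<2020-00-01>"]
    println(repr(s), " => ", lenient_timestamp(s))
end
```

The same pattern would extend to inactive timestamps and repeaters by widening the regular expression. The key point is that validation failure yields a text node plus a warning rather than an exception.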

schoettl avatar Aug 03 '23 13:08 schoettl