omd
TyXML
Is it possible to have an optional module, depending on TyXML, that provides a function to transform an Omd.t into a TyXML type (Html5_sigs.T.elt)? It would facilitate integration with projects that use ocsigen.
I'd be glad if a TyXML backend for OMD existed. If someone wants to implement it and asks for my help, I will be happy to help! :-)
@jpdeplaix: if you don't mind a convoluted route, you could take a look at Lambdoc, which does support output to a Tyxml type and now also includes Markdown (via OMD) as one of the markups it supports.
@darioteixeira could you at least make an OPAM package for camlhighlight?
@jpdeplaix: I already have a Camlhighlight OPAM package in my local repo (and a Lambdoc one, for that matter). The only reason I haven't pushed it upstream yet is that Camlhighlight currently requires a manual step before it's ready for use (you must copy a file into /usr/share/source-highlight). I'm hoping to find a solution with the Source-highlight upstream that will make this step unnecessary.
There's another issue with Camlhighlight that I wish to see resolved before the next release: it actually outputs an Eliom_content.Html5.F.elt, because at the time mixing Html5.F.elt and Eliom_content.Html5.F.elt was a PITA. Perhaps newer versions of Tyxml/Eliom make this easier; I'll give it a try...
@jpdeplaix: Mind you, there is also still a problem with the Markdown support in Lambdoc: because OMD does not provide location information, error messages cannot pinpoint the line number where an error occurred (see this issue).
@jpdeplaix @darioteixeira Curious if either of you have an update on this. If there's no clear solution still, I might consider implementing something.
@agarwal: In trunk, Lambdoc's support for Markdown (via OMD) is much better now, though there are still some issues to resolve before a 1.0 release (if you want to try this route note that Lambdoc's trunk depends on Tyxml's trunk). As for direct support for Tyxml in OMD, I don't know of any developments.
@darioteixeira Thanks. I'll take a look.
There is still work to be done, but I have a first "quick and dirty" solution to this issue (see above commit).
The more complex constructors in the Omd.element type are undocumented or poorly documented. It would help me if someone could explain what the following are:
- Paragraph, and how to enable it in of_string;
- Ulp, Olp (I read somewhere that paragraphs are implicit, but what does that really mean?);
- NL;
- Ref;
- Img_ref;
- X.
Besides implementing these correctly, I also need to provide the code inside a functor, so that any TyXML module can use it. I also plan to provide submodules D and F compatible with Eliom_content.Html.{D,F}.
Any comment on that would be welcome. Am I heading in the right direction?
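A minimal sketch of the functor approach described above. It assumes a hypothetical mini AST (`inline`) and a hypothetical signature (`HTML`) standing in for TyXML's real one (the original request mentions `Html5_sigs.T`); TyXML's element types carry phantom type parameters that this sketch deliberately drops. The point is only that the converter is written once against a signature, so any conforming module (for instance a D or F flavour) can reuse it.

```ocaml
(* Hypothetical, simplified inline Markdown AST (not omd's real type). *)
type inline =
  | Text of string
  | Emph of inline list
  | Strong of inline list
  | Code of string

(* Hypothetical signature listing the builder operations the converter
   needs; a real backend would use TyXML's own signature instead. *)
module type HTML = sig
  type elt
  val txt : string -> elt
  val em : elt list -> elt
  val strong : elt list -> elt
  val code : elt list -> elt
  val p : elt list -> elt
end

(* The converter only depends on the signature, so it is written once
   and instantiated for each concrete HTML module. *)
module Make (H : HTML) = struct
  let rec of_inline = function
    | Text s -> H.txt s
    | Emph xs -> H.em (List.map of_inline xs)
    | Strong xs -> H.strong (List.map of_inline xs)
    | Code s -> H.code [ H.txt s ]

  let paragraph xs = H.p (List.map of_inline xs)
end

(* A string-producing instance, convenient for testing the functor. *)
module String_html = struct
  type elt = string

  (* Escape the characters that are special in HTML text. *)
  let escape s =
    let b = Buffer.create (String.length s) in
    String.iter
      (function
        | '<' -> Buffer.add_string b "&lt;"
        | '>' -> Buffer.add_string b "&gt;"
        | '&' -> Buffer.add_string b "&amp;"
        | '"' -> Buffer.add_string b "&quot;"
        | c -> Buffer.add_char b c)
      s;
    Buffer.contents b

  let wrap tag children =
    "<" ^ tag ^ ">" ^ String.concat "" children ^ "</" ^ tag ^ ">"

  let txt = escape
  let em = wrap "em"
  let strong = wrap "strong"
  let code = wrap "code"
  let p = wrap "p"
end

module To_string = Make (String_html)

let () =
  print_endline (To_string.paragraph [ Text "a "; Emph [ Text "b" ]; Code "x<y" ])
```

This prints `<p>a <em>b</em><code>x&lt;y</code></p>`. Applying `Make` to TyXML-based D and F modules, with the phantom types restored, is the part this sketch leaves out.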
Maybe @Drup can answer some of these questions.
I would be happy to answer tyxml questions, but I feel the implementation above does a decent job of turning the loosely-typed omd AST into tyxml. The questions seem mostly omd-specific.
I've encountered a need for this as well, so thought I'd revive the old discussion.
@shepard8 are you still interested in pushing this through? If so, I'd be happy to see if I could help figure out how we should treat the listed constructors. (If not, I'd be willing to pick this issue up.)
Hello!
I have no time for this at the moment; feel free to continue the work on this issue (or restart from scratch, I don't know your plans :-) ).
Ok! Thanks @shepard8. I should be able to take a crack at this in the next week or two.
There is now a simple HTML intermediate representation in master:
https://github.com/ocaml/omd/blob/01614cc0227303224e3f39cb5e85d40a7774180d/src/html.mli#L4-L13
It should be trivial to transform to TyXML.
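To illustrate why such a transformation is mostly mechanical, here is a minimal sketch using a hypothetical, simplified IR loosely in the spirit of the one linked above (the actual type in omd's html.mli may differ). Every consumer of the IR is a single structural fold; a TyXML backend would be one such fold whose `element` case maps tag names to TyXML combinators.

```ocaml
(* Hypothetical simplified IR; not omd's actual definition. *)
type attributes = (string * string) list

type t =
  | Element of string * attributes * t list  (* tag, attributes, children *)
  | Text of string                            (* text, to be escaped *)
  | Raw of string                             (* content emitted verbatim *)

(* Generic fold: the caller says what to do with each node kind,
   and the recursion over children is handled once, here. *)
let rec fold ~element ~text ~raw = function
  | Text s -> text s
  | Raw s -> raw s
  | Element (tag, attrs, children) ->
      element tag attrs (List.map (fold ~element ~text ~raw) children)

(* Two small consumers written with [fold]. *)
let count_elements doc =
  fold
    ~element:(fun _ _ kids -> 1 + List.fold_left ( + ) 0 kids)
    ~text:(fun _ -> 0) ~raw:(fun _ -> 0) doc

let plain_text doc =
  fold
    ~element:(fun _ _ kids -> String.concat "" kids)
    ~text:(fun s -> s) ~raw:(fun _ -> "") doc

let doc =
  Element ("p", [],
    [ Text "Hello, "; Element ("em", [], [ Text "world" ]); Raw "<br/>" ])

let () = Printf.printf "%d %s\n" (count_elements doc) (plain_text doc)
```

This prints `2 Hello, world`.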
Would it be useful to use this intermediate representation to write the optional module that would actually satisfy this issue? Or is the particular feature request here essentially being rejected, on the grounds that it is easy enough for users to write their own transformers?
A priori I think this is something that users can write on their side if they need to. But if there is a lot of demand for it we can revisit the issue.
Translation from Markdown to HTML is not as trivial as people think, so it seems useful to provide it somewhere. Personally, I would suggest providing a separate package, omd-tyxml, with an appropriate CSS, that does just that, once and for all.
I tend to agree! I'd be happy to take on this work item, unless someone else is particularly keen on it.
@nojb Would you be up for having the package be part of this repo, or would you rather I implement it in a separate repository?
I think it makes sense to have it in this repository, have a go if you feel like, just be aware that the code might still budge a bit before the release. Thanks!
Can this be reopened until such a subpackage is available?
@nojb Sounds good. I should have time this weekend. :)
I've started work on this and opened a very rough WIP PR, #211, just to explore some directions and get the lay of the land. As I've poked around, I've realized that I don't know what we gain by going through the intermediate HTML representation:
- From the perspective of implementation, using the intermediate representation means we have to use stringly typed nodes rather than nodes tagged with enums, which means we lose exhaustiveness checking on almost all of the structure.
- From a usage perspective, having to go through the intermediate representation means an extra round of tree transformations, which will surely impact performance negatively.
I'd like to get your take on this, @nojb, and see if I may be missing something that you had in mind which gives the intermediate representation an edge here.
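The exhaustiveness point can be made concrete with a small sketch. Both types here are hypothetical stand-ins (not omd's AST or its IR), and HTML escaping is omitted for brevity. Matching on constructors lets the compiler flag a forgotten case; matching on tag strings needs a catch-all, so a forgotten tag only shows up at run time.

```ocaml
(* Hypothetical block-level AST, standing in for omd's real one. *)
type block =
  | Paragraph of string
  | Heading of int * string
  | Thematic_break

(* Direct conversion: if a constructor is added to [block] but not
   handled here, the compiler warns about a non-exhaustive match. *)
let of_block = function
  | Paragraph s -> Printf.sprintf "<p>%s</p>" s
  | Heading (n, s) -> Printf.sprintf "<h%d>%s</h%d>" n s n
  | Thematic_break -> "<hr/>"

(* Hypothetical stringly typed IR: the tag is just a string. *)
type ir = Element of string * ir list | Text of string

(* Conversion from the IR: a catch-all case is unavoidable, and the
   compiler cannot check that every tag the generator emits is handled. *)
let rec of_ir = function
  | Text s -> s
  | Element ("p", kids) -> "<p>" ^ String.concat "" (List.map of_ir kids) ^ "</p>"
  | Element ("hr", []) -> "<hr/>"
  | Element (tag, _) -> failwith ("unhandled tag: " ^ tag) (* detected only at run time *)
```

Here `of_ir (Element ("h2", [ Text "x" ]))` raises `Failure`, whereas a missing `Heading` case in `of_block` would have been reported at compile time.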
> I've started work on this, and opened a very rough WIP PR, just to explore some directions and get the lay of the land, in #211. As I've poked around, I've discovered that I don't know what we gain in going through the intermediate HTML representation:
The HTML intermediate representation nicely separates the logic of generating the HTML from the logic of printing it. It is also used to share code with the plain-text backend, and it allows arbitrary content to be embedded inside the generated HTML.
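A minimal sketch of that separation, using hypothetical simplified types (omd's real AST and IR differ, and escaping is omitted for brevity): the Markdown-to-IR generation step is written once, and two independent printers consume the same IR.

```ocaml
(* Hypothetical inline AST and IR; not omd's actual definitions. *)
type inline = Plain of string | Emph of inline list

type ir = Element of string * ir list | Text of string

(* Generation: decide the HTML structure once. *)
let rec gen = function
  | Plain s -> Text s
  | Emph xs -> Element ("em", List.map gen xs)

(* Printer 1: HTML output. *)
let rec to_html = function
  | Text s -> s
  | Element (tag, kids) ->
      "<" ^ tag ^ ">" ^ String.concat "" (List.map to_html kids) ^ "</" ^ tag ^ ">"

(* Printer 2: plain text, reusing the same IR and ignoring the tags. *)
let rec to_text = function
  | Text s -> s
  | Element (_, kids) -> String.concat "" (List.map to_text kids)

let () =
  let doc = Element ("p", List.map gen [ Plain "very "; Emph [ Plain "nice" ] ]) in
  print_endline (to_html doc);
  print_endline (to_text doc)
```

This prints `<p>very <em>nice</em></p>` followed by `very nice`.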
> - From the perspective of implementation, using the intermediate representation means we have to use stringly typed nodes rather than nodes tagged with enums, which means we lose exhaustiveness checking on almost all of the structure.
For the purposes of TyXML, the intermediate representation does not have to be used. There are pros and cons to using it.
If you do use it, you share the logic with the usual HTML backend which makes it easier to be sure that you are generating the same HTML in both cases. But the "stringy" representation of nodes in the intermediate representation means that one does not get a static guarantee that the TyXML backend covers all the cases arising from Markdown.
On the other hand, if you don't use the intermediate representation you get a static guarantee that you are covering every case arising from Markdown, but you will need to replicate some of the Markdown -> HTML logic for that backend in order to produce the same HTML as the "usual" backend.
Ideally, we would like the HTML produced by both backends (the "usual" one and the TyXML one) to always coincide when the "usual" backend generates valid HTML (note that Markdown can embed arbitrary content in so-called HTML blocks; you will need to decide how to handle these, as they may not be valid HTML).
> - From a usage perspective, having to go through the intermediate representation means an extra round of tree transformations, which will surely impact performance negatively.
I would be surprised if the performance impact was noticeable in practice.
What's the purpose of the non-tyxml HTML backend apart from "does not use tyxml" ?
> What's the purpose of the non-tyxml HTML backend apart from "does not use tyxml"?
Markdown can embed arbitrary content which is supposed to be quoted verbatim inside the generated HTML (see here). This would be tricky to represent using tyxml (as far as I understand).
> This would be tricky to represent using tyxml (as far as I understand).
That's not particularly a problem, no; it just means using Html.Unsafe to build those parts (since the well-formedness guarantee clearly doesn't apply).
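The escape-hatch idea can be sketched without depending on TyXML itself. The module below is hypothetical and only analogous in spirit to TyXML's `Html.Unsafe`: ordinary nodes go through checked, escaping constructors, while a verbatim Markdown HTML block can only enter the tree through a separately named `Unsafe` function.

```ocaml
(* Hypothetical minimal HTML builder; not TyXML's API. *)
module Html : sig
  type t
  val txt : string -> t
  val p : t list -> t
  val div : t list -> t
  module Unsafe : sig
    val raw : string -> t  (* inserted as-is: no escaping, no validity check *)
  end
  val to_string : t -> string
end = struct
  type t = Node of string * t list | Txt of string | Raw of string

  let txt s = Txt s
  let p kids = Node ("p", kids)
  let div kids = Node ("div", kids)

  module Unsafe = struct
    let raw s = Raw s
  end

  (* Escape characters that are special in HTML text. *)
  let escape s =
    let b = Buffer.create (String.length s) in
    String.iter
      (function
        | '&' -> Buffer.add_string b "&amp;"
        | '<' -> Buffer.add_string b "&lt;"
        | '>' -> Buffer.add_string b "&gt;"
        | c -> Buffer.add_char b c)
      s;
    Buffer.contents b

  let rec to_string = function
    | Txt s -> escape s
    | Raw s -> s
    | Node (tag, kids) ->
        "<" ^ tag ^ ">" ^ String.concat "" (List.map to_string kids) ^ "</" ^ tag ^ ">"
end

let () =
  let doc =
    Html.div [ Html.p [ Html.txt "1 < 2" ]; Html.Unsafe.raw "<aside>note</aside>" ]
  in
  print_endline (Html.to_string doc)
```

This prints `<div><p>1 &lt; 2</p><aside>note</aside></div>`: the text is escaped, while the raw block is copied through unchanged.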
> That's not particularly a problem, no; it just means using Html.Unsafe to build those parts (since the well-formedness guarantee clearly doesn't apply).
Thanks, I didn't know about the Unsafe module. I guess this means that the best option would be to make a standalone tyxml backend (not dependent on the existing HTML representation). We could then evaluate it and perhaps decide to switch to it altogether.
After poking around a bit, I had the same thought as @Drup. If we are not opposed to the tyxml dependency in principle, I think it would be appealing to replace the intermediate representation with a tyxml representation. This would also mean there's no need for a second optional package. I'd be happy to reorient in that direction.
> After poking around a bit, I had the same thought as @Drup. If we are not opposed to the tyxml dependency in principle, I think it would be appealing to replace the intermediate representation with a tyxml representation. This would also mean there's no need for a second optional package. I'd be happy to reorient in that direction.
I think this is a bit premature. Let's see an implementation of the backend first (as a separate package), let's see about testing, etc., and then consider whether we want to switch to the tyxml backend altogether.