
Lacks XML schema, README misidentifies markup as "HTML-like"

Open Diablo-D3 opened this issue 4 months ago • 5 comments

This project does not seem to have an XML schema published, nor do the examples have a DOCTYPE or xmlns to identify which schema is to be used to validate them.

The README also misstates what this is: it is not an "HTML-like" markup language, i.e., it is not SGML, because none of the examples contain features indicative of SGML (e.g., all tags have close tags, there are no self-closing tags, there are no null end tags, etc.). This is clearly XML, and should only be referred to as such.

All the relevant tooling that Microsoft uses internally and externally has extremely good first-party XML support, and you should be leaning on it, not fighting it.

Diablo-D3 avatar Aug 10 '25 12:08 Diablo-D3

POML is not XML. It only resembles XML; that is, it is "XML/HTML-like".

I don't think it's reasonable to require a formal DOCTYPE for POML. It would make the language very inconvenient for users to master, especially those without XML experience. Also, some features that are important to POML (such as the template engine) are not well supported by traditional XML. POML supports a pure-text mode, which allows using POML even without the top-level <poml> tag. I forget whether it's documented, but my hope is that users can try POML with the minimal syntax they need to learn.

POML is implemented with an XML parser for historical reasons. This causes me a lot of trouble and limits flexibility in areas such as special character escaping, templates without double quotes, and support for multiple root poml elements. I plan to migrate the parser entirely to a custom lexer and CST parser, and I'm currently working on it (though progress is slow, as I'm busy with other projects).

https://github.com/microsoft/poml/blob/993179e4943e9d7691903e35c0a19f323cbb3dff/docs/proposals/poml_extended.md

https://github.com/microsoft/poml/tree/756f9f859ca5f3c1135fff5ebe0011ffbdd756f2/packages/poml/reader
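
The parser constraints mentioned above (a single root element, mandatory escaping of special characters) can be seen with any strict XML parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`, not POML's actual parser; the helper name `is_well_formed` and the sample documents, including the `<task>` element, are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if a strict XML parser accepts the text (hypothetical helper)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Illustrative samples only; they are not taken from the POML documentation.
single_root = "<poml><task>Summarize the text.</task></poml>"
two_roots = "<poml><task>A</task></poml><poml><task>B</task></poml>"
raw_less_than = "<poml><task>Check whether a < b.</task></poml>"
escaped_less_than = "<poml><task>Check whether a &lt; b.</task></poml>"

for name, sample in [
    ("single root", single_root),
    ("two roots", two_roots),
    ("raw '<' in text", raw_less_than),
    ("escaped '<' in text", escaped_less_than),
]:
    print(f"{name}: {'accepted' if is_well_formed(sample) else 'rejected'}")
```

A strict XML parser rejects the second and third samples, which is the kind of restriction a custom lexer and CST parser would be free to relax.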

> All the relevant tooling that Microsoft uses internally and externally has extremely good first-party XML support, and you should be leaning on it, not fighting it.

I don't know where this comment comes from. From a prompt-engineering perspective, writing prompts in XML is currently a very minor practice. I'm attempting to bring the idea of an XML-like language into prompt engineering. I'm not trying to destroy anything about XML.

ultmaster avatar Aug 10 '25 12:08 ultmaster

You claim that you want this to be easy for inexperienced users; however, many users use editors such as Visual Studio, Visual Studio Code, Atom, Sublime Text, and others with similar feature sets. All of these can show diagnostics in the editor that tell new users their markup is incorrect, and they do this automatically as long as you do the right thing: provide a schema and give users examples that are well-formed.

It doesn't matter whether you choose SGML, XML, JSON, YAML, or TOML: all of them can have schemas, and you should be supplying one if you want this to be easy for users to adopt.

Diablo-D3 avatar Aug 10 '25 12:08 Diablo-D3

I have built my own code IntelliSense for VSCode, including diagnostics, hover, and auto-completion. I promise it's no worse than VSCode's built-in support for XML, JSON, or YAML, if you are willing to try it.

If you are talking about LLM-based IntelliSense such as Copilot, I bet it performs worse on a new language. But I believe the gap can be partly closed with custom copilot-instructions, because the syntax is not far from a mixture of XML, Mustache, Jinja, Angular, and Markdown.

ultmaster avatar Aug 10 '25 12:08 ultmaster

The idea of POML is great, and thank you, ultmaster, for your thought process and contribution. I have been learning and experimenting, and there are endless possibilities for this.

pramodhbn avatar Aug 17 '25 13:08 pramodhbn

I agree with diablo in this case. You built an XML-like format. You might as well lean into XML infrastructure instead of building your own extension and parser and reinventing every vulnerability under the sun. It's not even hard to generate code from XML schemas to bind them to objects.

Dragas avatar Aug 19 '25 19:08 Dragas