XML parsing
I need to do XML parsing, both SAX-style (streaming tags) and as structured tree data. Basing this on Foundation would be ideal. Does anyone have any plans in that area? My guess is that, unlike pretty-printing/parsing, it shouldn't be in foundation itself.
I have thought about this, and for obvious reasons I think this is not only a good idea but something that is important to have. In terms of an execution plan, though, things are a bit fuzzier, especially since it's not the most straightforward thing to do (compared to, say, JSON) and there are lots of annoying corner cases.
As for the location, I'd be happy to see this directly in foundation, as XML is part of the usual set of tools required to do one's job (just like JSON and CSV). I don't think it would "bloat" our code by that much. But obviously, if anyone wants to build this outside, that's fine too.
That all sounds ideal! JSON and CSV are also on my list :)
I can volunteer some experience on the XML library front if it would be useful.
(In reply to Neil Mitchell, Sun, Sep 25, 2016, 8:30 PM.)
@snoyberg I'd be interested, even if it doesn't impact foundation directly.
Cool, I'll write up some thoughts tomorrow.
(In reply to Neil Mitchell, Sun, Sep 25, 2016, 8:33 PM.)
FWIW the Scala community moved the XML parser from the standard library to its own package two years ago.
I'd also add that it's probably related to, or dependent on, #119.
While some improvements could be made, I think the xml-types package provides a very good basis for streaming XML content. I'd recommend basing this heavily on its Event type, and then considering whether some changes are worthwhile (e.g., modifying how attributes are represented).
Once that's established: yes, some thought needs to be given to streaming à la #119. The lowest-level approach I can think of, which would be compatible with all higher-level layers, would look something like:
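To illustrate why a single event type can serve both needs from the original question (SAX-style streaming and structured trees), here is a minimal sketch that folds a flat event stream into a tree. The Event type below is only a simplified stand-in loosely modeled on xml-types' Event (the real one uses Name, Text and Content, represents attribute values as [Content], and has more constructors); Node, Frame and buildTree are hypothetical names used for illustration.

```haskell
-- Simplified stand-in for xml-types' Event: String instead of Text/Name,
-- plain-string attribute values, and only the element/text constructors.
data Event
  = EventBeginElement String [(String, String)]
  | EventEndElement String
  | EventContent String
  deriving (Eq, Show)

-- Hypothetical tree type for the "structured" view of a document.
data Node
  = Element String [(String, String)] [Node]
  | Text String
  deriving (Eq, Show)

-- An open element: name, attributes, children seen so far (reversed).
type Frame = (String, [(String, String)], [Node])

-- Fold a flat event stream into a forest of nodes using a stack of open
-- elements; mismatched, unexpected or unclosed tags are reported as errors.
buildTree :: [Event] -> Either String [Node]
buildTree = go [] []
  where
    go :: [Frame] -> [Node] -> [Event] -> Either String [Node]
    go [] top [] = Right (reverse top)
    go ((n, _, _) : _) _ [] = Left ("unclosed element: " ++ n)
    go stack top (EventBeginElement n attrs : es) = go ((n, attrs, []) : stack) top es
    go stack top (EventContent t : es) = attach stack top (Text t) es
    go ((n, attrs, kids) : stack) top (EventEndElement n' : es)
      | n == n' = attach stack top (Element n attrs (reverse kids)) es
      | otherwise = Left ("mismatched close tag: expected " ++ n ++ ", got " ++ n')
    go [] _ (EventEndElement n' : _) = Left ("unexpected close tag: " ++ n')

    -- Add a finished node to the innermost open element, or to the top level.
    attach [] top node es = go [] (node : top) es
    attach ((n, attrs, kids) : stack) top node es = go ((n, attrs, node : kids) : stack) top es

main :: IO ()
main = print (buildTree
  [ EventBeginElement "a" [("id", "1")]
  , EventContent "hi"
  , EventBeginElement "b" []
  , EventEndElement "b"
  , EventEndElement "a"
  ])
```

A SAX-style consumer would handle the same events one at a time instead of folding them, so both layers can share one parser.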
parseXML :: XMLSettings -> XMLParser
data XMLParser
= YieldEvent Event FilePosition XMLParser
| AwaitString (String -> XMLParser)
| ParseException XMLParseException
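A minimal, runnable sketch of how such a resumable parser could be driven, under several assumptions not in the proposal: Event, FilePosition and XMLParseException are simplified stand-ins (FilePosition is just a character offset), a Done constructor is added because the type above has no way to signal a finished parse, and an empty string passed to AwaitString means end of input. toyParser and runParser are hypothetical and handle only a tiny subset (<name>, </name> and text, no attributes or entities), but they show the key property: a token split across chunk boundaries is handled by asking for more input rather than failing.

```haskell
-- Simplified, hypothetical stand-ins for the types named in the proposal.
data Event
  = EventBeginElement String
  | EventEndElement String
  | EventContent String
  deriving (Eq, Show)

-- Character offset from the start of the input (assumption).
newtype FilePosition = FilePosition Int deriving (Eq, Show)

newtype XMLParseException = XMLParseException String deriving (Eq, Show)

-- The proposed type plus a Done constructor (assumption: without it a
-- parse can never finish successfully).
data XMLParser
  = YieldEvent Event FilePosition XMLParser
  | AwaitString (String -> XMLParser)  -- "" means end of input (assumption)
  | ParseException XMLParseException
  | Done

-- Feed a list of chunks to a parser and collect its events. Empty chunks
-- are dropped so that only the final "" signals end of input.
runParser :: XMLParser -> [String] -> Either XMLParseException [(Event, FilePosition)]
runParser p chunks = go p (filter (not . null) chunks ++ [""]) []
  where
    go (YieldEvent e pos next) cs acc = go next cs ((e, pos) : acc)
    go (AwaitString k) (c : cs) acc = go (k c) cs acc
    go (AwaitString _) [] _ = Left (XMLParseException "parser awaited input after end of input")
    go (ParseException err) _ _ = Left err
    go Done _ acc = Right (reverse acc)

-- Toy parser for <name>, </name> and text only (no attributes, entities or
-- nesting checks). When a token may continue past the current buffer it
-- asks for another chunk instead of failing.
toyParser :: XMLParser
toyParser = step 0 ""
  where
    step :: Int -> String -> XMLParser
    step off buf = case buf of
      "" -> AwaitString (\c -> if null c then Done else step off c)
      '<' : rest -> case break (== '>') rest of
        (_, "") -> more off buf (ParseException (XMLParseException ("unterminated tag at offset " ++ show off)))
        ('/' : name, _ : after) -> YieldEvent (EventEndElement name) (FilePosition off) (step (off + length name + 3) after)
        (name, _ : after) -> YieldEvent (EventBeginElement name) (FilePosition off) (step (off + length name + 2) after)
      _ -> case break (== '<') buf of
        (txt, "") -> more off buf (YieldEvent (EventContent txt) (FilePosition off) Done)
        (txt, after) -> YieldEvent (EventContent txt) (FilePosition off) (step (off + length txt) after)

    -- Append the next chunk and retry; at end of input use the fallback.
    more off buf atEof = AwaitString (\c -> if null c then atEof else step off (buf ++ c))

main :: IO ()
main = print (runParser toyParser ["<a>he", "llo</", "a>"])
```

Here the text "hello" and the closing tag "</a>" are both split across chunks, and the parser still reports them as single events with their original offsets. Because the parser is plain data, it never performs IO itself; the caller decides where the chunks come from.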
An alternative based on a list-t style interface would be something like:
parseXML :: MonadThrow m => ListT m String -> ListT m (Event, FilePosition)
Or a coroutine/conduit style approach:
parseXML :: MonadThrow m => ConduitM String (Event, FilePosition) m ()
If #119 isn't resolved first, I'd recommend starting with the first approach, and then layering higher-level approaches on top once #119 is resolved.
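As an illustration of layering a higher-level interface over the low-level XMLParser type, here is a sketch of a pure, lazy-list adapter, roughly what the ListT signature reduces to once the monad is specialised away. It assumes a hypothetical Done constructor added to the type, treats an empty string fed to AwaitString as end of input, and uses simplified stand-in types. Because events are emitted as soon as the parser yields them, a consumer can process an unbounded chunk stream incrementally. streamEvents and echoParser are hypothetical names; echoParser just reports each chunk as one text event to keep the example short.

```haskell
-- Simplified stand-ins (hypothetical); only the constructor needed here.
data Event = EventContent String deriving (Eq, Show)
newtype FilePosition = FilePosition Int deriving (Eq, Show)
newtype XMLParseException = XMLParseException String deriving (Eq, Show)

-- The proposed low-level type, with an added Done constructor (assumption).
data XMLParser
  = YieldEvent Event FilePosition XMLParser
  | AwaitString (String -> XMLParser)  -- "" means end of input (assumption)
  | ParseException XMLParseException
  | Done

-- Lazily turn chunks into events: each event is available as soon as the
-- parser yields it, so infinite or very large inputs can be consumed
-- incrementally. An error ends the stream with a Left.
streamEvents :: XMLParser -> [String] -> [Either XMLParseException (Event, FilePosition)]
streamEvents p chunks = go p (filter (not . null) chunks ++ [""])
  where
    go (YieldEvent e pos next) cs = Right (e, pos) : go next cs
    go (AwaitString k) (c : cs) = go (k c) cs
    go (AwaitString _) [] = [Left (XMLParseException "parser awaited input after end of input")]
    go (ParseException err) _ = [Left err]
    go Done _ = []

-- Hypothetical toy parser: reports each chunk as one text event at its offset.
echoParser :: XMLParser
echoParser = go 0
  where
    go off = AwaitString $ \c ->
      if null c
        then Done
        else YieldEvent (EventContent c) (FilePosition off) (go (off + length c))

main :: IO ()
main = mapM_ print (take 3 (streamEvents echoParser (cycle ["<a>", "text", "</a>"])))
```

The input in main is infinite, yet only three events are ever computed. A ListT- or conduit-based layer would follow the same pattern, replacing the chunk list with a monadic source.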
@snoyberg - do you have any concrete advice on which XML parsing library to use until the foundation one is ready? xml-conduit? (I had initially assumed it was a wrapper over another library, but it seems to be a separate XML parsing library entirely.) Has anyone benchmarked the libraries? I've got to the point where XML parsing is a massive bottleneck.
In general I recommend xml-conduit, but I can't say I've ever done performance work with it. Back in the day I used it as the basis of an application that parsed some very large documents and never had any issues, but I don't know how it compares to best-in-class parsers from other languages.
(In reply to Neil Mitchell, Tue, Oct 11, 2016, 12:19 AM.)
To add to my original list, YAML should probably be there too. Note that the YAML library takes in a ByteString and spits out Vector, HashMap and Text, so you're hopping through multiple different packages for some fairly basic functionality. The fact that toList/unpack live in different places doesn't help.