[advice] How to use with schema repos, without Java

Open cyrilchapon opened this issue 7 years ago • 25 comments

Hey :)

I'm desperately looking for a gml-to-geojson JavaScript converter, and I came here after googling hard.

I found jsonix, and some schemas here but I'm struggling..

  • Is Java required for gml to geojson? I found that and there is a "Compile mappings using Java" section; but when using the ogc-schemas repo, it doesn't seem to be needed. Is there any "prebuilt" stuff that one can use without installing a whole Java stack?
  • Is the "geojson schema" somewhere ?

Thanks :)

cyrilchapon avatar Jun 12 '18 16:06 cyrilchapon

Hi,

TL;DR Jsonix is not a GML-to-GeoJSON converter, but you can use it to build such a converter.

So first things first:

  • Java is not required. The ogc-schemas project you're referring to ships ready-to-use Jsonix mappings for most OGC schemas, including most of the GML schemas. You can either use these mappings or compile your own. In the latter case you'll need Java at compile time.
    Jsonix does not need Java at runtime. It's pure JavaScript.
    Yes, there is prebuilt stuff you can get from the npm repo.
  • No, no GeoJSON schema.

OK, now I have to clear up some misconceptions. Jsonix + GML mappings does not convert GML into GeoJSON. It converts GML into JSON, but it's not GeoJSON. To get GeoJSON you have to take JSON (produced by Jsonix) and build GeoJSON with it. Arguably this might be easier than parsing GML into GeoJSON manually. But it still takes a certain effort.
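
To give a rough idea of that second step, here is a minimal sketch in TypeScript. The `UnmarshalledPoint` shape is an assumption made up for illustration (a `name`/`value` wrapper with a numeric `pos` list); real unmarshalled output should be inspected before relying on it, and `pointToGeoJson` is a hypothetical helper, not part of Jsonix.

```typescript
// Assumed (hypothetical) shape of the JSON produced for a gml:Point element.
interface UnmarshalledPoint {
  name: { localPart: string; namespaceURI: string };
  value: { pos: { value: number[] } };
}

// Minimal GeoJSON Point geometry.
interface GeoJsonPoint {
  type: "Point";
  coordinates: number[];
}

// Hypothetical helper: builds a GeoJSON Point from the assumed JSON shape.
function pointToGeoJson(el: UnmarshalledPoint): GeoJsonPoint {
  if (el.name.localPart !== "Point") {
    throw new Error("expected a Point element, got " + el.name.localPart);
  }
  const coords = el.value.pos.value;
  if (coords.length < 2) {
    throw new Error("a position needs at least two coordinates");
  }
  // Copy the array so the GeoJSON object does not alias the input.
  return { type: "Point", coordinates: coords.slice() };
}
```

Every other geometry type, plus features and collections, would need a similar mapping, which is where the effort goes.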

So if you're looking for a ready-to-use converter - no, this is not it. If you're open to an option of building such a converter yourself - Jsonix might help.

Let me know if you're interested, I could help (from Jsonix side).

highsource avatar Jun 12 '18 19:06 highsource

Hey @highsource,

Thanks for such a detailed, pragmatic and helpful answer 😄

Yes, there is prebuilt stuff you can get from the npm repo

Great to hear. ogc-schemas is the place to start then

Jsonix + GML mappings does not convert GML into GeoJSON. It converts GML into JSON

Ok so, basically, it's "just" some xml parser, but it comes with performance allowed by schema-awareness ?

So if you're looking for a ready-to-use converter - no, this is not it.

Yeah.. I was 😄, but apparently, beyond OpenLayers (which is basically not usable through Node.js), there is nothing out there to do the job..

To get GeoJSON you have to take JSON (produced by Jsonix) and build GeoJSON with it. Arguably this might be easier than parsing GML into GeoJSON manually. But it still takes a certain effort

If you're open to an option of building such a converter yourself - Jsonix might help.

Yes ok. Actually after hard-googling gml-to-geojson, and before coming to jsonix, I was considering building my own quality gml-to-geojson parser. I might do that, and XML parsing is the first step, along with some GML checks, sanitization, and stuff like knowing whether it's a feature, a geometry, etc. I was considering using some simpler XML parser like xml-js, but the schema stuff of jsonix could definitely help.

cyrilchapon avatar Jun 13 '18 06:06 cyrilchapon

Great to hear. ogc-schemas is the place to start then

In general, yes. However I could implement a GML-only project as well. ogc-schemas contains way too much.

Ok so, basically, it's "just" some xml parser, but it comes with performance allowed by schema-awareness ?

Generally, yes. You could use just any XML-JSON converter out there. However, Jsonix has a few nice features (strongly structured, type-safe output) which, in my opinion, make working with the resulting JSON easier. Please see the following presentation:

https://de.slideshare.net/orless/jsonix-talking-to-ogc-web-services-in-json

It explains why these features are important. (Slides 20-28.) Basically, you want reliable structures on the output - an array wherever the XSD declares a repeatable element, correct types in JSON so that you don't have to convert them on your own, etc. This will lead to less code in the JSON-to-GeoJSON converter.
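
As an illustration of the code you can skip with reliable structures, here is a sketch of the normalization a generic XML-JSON converter tends to force on you. The `Loose*` and `Strict*` shapes and the `normalize` helper are hypothetical, invented for this example; they do not describe the output of any particular library.

```typescript
// Hypothetical output of a schema-unaware converter: a repeatable element
// may be missing, a single object, or an array, and numbers arrive as text.
interface LooseFeature {
  posList: string;
}
interface LooseCollection {
  member?: LooseFeature | LooseFeature[];
}

// What you would rather work with: always an array, numbers already parsed.
interface StrictCollection {
  member: { posList: number[] }[];
}

// Glue code that becomes unnecessary when the converter knows the schema.
function normalize(loose: LooseCollection): StrictCollection {
  const raw = loose.member;
  const members: LooseFeature[] =
    raw === undefined ? [] : Array.isArray(raw) ? raw : [raw];
  return {
    member: members.map((m) => ({
      posList: m.posList.trim().split(/\s+/).map(Number),
    })),
  };
}
```

With schema-aware output, this kind of per-element guessing does not have to be repeated throughout the converter.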

The downside is that Jsonix is pretty heavy. Not a problem for the server side but might be an issue on the client side.

If you decide to give this a try, I'll help you (as far as my available resources allow).

highsource avatar Jun 13 '18 09:06 highsource

Hey @highsource :)

I'm going to give it a shot this week. I'll start with ogc-schemas for now, but I'd be glad to get a quickstart maybe if you can provide me something about it.

FYI we're hacking an old WFS server out here, crossing data from the French Government. The lack of a decent gml-to-geojson tool motivates us to produce a full-featured Open Source one 😄

The trick is we're processing this in real time in a serverless function, so in the long run we'll need something lightweight (as geographic processing is already pretty heavy)

cyrilchapon avatar Jun 19 '18 06:06 cyrilchapon

Hi @cyrilchapon.

You see, I'm a bit in limbo here at the moment. I have to stay home due to a chemo treatment and I'm a bit bored. Answering questions on StackOverflow does not entertain me that much anymore.

So how about making a joint project on this GML<->GeoJSON converter? You can host the project in your organisation. As long as it's open-source with a non-copyleft license but something like APL, BSD, MIT, you can hold the copyrights. So basically I'll just be a contributor.

However I do have some specific goals. I'd like to learn TypeScript so I'd suggest developing it in TypeScript, targeting server-side Node.js as well as browsers.

What do you think?

You're saying you need something lightweight. I'm not sure if Jsonix will suit you. With large schemas Jsonix will have a heavy start (initialization of the context). Creating unmarshallers/marshallers is very lightweight, though. So to be honest I am not quite sure Jsonix will fit your scenario.

highsource avatar Jun 20 '18 08:06 highsource

You're saying you need something lightweight. I'm not sure if Jsonix will suit you. With large schemas Jsonix will have a heavy start (initialization of the context). Creating unmarshallers/marshallers is very lightweight, though. So to be honest I am not quite sure Jsonix will fit your scenario.

Well, the alternative for me is to shim OpenLayers for Node.js, which (besides requiring a hardcore webpack config) is not-that-lightweight-at-all 🤣

So how about making a joint project on this GML<->GeoJSON converter ?

Yes we surely can. I'm way less available than you ATM, but I can give this a try

You can host the project in your organisation. As long as it's open-source with a non-copyleft license but something like APL, BSD, MIT, you can hold the copyrights. So basically I'll just be a contributor.

I'm open to that. Though, I suck at hosting open-source projects, with issues, real collaborators and stuff, but I'd be glad to give it a shot, learn and get advice. So I can host that, yeah 😄

You're saying you need something lightweight. I'm not sure if Jsonix will suit you. With large schemas Jsonix will have a heavy start (initialization of the context). Creating unmarshallers/marshallers is very lightweight, though. So to be honest I am not quite sure Jsonix will fit your scenario.

I have to rephrase myself: I personally need something fast and not too memory-consuming. Weight in itself is a topic, but not for me personally. I think, in the process, a more modular way to build / package / host / import jsonix schemas would be valuable for everyone

However I do have some specific goals. I'd like to learn TypeScript so I'd suggest developing it in TypeScript, targeting server-side Node.js as well as browsers.

Okay.. Well I don't know crap about TypeScript. I'm not fond of it, but I'd be glad to give it a shot too. But from this perspective, I can't really do any project bootstrapping (setting up the ES2017 stack, TypeScript and everything), since I have no clue where to start with webpack, type definitions, and stuff. I also have only a small amount of time to dedicate to this. But if you set things up, we can do that 😄

Cheers

cyrilchapon avatar Jun 20 '18 08:06 cyrilchapon

You're saying you need something lightweight. I'm not sure if Jsonix will suit you. With large schemas Jsonix will have a heavy start (initialization of the context)

I think a just-the-gml-part build would produce something small. The code seems minification and gzip friendly.

cyrilchapon avatar Jun 20 '18 08:06 cyrilchapon

@cyrilchapon By the way, have you seen this project:

https://github.com/derhuerst/parse-gml-polygon

highsource avatar Jun 20 '18 13:06 highsource

BTW do you need it bidirectional? I.e. GML to GeoJSON as well as GeoJSON to GML?

Extracting the GML parser from OpenLayers is an option, but it's not really a very nice one. I know the OL guys have been looking at Jsonix for a few years already and sometimes point to it when people ask about parsing different OGC XMLs. So it would not be my choice either.

So how about making a joint project on this GML<->GeoJSON converter ?

Yes we surely can. I'm way less available than you ATM, but I can give this a try

OK. Please take a look at the project I've mentioned above. At the moment it seems more reasonable to contribute to that project instead of rolling our own.

You can host the project in your organisation. As long as it's open-source with a non-copyleft license but something like APL, BSD, MIT, you can hold the copyrights. So basically I'll just be a contributor.

I'm open to that. Though, I suck at hosting open-source projects, with issues, real collaborators and stuff, but I'd be glad to give it a shot, learn and get advice. So I can host that, yeah 😄

I did a lot of open-source projects, but none with too great momentum, so I won't say it's my strength either.

My problem at the moment is that I have a severe illness (brain tumor) so it's a bit hard to say how long I'm going to live. This is why I'm trying to avoid assuming control of new open-source projects. But I'm glad to contribute.

Okay.. Well I don't know crap about TypeScript. I'm not fond of it, but I'd be glad to give it a shot too. But from this perspective, I can't really do any project bootstrapping (setting up the ES2017 stack, TypeScript and everything), since I have no clue where to start with webpack, type definitions, and stuff. I also have only a small amount of time to dedicate to this. But if you set things up, we can do that 😄

OK, but if you don't mind TypeScript, I'd love to go that way. I don't have much experience either, so I'd like to use this opportunity to learn (need it for my day job - when I'm hopefully back to business in a few months). I'll find someone to help us set up.

But before I start, please check https://github.com/derhuerst/parse-gml-polygon and let me know if that's maybe enough for your purposes.

highsource avatar Jun 20 '18 13:06 highsource

@highsource thanks for your answer.

I did look at https://github.com/derhuerst/parse-gml-polygon, and it seems well-written. Though, it's way under-featured, mostly because:

  1. One would need not only Polygon
  2. One would need not only Geometries but also FeatureMembers, and FeatureCollections wrapper
  3. One would need to reproject coordinates in the process
  4. One could be afraid of that h( stuff. People usually have their own xml-parsers of choice, and only-accepting a pre-parsed input seems restrictive IMHO

But as I said, it seems to be well written, so we can contribute there

OK, but if you don't mind TypeScript, I'd love to go that way. I don't have much experience either, so I'd like to use this opportunity to learn

Yes same here, a good lesson from building something is always welcome.

I'll find someone to help us set up.

Great 😄

My problem at the moment is that I have a severe illness (brain tumor) so it's a bit hard to say how long I'm going to live. This is why I'm trying to avoid assuming control of new open-source projects. But I'm glad to contribute.

Sad to hear 😞.. I'd be glad to host it with a decent LICENSE

cyrilchapon avatar Jun 20 '18 14:06 cyrilchapon

Hey there! Some thoughts about using parse-gml-polygon:

One would need not only Polygon. One would need not only Geometries but also FeatureMembers, and FeatureCollections wrapper.

Indeed. I think having a wfs-gml-to-geojson library would be very useful! I'd help build such a thing.

Besides that, parse-gml-polygon doesn't parse all variants to encode a polygon in GML (see the todos here and here), but most of them.

One would need to reproject coordinates in the process.

You can pass a transformCoords fn into parse-gml-polygon.
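
A minimal sketch of what such a function could look like, assuming it receives the raw coordinates as positional arguments and returns the transformed ones as an array (check the parse-gml-polygon readme for the exact contract). `swapAxes` only swaps the first two axes as a stand-in for a real reprojection, and `applyToFlatList` is a hypothetical helper that shows where the function and a stride would plug in.

```typescript
// Assumed contract: coordinates in, transformed coordinates out.
type TransformCoords = (...coords: number[]) => number[];

// Stand-in for a reprojection: swaps the first two axes (e.g. lat/lon to lon/lat).
const swapAxes: TransformCoords = (...coords) => {
  if (coords.length < 2) throw new Error("expected at least two coordinates");
  return [coords[1] as number, coords[0] as number, ...coords.slice(2)];
};

// Hypothetical helper: cuts a flat number list into tuples of `stride`
// numbers and runs each tuple through transformCoords.
function applyToFlatList(
  flat: number[],
  transformCoords: TransformCoords,
  stride = 2
): number[][] {
  const out: number[][] = [];
  for (let i = 0; i + stride <= flat.length; i += stride) {
    out.push(transformCoords(...flat.slice(i, i + stride)));
  }
  return out;
}
```

A real reprojection between coordinate reference systems would go in the same place as `swapAxes`.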

One could be afraid of that h( stuff. People usually have their own xml-parsers of choice, and only-accepting a pre-parsed input seems restrictive IMHO

I disagree. Having an input format based on JS objects and arrays actually makes the lib more flexible.

Most XML parsers, especially in JavaScript, are a nightmare: unportable, unnecessarily complex, work synchronously. You can either use the xml parser compatible with parse-gml-polygon or write an adapter.

Also, keep in mind that this h( stuff is only a shorter notation for JS objects.
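
To make that concrete, here is a sketch of such a helper. The `VNode` shape and this `h` implementation are hypothetical and written for illustration only; the input format actually expected by parse-gml-polygon may differ in its field names.

```typescript
// Hypothetical virtual-tree node: plain data, no DOM involved.
interface VNode {
  name: string;
  attributes: Record<string, string>;
  children: (VNode | string)[];
}

// h(...) is just a compact way of writing the object literal.
function h(
  name: string,
  attributes: Record<string, string> = {},
  children: (VNode | string)[] = []
): VNode {
  return { name, attributes, children };
}

// The call below produces exactly the nested object written out underneath it.
const viaHelper = h("gml:Point", { srsName: "EPSG:4326" }, [
  h("gml:pos", {}, ["48.85 2.35"]),
]);

const viaLiteral: VNode = {
  name: "gml:Point",
  attributes: { srsName: "EPSG:4326" },
  children: [{ name: "gml:pos", attributes: {}, children: ["48.85 2.35"] }],
};
```

Any XML parser can feed such a tree as long as a small adapter maps its output to this shape.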

But as I said, it seems to be well written, so we can contribute there

You're more than welcome!

derhuerst avatar Jun 20 '18 16:06 derhuerst

Cool that @derhuerst joined us here.

Yes I think we were talking about something like wfs-gml-to-geojson. I think a robust and well-written library of this kind would be very useful.

@derhuerst What are your thoughts on TypeScript? I'd like to learn it (by using in a real-world project) but if that's a blocker for you guys, I'm fine with vanilla JS. I only think it's important to serve both node.js as well as browser worlds.

Parsing all variants of encoding a polygon in GML is, indeed, quite hard. I did it once for Java; there are so many options, it is incredible. It also applies to other geometry types, to a somewhat lesser extent.

There is also another problem here - different GML versions. There are at least 6 different GML versions in active usage (1.0.0, 2.1.2, 3.1.1 - probably the most popular, 3.2.0, 3.2.1, 3.3). They have slight differences, sometimes backwards compatible, sometimes not (updated namespaces). 1.0.0 can probably be dropped, 3.2.0 probably too. But 2.1.2, 3.1.1, 3.2.1, 3.3 still leaves us with 4 different GML versions.
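
A small sketch of why the namespace alone does not settle the version. The two namespace URIs below are the ones I am sure of (2.1.2 and 3.1.1 share one, 3.2.1 has its own); the other versions are left out on purpose, and `candidateGmlVersions` is a hypothetical helper name.

```typescript
// Maps a GML namespace URI to the versions that use it.
// Deliberately incomplete: only namespaces I am certain about are listed.
function candidateGmlVersions(namespaceURI: string): string[] {
  switch (namespaceURI) {
    case "http://www.opengis.net/gml":
      // Shared namespace, so the element content has to disambiguate.
      return ["2.1.2", "3.1.1"];
    case "http://www.opengis.net/gml/3.2":
      return ["3.2.1"];
    default:
      return [];
  }
}
```

For the shared namespace a parser would still have to look at which elements are present (for example coordinates versus pos/posList) to pick the right handling.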

I agree that having JSON-like structures as input makes conversion easier. I think the original idea of the issue was to use Jsonix + GML mappings to parse XML. However, I'd guess you'd put Jsonix into the "nightmare" category due to the complexity. I actually do not insist on using Jsonix; it might really be too heavyweight - even for the unique features it delivers. I personally am absolutely fine using any other parser.

@derhuerst I have one question though. Do you target browsers or only node.js? Would it be a problem transpiling parse-gml-polygon for browsers/vanilla JS?

highsource avatar Jun 20 '18 17:06 highsource

What are your thoughts on TypeScript?

IMO TypeScript can be valuable to catch a range of bugs caused by subtly broken and/or unintuitive behaviour of JavaScript. But I think it's only worth it if it doesn't result in bloated or unidiomatic code. The code shouldn't end up looking like Java/C++ code.

I only think it's important to serve both node.js as well as browser worlds.

Definitely! This is what makes well-written JS libs truly useful.

Do you target browsers or only node.js? Would it be a problem transpiling parse-gml-polygon for browsers/vanilla JS?

With exceptions like location, all libraries that I write a) work in both browsers & Node and b) are published to npm.

By "browser support" I usually mean "works if used with a bundler such as browserify or webpack".

But 2.1.2, 3.1.1, 3.2.1, 3.3 still leaves us with 4 different GML versions.

Yes, this will cause a lot of work. Still, properly separated code will keep the complexity reasonable.

However, I'd guess you'd put Jsonix into the "nightmare" category due to the complexity.

Not trying to offend anyone here. :P Most of the nightmare-ish aspects of JavaScript XML parsers come from XML. Still, many JS XML parsers do too much (e.g. use DOM APIs) and parse all at once (which is a no-go in JS).

derhuerst avatar Jun 20 '18 18:06 derhuerst

@derhuerst I'd like to give it a try with simpler geometries first - like point or line string. Should I fork/send PRs for parse-gml-polygon? Or do you want to create parse-wfs-gml-geometries project first? Or should I do it? (I don't want to host the project in my repos for the reason given earlier.)

highsource avatar Jun 21 '18 07:06 highsource

@derhuerst @highsource

You can pass a transformCoords fn into parse-gml-polygon.

Fine 😄

Besides that, parse-gml-polygon doesn't parse all variants to encode a polygon in GML (see the todos here and here), but most of them.

I'd like to give it a try with simpler geometries first - like point or line string.

I basically agree with that. Our repo, be it parse-gml-polygon or another, should make it possible to PR more and more geometries over time.

I only think it's important to serve both node.js as well as browser worlds.

Still, many JS XML parsers do too much (e.g. use DOM APIs) and parse all at once (which is a no-go in JS).

I agree so much... I saw many XML parsers out there, and basically for now I'm using OpenLayers over Node.js. This is a god damn pain as I have to mock / shim every single browser API used globally with webpack. This is basically unbearable. I gave like 4 no-gos to various XML parsers because they rely on the DOM, and was close to giving a no-go to jsonix for mentioning Java for building templates 😅 no offense 😄

THOUGH

Most XML parsers, especially in JavaScript, are a nightmare: unportable, unnecessarily complex, work synchronously. You can either use the xml parser compatible with parse-gml-polygon or write an adapter.

I found that jsdom works pretty well, and would allow us to use a DOMish API. This seems like a bad idea at first glance, but looking more deeply at it, this approach could allow us to write a universal library that could be blazing fast in recent browsers... If you see what I mean (and I also target browsers and Node.js universally), it's not such a bad idea to actually use the DOMish stuff for XML manipulation in the browser. That doesn't mean it has to be the same in Node.js, or that this would have to drive our decisions, but with 2 different approaches, there would have to be 2 separate implementations of the XML parsing part (leading to less performance, because of the decoupling). Any thoughts on this?

  1. Decouple xml parsing and gml coordinates understanding : improve XML parsing performance (read from DOM in browser, and from whatever-lib in Node.js)
  2. Couple it using a Node.js friendly API (xml-parser, or something pureJS not related with DOM) : improve the whole thing for Node.js, but probably dramatically kill performance on browser
  3. Couple it using a Browser friendly API, using DOM, and shim it from Node.js (using xmlserializer and jsdom) : Improve browser performance, but cut Node.js performance
  4. Find something very creative and flexible, that uses DOM in browser and something else in Node.JS, in the core of the module, that wouldn't kill performance in either way

cyrilchapon avatar Jun 21 '18 08:06 cyrilchapon

@cyrilchapon @derhuerst

I would like to suggest sticking with what parse-gml-polygon uses at the moment, as it apparently works. We can engage in a "best way to parse XML in JS" discussion later - if there is a reason for it.

I think jumping on parse-gml-polygon and writing converters for other geometry types and filling holes where some encoding options are not supported yet is probably the fastest way to move forward.

I only want to wait for @derhuerst to decide whether parse-gml-geometries should be a new project or whether parse-gml-polygon will be transformed into parse-gml-geometries. As for WFS, I think this should be a separate project using parse-gml-geometries. There are a lot of GML use cases without WFS, so better keep them separated.

highsource avatar Jun 21 '18 09:06 highsource

I found that jsdom works pretty well, and would allow us to use a DOMish API. This seems like a bad idea at first glance, but looking more deeply at it, this approach could allow us to write a universal library that could be blazing fast in recent browsers...

I know this is a general statement, but usually both real as well as mocked DOM APIs are pretty slow compared to "virtual trees" that don't have to comply with the DOM mechanics.

Besides that, DOM APIs are complex! Why choose something way more complex that might be faster? Sounds like premature optimisation.

I would like to suggest sticking with what parse-gml-polygon uses at the moment, as it apparently works. We can engage in a "best way to parse XML in JS" discussion later - if there is a reason for it.

👍


I only want to wait for @derhuerst to decide whether parse-gml-geometries should be a new project or whether parse-gml-polygon will be transformed into parse-gml-geometries. As for WFS, I think this should be a separate project using parse-gml-geometries. There are a lot of GML use cases without WFS, so better keep them separated.

Good point. I created wfs-gml-to-geojson already, but I agree that there should actually be two libs. But we can still split them.

As a way to go forward: PR all GML features you want covered against wfs-gml-to-geojson, except polygons. Also just PRing test cases is a valuable contribution (have a look at the parse-gml-polygon tests)! I will add a readme and example code later on and publish it.

derhuerst avatar Jun 21 '18 10:06 derhuerst

Regarding code structure:

Try to keep the parsing code as simple and straightforward as possible. Just like in parse-gml-polygon: functions of the signature (el, transformCoords, stride). No classes necessary here.
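
A sketch of what a function with that signature could look like for a point. Everything here is an assumption made for illustration: the `El` node shape, matching children by the literal name "gml:pos", and the helper names. It is not code from parse-gml-polygon.

```typescript
// Hypothetical pre-parsed element: a name plus child elements or text.
interface El {
  name: string;
  children: (El | string)[];
}

type TransformCoords = (...coords: number[]) => number[];
const noTransform: TransformCoords = (...coords) => coords;

// First child element with the given name, if any.
function findChild(el: El, name: string): El | undefined {
  return el.children.filter(
    (c): c is El => typeof c !== "string" && c.name === name
  )[0];
}

// Concatenated text content of an element's direct text children.
function textOf(el: El): string {
  return el.children
    .filter((c): c is string => typeof c === "string")
    .join(" ");
}

// Plain function of (el, transformCoords, stride); no classes involved.
function parsePoint(
  el: El,
  transformCoords: TransformCoords = noTransform,
  stride = 2
): { type: "Point"; coordinates: number[] } {
  const pos = findChild(el, "gml:pos");
  if (!pos) throw new Error("gml:Point without gml:pos");
  const coords = textOf(pos).trim().split(/\s+/).map(Number);
  if (coords.length !== stride || coords.some((n) => isNaN(n))) {
    throw new Error("invalid gml:pos content");
  }
  return { type: "Point", coordinates: transformCoords(...coords) };
}
```

A line string parser would follow the same pattern, reading a posList and cutting it into tuples of `stride` numbers.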

derhuerst avatar Jun 21 '18 10:06 derhuerst

@derhuerst Great, I'll start today or tomorrow.

highsource avatar Jun 21 '18 10:06 highsource

Besides that, DOM APIs are complex! Why choose something way more complex that might be faster? Sounds like premature optimisation.

Good point 👍

As a way to go forward: PR all GML features you want covered against wfs-gml-to-geojson, except polygons. Also just PRing test cases is a valuable contribution (have a look at the parse-gml-polygon tests)!

I might not have that much time, but I'll surely be able to PR some failing tests to start with 😄

cyrilchapon avatar Jun 21 '18 14:06 cyrilchapon

What we quite urgently need is a lot of GML samples.

highsource avatar Jun 21 '18 20:06 highsource

Just found gml-to-geojson. It should work because it uses OpenLayers underneath, but it's certainly a hack because it hooks into the huge OpenLayers code base, stringifies to JSON and parses again.

I guess it's up to you @cyrilchapon @highsource. Do you want a proper tool that works by streaming data, on a virtual tree format for greater flexibility, with a minimal code footprint? Let's write the bespoke lib then. Or do you just want to get your work done? Use gml-to-geojson.

derhuerst avatar Jun 22 '18 20:06 derhuerst

@cyrilchapon This gml-to-geojson is your project, isn't it?

I personally would invest in wfs-gml-to-geojson. I have capabilities and would like to collaborate on a reasonable non-Java project.

highsource avatar Jun 22 '18 20:06 highsource

This gml-to-geojson is your project, isn't it?

Oh, haven't seen that! @cyrilchapon adapted it to the current state after they had asked this question here.

derhuerst avatar Jun 22 '18 21:06 derhuerst

Yeah this is mine 😁

I made that because we needed this feature very, very fast. It's kind of a temporary workaround; you can check the hell it takes to import OL in the webpack config file and the src/browser-shim stuff. The buggy lib added 5 MB to my 4 MB serverless function, and it takes ages to load / parse.. but it actually works.

I'd be glad to contribute to our project, and then replace that crappy lib with that 😍

cyrilchapon avatar Jun 23 '18 22:06 cyrilchapon