umap
Dockerfile
Originally reported by: BitBucket: almereyda, GitHub: almereyda
As I would love to see uMap spread across the world, let's start talking about a Dockerfile here.
- Which prerequisites do we need?
- Which successful (Geo)Django Dockerfiles are known, and what can we build upon?
- https://registry.hub.docker.com/search?q=django
- https://github.com/dockerfile?tab=repositories
It is worth mentioning that Docker setups favor stateless apps, i.e. application logic decoupled from persistence. http://12factor.net/ gives an introduction to that idea.
Once I've got that working, I'd like to provide a GeoCouch adapter for Django and separate the UI even further from the backend, so it could run as a CouchApp.
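As a minimal sketch of what the twelve-factor approach means for a GeoDjango app, configuration can be injected through environment variables so the same image runs against any database and any data volume. The variable names (UMAP_DB_*, UMAP_MEDIA_ROOT) and defaults below are hypothetical, not uMap's actual settings:

```python
import os


def database_config(env=os.environ):
    """Build a Django-style DATABASES setting from environment variables.

    Variable names are hypothetical; the point is that the container image
    stays identical while the persistence layer is supplied from outside.
    """
    return {
        "default": {
            # PostGIS backend shipped with GeoDjango
            "ENGINE": "django.contrib.gis.db.backends.postgis",
            "NAME": env.get("UMAP_DB_NAME", "umap"),
            "USER": env.get("UMAP_DB_USER", "umap"),
            "PASSWORD": env.get("UMAP_DB_PASSWORD", ""),
            "HOST": env.get("UMAP_DB_HOST", "localhost"),
            "PORT": env.get("UMAP_DB_PORT", "5432"),
        }
    }


def media_root(env=os.environ):
    """Uploaded data should live on a mounted volume, not inside the container."""
    return env.get("UMAP_MEDIA_ROOT", "/srv/umap/uploads")
```

In a settings module this would be used as `DATABASES = database_config()`, with the container started with something like `UMAP_DB_HOST=db`.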
- Bitbucket: https://bitbucket.org/yohanboniface/umap/issue/80
Original comment by Yohan Boniface (BitBucket: yohanboniface, GitHub: yohanboniface):
That would be a very nice goal to achieve! I'm not sure, though, that uMap is mature enough for that: the API is still changing quite often, and I'm not focusing on making it stable for now.
Also, the main challenge is to keep data consistent across builds of the Docker image, BUT to keep the build simple (or we lose the benefit of Docker).
On the backend side, the next big refactoring will be to allow concurrent editing (with diff management); I think it would be better to wait until this lands before working on stable images.
Again: this is really a nice goal, but I would be very sad to spread images that lead people to install it thinking everything is stable and will upgrade painlessly, when it's not.
Original comment by BitBucket: almereyda, GitHub: almereyda:
I'm even thinking, maybe dreaming, that it could also be an Unhosted webapp, with either remoteStorage or CouchDB or any other transparent HTTP RESTful Storage layer.
I'm well aware that this is unstable software: the version tag, which only changes once in a while, is proof of that. But uMap is, as far as I know, the most powerful mapping engine between geojson.io and CartoDB.
I can wait with publishing a Dockerfile, but either way I will build on a preconfigured PostGIS image for persistence and another image for the Django app. If you say the API is still changing too much, and you are building another versioning/Operational Transformation layer, maybe my CouchDB 'choice' (Andrew Frank used to tell me to use it in 2012 ...) is not the worst.
Have you already read these?
- http://boundlessgeo.com/whitepaper/distributed-versioning-geospatial-data-part-2/
- http://boundlessgeo.com/whitepaper/distributed-versioning-geospatial-data-part-3/ esp. sections 8, 9 & 13 (!)
I'm also in contact with the GeoCouch developer; if you're interested, we could get his opinion, he's very open. So if my exploration of the documentation delivers good results, I can build on that. I will need it anyway to document my own installation, which is already in use by others. We had to choose it because nothing comparable existed before...
Having said that, and respecting the maintainer's authority (by knowledge of the thing), I am very interested in your views and ideas on the API and everything roadmap-ish. [ Greetings from Berlin. ]
Original comment by Yohan Boniface (BitBucket: yohanboniface, GitHub: yohanboniface):
I'm even thinking, maybe dreaming, that it could also be an Unhosted webapp, with either remoteStorage or CouchDB or any other transparent HTTP RESTful Storage layer.
I fully agree! remoteStorage is something I have had in mind from the beginning, and the recent changes, which make the backend more agnostic and let the client manage the data, go in this direction.
Now, if we want to support versioning and concurrent editing, it's another story.
So clearly a nice-to-have, but not a priority given the amount of work and the time I can spend on this.
I can wait with publishing a Dockerfile
Please go ahead! :) But let's not publish it as stable; I don't want to create frustration.
Have you already read [...]
No, and thank you for that. Sounds very interesting, I will read them quickly.
I am very interested in your views and imaginations on the API and everything roadmap'ish
Well, sorry for not making the roadmap clear. Two things I can do now:
- stop splitting issues between three repositories (and two providers): this one, https://github.com/yohanboniface/Leaflet.Storage/issues?sort=updated&state=open and https://github.com/yohanboniface/django-leaflet-storage/issues?sort=updated&state=open
- publish more issues instead of keeping them in my local todolist (easier to manage)
I was discussing with friends at OSM France setting up a GitLab for all the repos we are working on (I'm not happy with the GitHub monoculture, and the Bitbucket issue tracker is not that clever). So at the moment I'm waiting for this to have a more centralized and readable issue tracker.
About the roadmap, here are the next big steps I want to achieve:
- concurrency editing/versioning
- support for multipolygon/multipolyline (needs switching away from leaflet.draw or rewriting it, but...)
- simple table editing of data layers
- good static files management
- touch support for editing (needs switching away from leaflet.draw)
- plus dozens of smaller things here and there ;)
Anyway, I'm happy to see that you have interest in uMap project :)
Original comment by Yohan Boniface (BitBucket: yohanboniface, GitHub: yohanboniface):
I'm working on static files management and better versioning. Version 0.6.3 should be the first step of a new era :)
Original comment by BitBucket: almereyda, GitHub: almereyda:
It seems Bitbucket didn't save my comment in localStorage before the tab died, so again:
Dockerizing the application would mean turning it into a stateless app, clearly separating storage and logic. I see that the two GitHub repositories aim at modularizing the storage layer, but unfortunately I'm not clever enough to see where I could intervene and introduce another one. Right now I'm a little unsure where to hook in. How does uMap talk to the database? Through (Geo)Django, right? So is there already an HTTP API for CRUD operations on the data that the client uses? Then it could be emulated with CouchDB; otherwise it should be possible to tell Django to use CouchDB, but my research indicates that this could have some limitations, and most resources are quite outdated.
If you could point me somewhere (in django-leaflet-storage?), I'd be very happy, because I care more about the geodata than about Django's extra tables for users, the system itself, etc.
Are you planning to make use of Geogit or are you reinventing the wheel?
I definitely also see lots of possible directions for uMap to evolve. Your first step is putting more attention on the data table, thereby winking at CartoDB. One could even think of somehow integrating an interface to things like MapProxy and TileStream.
Ah, and GitLab is already approaching version 7, so it should be considered mature, I think. The last version 4 I saw was already very good.
Please allow me to drop some links that might be interesting regarding front- and backend separation. I will need them, I think:
- http://leok.me/2013/05/02/what-you-need-to-know-couchdb-django.html
- http://couchdbkit.org/docs/api/
- http://wiki.apache.org/couchdb/Getting_started_with_Python
- http://nicolaisi.github.io/couchquery/
- https://pypi.python.org/pypi/django-docfield-couchdb
- https://pypi.python.org/pypi/django_couchdb_utils/0.5
- http://vanderwijk.info/blog/online-offline-web-application-based-couchdb/
Note: I am considering CouchDB as the primary alternative target to PostGIS, as it has the most powerful geo extension of all the NoSQL databases with a geographical API that I know of (Neo4j, MongoDB and ArangoDB).
Original comment by Yohan Boniface (BitBucket: yohanboniface, GitHub: yohanboniface):
If you could point me somewhere (in django-leaflet-storage?), I'd be very happy.
GeoJSON files are now stored on the filesystem; you can check that in models.py.
Are you planning to make use of Geogit or are you reinventing the wheel?
Is that a partisan question I can only answer with A? ;) Not all wheels fit all vehicles, amigo. Changes are managed client side (JavaScript frontend), so at this point my plan is to have the JS also send diffs to the server and receive them. There are two goals: cut down the size of the data transferred from frontend to backend when saving a map, and make it possible to handle edits made by others on the map in real time (by pulling diffs from the backend).
But this is a very big picture, and things may change. Seen from here, though, I don't think I will introduce Java into the stack. So I'm not very open to GeoGit.
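The diff idea can be illustrated with a small sketch that compares two GeoJSON FeatureCollections and produces only the added, removed and changed features, which is what the frontend would send instead of the whole layer. It assumes every feature carries a stable top-level "id"; that requirement and the diff format are hypothetical, not uMap's actual protocol:

```python
from typing import Any, Dict, List

Feature = Dict[str, Any]


def index_by_id(collection: Dict[str, Any]) -> Dict[Any, Feature]:
    """Map each feature's "id" member to the feature itself."""
    return {f["id"]: f for f in collection.get("features", [])}


def diff_collections(old: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, List]:
    """Compute a feature-level diff between two FeatureCollections.

    Assumes stable feature ids (hypothetical requirement).
    """
    before, after = index_by_id(old), index_by_id(new)
    return {
        "added": [after[k] for k in after if k not in before],
        "removed": [k for k in before if k not in after],
        "changed": [after[k] for k in after if k in before and after[k] != before[k]],
    }


def apply_diff(collection: Dict[str, Any], diff: Dict[str, List]) -> Dict[str, Any]:
    """Apply a diff from diff_collections, e.g. on the server or another client."""
    features = index_by_id(collection)
    for key in diff["removed"]:
        features.pop(key, None)
    for feature in diff["added"] + diff["changed"]:
        features[feature["id"]] = feature
    return {"type": "FeatureCollection", "features": list(features.values())}
```

Pulling diffs from the backend for real-time collaboration would reuse apply_diff on each client; resolving two clients changing the same feature is a separate problem this sketch does not address.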
Your first step is putting more attention to the data table, therefore winking at CartoDB. One could even think of somehow integrating an interface to things like MapProxy and TileStream.
Hmm, those are not really in the 1.0 roadmap (see my previous message for the big picture), and I think there is already enough work to do without extending the scope.
What advantages do you see of adding CouchDB in the stack?
Original comment by BitBucket: almereyda, GitHub: almereyda:
GeoJSON files are now stored on the filesystem; you can check that in models.py.
Yeah, I saw that. Now I understand it. So there are now two storage engines, PostGIS and the file system? Is the database used at all for geographical features?
Changes are managed client side (JavaScript frontend), so at this point my plan is to have the JS also send diffs to the server and receive them. There are two goals: cut down the size of the data transferred from frontend to backend when saving a map, and make it possible to handle edits made by others on the map in real time (by pulling diffs from the backend).
That sounds promising. Are you more aiming in directions of JSONP or of WebSockets? Well, you might as well implement your own PubSub synchronization protocol.
But seen from here, I don't think I will introduce Java in the stack.
No. Yes. Of course not. I was just wondering whether one could derive ideas from there, possibly leading to another implementation. But as you say, not all vehicles are the same. And you are already doing great work, almost single-handedly!
What advantages do you see of adding CouchDB in the stack?
CartoDB, MapProxy and TileStream are another world we don't need to think of right now, sure.
What is the rationale behind this effort?
I have some reasons to integrate GeoCouch into uMap, or the other way round, to make uMap a feasible GeoCouch editor:
1. use case
We are building a distributed spatial ecosystem, that unifies the storage of geodata for several initiatives and provides synchronization to their data sources. I seem to be in charge of making that happen.
Therefore we need a schemaless database, due to differing metadata, and an easily accessible raw data API.
Additionally, automatic and semi-automatic conflict resolution will have to be implemented at some level.
2. GeoCouch
To recap: GeoCouch is a NoSQL document store with a spatial index and a RESTful API. It allows key-identified storage of semi-structured data and provides Map(Reduce) views on it. Static HTML applications can be served as so-called CouchApps.
Additionally, it offers replication and, with the release of CouchDB 2.0, sharding.
It uses a lot more storage for features and indexes, though.
What makes it interesting to investigate is its JavaScript syntax, for those who like writing JavaScript in the browser, in Node and in CouchDB at the same time, and the native storage and retrieval of GeoJSON FeatureCollections.
3. versioning
By design, all documents would already be versioned. It would be easy to revert malicious or unwanted changes, whether they happened by accident or deliberately. The replication mechanism also includes conflict resolution. Additionally, PouchDB offers offline replication of databases to the browser (mobile!), which can be synced back with conflict resolution, too.
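To illustrate why revision-based storage makes reverting and conflict detection straightforward, here is a toy in-memory store imitating CouchDB's rule that an update must name the document's current revision. This is a simplified sketch, not CouchDB's implementation or API: revisions are "<generation>-local" strings instead of content hashes, and all old versions are kept so a revert is just writing an earlier version again:

```python
class Conflict(Exception):
    """Raised when a write is based on an outdated revision."""


class RevisionedStore:
    """Toy document store mimicking CouchDB's revision check (simplified)."""

    def __init__(self):
        self.history = {}  # doc_id -> list of (rev, doc), oldest first

    def put(self, doc_id, doc, rev=None):
        versions = self.history.get(doc_id, [])
        current = versions[-1][0] if versions else None
        # A write must be based on the latest revision, otherwise it conflicts.
        if rev != current:
            raise Conflict(f"{doc_id}: expected rev {current!r}, got {rev!r}")
        new_rev = f"{len(versions) + 1}-local"
        self.history[doc_id] = versions + [(new_rev, dict(doc))]
        return new_rev

    def get(self, doc_id):
        rev, doc = self.history[doc_id][-1]
        return rev, dict(doc)

    def revert(self, doc_id):
        """Undo the last change by writing the previous version as a new revision."""
        versions = self.history[doc_id]
        if len(versions) < 2:
            raise ValueError("nothing to revert to")
        return self.put(doc_id, versions[-2][1], rev=versions[-1][0])
```

Unlike real CouchDB, this store never keeps conflicting branches side by side; it simply rejects the stale write, leaving resolution to the client.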
4. semantics
Within one of the initiatives ( http://14mmm.org ) there is the chance that we wouldn't only create new OpenStreetMap Tags but also expose additional metadata as JSON-LD, so it can become part of the Linked Data Cloud.
Therefore I wish to create a semantic WebGIS to be able to donate Linked GeoData to the public. I know that is a big effort, given that so far mostly huge European research projects work in this field and conferences are rare, but for me that's the way to go.
Instead of implementing a real quad store for this application, CouchDB seems to be flexible enough to deliver.
5. uMap
With the extra GeoJSON storage right now, it should be fairly easy to integrate GeoCouch, either only by using the HTTP API or with some kind of Python/Django extension (Links above).
Original comment by Yohan Boniface (BitBucket: yohanboniface, GitHub: yohanboniface):
Is the database used at all for geographical features?
Hardly at all. django-leaflet-storage barely uses it: it only stores the center of the map as a geographical point. uMap then uses it to filter out the maps whose users haven't changed the default center, in order to show a more interesting list of maps on the home page.
But that's not essential, should it turn out to be a blocker at some point.
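The home-page filtering described above boils down to comparing each map's center with the default one. Here is the same idea as a plain-Python sketch outside Django; the default center value and the "center" field name are assumptions, and the real filtering is done in the database:

```python
import math

# Hypothetical default map center as (longitude, latitude); the real value
# comes from the instance's settings and may differ.
DEFAULT_CENTER = (2.0, 51.0)


def has_custom_center(center, default=DEFAULT_CENTER, tol=1e-6):
    """True if the map's center was moved away from the default center."""
    return not (
        math.isclose(center[0], default[0], abs_tol=tol)
        and math.isclose(center[1], default[1], abs_tol=tol)
    )


def showcase_maps(maps):
    """Keep only maps whose center was changed, for a more interesting list."""
    return [m for m in maps if has_custom_center(m["center"])]
```

Because this is the only geographical use of the database, a non-spatial backend could replace it with a simple comparison like this one.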
Are you more aiming in directions of JSONP or of WebSockets?
No JSONP for sure. WebSockets certainly are a nice option. But honestly I don't know those details at this point :/
With the extra GeoJSON storage right now, it should be fairly easy to integrate GeoCouch, either only by using the HTTP API or with some kind of Python/Django extension (Links above).
I need more time to estimate the interest of CouchDB myself.
Two remarks though:
- I made Leaflet.Storage and django-leaflet-storage two separate projects precisely to be able to set up different kinds of storage (AND with remoteStorage in mind). So one option at this point could be to create a new backend for Leaflet.Storage, say couch-leaflet-storage. This would make it easier to investigate without dealing with existing data; if it turns out to be a good option, we could later think about migrating uMap to this new backend, and then, and only then, think about data migration.
- the CouchDB reflection should, imho, be a separate discussion from the Dockerfile at this point, otherwise neither will ever see the light of day ;)
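The separation between the frontend and interchangeable storage backends can be sketched on the server side as a small interface with one implementation per storage engine. The LayerStorage and FileSystemStorage names are hypothetical and this is not django-leaflet-storage's code; the filesystem variant only mirrors the idea that GeoJSON layers are kept as files:

```python
import json
from abc import ABC, abstractmethod
from pathlib import Path


class LayerStorage(ABC):
    """Hypothetical interface any storage backend (files, CouchDB, ...) would implement."""

    @abstractmethod
    def save(self, layer_id: str, geojson: dict) -> None: ...

    @abstractmethod
    def load(self, layer_id: str) -> dict: ...


class FileSystemStorage(LayerStorage):
    """Stores each data layer as a GeoJSON file under a root directory."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, layer_id: str) -> Path:
        return self.root / f"{layer_id}.geojson"

    def save(self, layer_id: str, geojson: dict) -> None:
        self._path(layer_id).write_text(json.dumps(geojson), encoding="utf-8")

    def load(self, layer_id: str) -> dict:
        return json.loads(self._path(layer_id).read_text(encoding="utf-8"))
```

A couch-leaflet-storage experiment would then add another LayerStorage implementation talking to CouchDB's HTTP API, leaving the existing backend and its data untouched.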
Original comment by BitBucket: almereyda, GitHub: almereyda:
Yes, sorry, I'm used to the branching conversations Discourse offers, and it does somewhat relate to Docker through the separation of an application's state and logic.
I will create a new issue once I start working on both the Dockerfile and geocouch-leaflet-storage.
Phew. There is a lot of info in this thread, not only related to a dockerized uMap.
Do I understand correctly that the main question in creating a Docker setup for uMap would be how to keep data integrity across software updates?
Would a (GitHub) wiki page be a better place to collect these requirements, which would become part of the developer docs yet to be written?
Where does data live currently ?
As far as I understand uMap so far, its state lives mostly in the file system. Apparently, from the comments above, PostgreSQL is only used for basic Django capabilities, and the GeoDjango part only for the center of maps.
Filesystem storage and the database can be nicely abstracted within a docker-compose setup, for example.
Knowing that CouchDB will soon have an official Dockerfile, and that GeoCouch seems to work dockerized too, we could think of realigning these discussions:
@thoka also once mentioned that there was a working, containerized uMap setup, but I cannot find it.
- Creating a working docker-compose example for uMap
How hard would it be to migrate towards Django as a mere API layer behind a (then) single-page JS frontend?
I now know that uMap encompasses django-leaflet-storage and Leaflet.Storage, but how can or should we generalize them for different environments? Talking to a number of initiatives associated with @TransforMap, we know most of them already use Leaflet, always building new backends for their respective stacks (Rust/Neo4j, PHP/MongoDB, JS/MongoDB, etc.)
If I understand it correctly, couch-leaflet-storage could be a first step towards such a Leaflet.Storage ecosystem.
Just a quick heads-up that in case anyone is interested, I've created a Dokku setup for uMap: https://github.com/jezdez/umap-dokku
This is very impressive, @jezdez. From my dockerisation experiments I remember there were remaining issues with some PostGIS development libraries not being available in the main container during pip install of certain dependencies that needed them to compile. I think this concerned compiling some GDAL/OGR libraries for Python, which were in use back then. Did you run into any of this after Dokku's git receive hook had fired?
@almereyda Nope, didn't hit that, I'm running an uMap instance right now with that and so far it seems to work as intended.
There is now a PR here to add this to the main repo: #456
If you want to try it out, get the docker branch from my fork, install Docker for your platform and run make docker-up.
@jezdez this is just what I was looking for! I will fork your repository and bring up the Docker setup to test it. Thanks!
If you want to try it out, get the docker branch from my fork, install Docker for your platform and run make docker-up.
@LucianoSilvaSantos Hey, no problem, I deleted my fork last year, but just reforked and pushed the docker branch again to it. Let me know if you want to pick it up and finish #456. Cheers!
Yes, I really want to, and I'd be very grateful if you could help me. I have a project that I want to use to support a community here in Brazil. Thank you very much! With Docker it's easy to bring up, manage and replicate the environment.
Same here: would it be possible to have a Docker image built and published in a registry? That would make things so much easier.
It finally landed! https://hub.docker.com/r/umap/umap
🎉 Many thanks for the long-term maintenance.
Thanks @almereyda :)