MapKnitterExporter architecture discussion
We are exploring parallel tracks for cloud-based MapKnitter exporting, and one option is a JavaScript-based process.
The base idea is to run the export process as a scalable web service, possibly "serverless" or REST, in Google Cloud and/or other cloud providers like Amazon AWS Lambda (primarily Google Cloud but compatible with others). Comments/suggestions/eurekas welcome!
Importantly, either track would ideally present the same API so that we could compare their performance.
JavaScript track
In this track, more experimentally, we'd use Image Sequencer, possibly with the webgl-distort library.
The major challenges here, I'd guess, would be:
- handling very big image files (up to 8 MB each?) in memory
- serious speed improvements in IS, such as the proposed WebAssembly or WebGL adapters
- figuring out the best way to persist images for later access, and how to integrate the exporter with this (passing a callback function to upload them to a given store? Credentials?)
- trying to duplicate or integrate GDAL's generation of a giant combined GeoTIFF (just really huge images to manage in memory?)
- trying to duplicate or integrate GDAL's generation of TMS-formatted map tiles
For these last two, see #296 where there are some JS options to experiment with.
Also, we would try to develop this track in such a way as to make it possible to run locally in the browser, natively or in an Electron-style local JS app.
Ruby/ImageMagick/GDAL track
A more traditional approach is being explored here: #258, where we take the exporting sections currently featured in MapKnitter, and duplicate them in a minimal Ruby container that can be run on-demand.
Spec
To guide the development of both tracks, we're imagining a basic common behavior of:
- receiving a collection of image URLs or data-URLs of images AND a scale (cm/px or final pixel size)
- outputting a combined JPG image at a given scale or pixel size
- advanced versions might cut tiles or output GeoTiffs (see challenges in JS version above)
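To make the shared behavior a bit more concrete, here is a minimal sketch of how either track might validate an incoming export request. The function name `validateExportRequest` and the field names `images`, `scale`, and `size` are placeholders for illustration, not an agreed API.

```javascript
// Hypothetical request shape (field names are assumptions, not an agreed spec):
//   { images: [<URL or data-URL>, ...], scale: <cm/px> }
//   { images: [<URL or data-URL>, ...], size: <final pixel size> }
function validateExportRequest(body) {
  const errors = [];

  if (!body || !Array.isArray(body.images) || body.images.length === 0) {
    errors.push('images must be a non-empty array of URLs or data-URLs');
  } else {
    body.images.forEach((src, i) => {
      const ok = typeof src === 'string' && /^(https?:\/\/|data:image\/)/.test(src);
      if (!ok) errors.push(`images[${i}] is not an http(s) URL or image data-URL`);
    });
  }

  // The spec asks for a scale (cm/px) OR a final pixel size, so require exactly one.
  const hasScale = Boolean(body) && typeof body.scale === 'number' && body.scale > 0;
  const hasSize = Boolean(body) && Number.isInteger(body.size) && body.size > 0;
  if (hasScale === hasSize) {
    errors.push('provide exactly one of scale (cm/px) or size (px)');
  }

  return { valid: errors.length === 0, errors };
}
```

Running the same check in front of both tracks would keep them comparable, since they would reject the same malformed input before any image work starts.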
Links and resources are being compiled here: https://github.com/publiclab/mapknitter/issues/296
What have I missed? @tech4GT @icarito would you mind adding any questions, clarifications?
Update: diagrams
I've put together a diagram of the current exporter workflow, which I hope is helpful. It's also largely ported into a standalone Ruby library in #341 -- soon to potentially be a Gem:

Image Sequencer should allow us to parallelize this, and improve its speed, as illustrated in this diagram:

@jywarren this looks really nice! Things immediately make a lot more sense!
I've just spent some time deploying a learning project to Google Cloud Platform (App Engine) as a Docker container. I've got a better understanding now of what is required! Thanks!
Awesome. I have a more system-wide planning issue drafted and will post soon. But these container tests can start whenever you both are ready. Sebastian do you think getting the gdal and imagemagick containers will be pretty straightforward too?
Thanks again!
OK, i'd like to add in an overview of the export system step by step; i've left notes where we might make changes or improvements as well, and will link to lines of code where these things currently happen!
Also -- a couple ideas:
- Idea: produce separate GeoTiffs to skip the
- Idea: produce TMS tiles from any collection of images, given tile coordinates and image sources (with known corner coordinates)
Breaking down the export process
Separable steps
- collect set of image URLs and their corner coordinates
- for each image (could do this from existing Ruby code or in npm module):
- determine image pixel dimensions
- convert corner coordinates to pixel positions
- for each image (using existing Ruby/ImageMagick code or in remote Image Sequencer container):
- current code at https://github.com/publiclab/mapknitter/blob/main/app/models/warpable.rb#L153-L336
generate_perspectival_distort - perform image distortion from original dimensions to new pixel corner positions
- embed exif data for corner coordinates
- save and return URL for download
- (optional, could do later) produce GeoTiffs of each image
- (optional, could do later) produce TMS tileset of each image
- given collection of warped images, calculate pixel positions of image collection relative to each other (Ruby code exists)
- (optional alternative) produce SVG or PDF containing images at relative positions (less memory use)
- currently code appears in https://github.com/publiclab/mapknitter/blob/main/app/models/map.rb#L231 in
run_export, distort_warpables, generate_composite_tiff, generate_tiles, generate_jpg - produce composite/merged image using this data
- save and return URL of combined image for download
- (optional) produce GeoTiff of combined image
- (optional) pass GeoTiff to GDAL for conversion into traditional TMS tileset
- Possible next steps:
- produce merged TMS of step 3 per-image TMS tiles instead of generating from step 4's giant GeoTiff
- produce single TMS from combination of per-image GeoTiffs from end of step 3
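As a rough illustration of the "convert corner coordinates to pixel positions" step above, here is a minimal sketch. It assumes corners arrive as longitude/latitude in degrees, projects them with spherical Web Mercator, and divides by the scale in cm/px. That projection is an assumption for illustration and not necessarily the conversion used in warpable.rb, and the names `toMercator` and `cornersToPixels` are hypothetical. Note that Web Mercator meters are stretched away from the equator, so a real implementation may need to correct for latitude.

```javascript
const EARTH_RADIUS_M = 6378137; // sphere radius used by Web Mercator

// Project longitude/latitude (degrees) to Web Mercator meters.
function toMercator({ lon, lat }) {
  const x = EARTH_RADIUS_M * (lon * Math.PI / 180);
  const y = EARTH_RADIUS_M * Math.log(Math.tan(Math.PI / 4 + (lat * Math.PI / 180) / 2));
  return { x, y };
}

// Convert corner coordinates to pixel positions relative to the top-left
// of their bounding box, at `scale` centimeters per pixel.
function cornersToPixels(corners, scale) {
  const pts = corners.map(toMercator);
  const minX = Math.min(...pts.map(p => p.x));
  const maxY = Math.max(...pts.map(p => p.y));
  const pxPerMeter = 100 / scale;
  return pts.map(p => ({
    x: Math.round((p.x - minX) * pxPerMeter),
    y: Math.round((maxY - p.y) * pxPerMeter), // pixel y grows downward
  }));
}
```

The resulting pixel corners are the kind of input a perspective-distortion step would need, whether that step runs in ImageMagick or in an Image Sequencer module.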
@SidharthBansal @tech4GT @icarito just so you see this additional note breaking down the export process. There are portions that could be accomplished with traditional ImageMagick/GDAL combo just breaking out the Ruby-controlled code in our codebase (see #296 but i'll copy in more here), but I am hoping we can accomplish a lot in stand-alone containers in a serverless or at least remote REST model.
Starting work now! @icarito Can you please share some of the resources you have been going through, that would be a big help for me :)
@jywarren @icarito I would be starting with a basic express configuration that takes an image url and a sequencer string and returns the final output, can we create a repository for this on publiclab? Or should I make this on my github??
Okay a couple of things here
- We should add a flag to the run config which allows us to disable the progress logs (they'll unnecessarily slow down the server otherwise)
- I have some ideas in mind to speed up the pixelManipulation API which will in turn speed up most modules
- Should we return the output as a data URI or use the imgur service like we originally planned? Or maybe we can have a parameter in the request which allows both options
/* Request Body */
{
  'url': <String>,      // URL of the input image
  'sequence': <String>, // The sequence string which will be imported into sequencer
  'upload': <Boolean>   // Denotes whether to return the data URI or to upload to imgur and return that URL
}
How does this sound @jywarren @icarito ??
How about https://github.com/publiclab/image-sequencer-app, which we created a while back?
Let's start with a dataurl, but we should also plan to have an abstract way to "put" the image somewhere.
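One way to picture an abstract "put" is sketched below, assuming a small interface where every store exposes `put(name, buffer, mimeType)` and resolves to a URL or data-URL. The class names `DataUrlStore` and `MemoryStore`, the `saveResult` helper, and the example base URL are all hypothetical. A real remote store would also need credentials, which this sketch leaves out.

```javascript
// Hypothetical storage contract: put(name, buffer, mimeType) resolves to a
// string (URL or data-URL) where the image can be fetched.

// Default behavior: no upload at all, just hand back a data-URL.
class DataUrlStore {
  async put(name, buffer, mimeType = 'image/jpeg') {
    return `data:${mimeType};base64,${buffer.toString('base64')}`;
  }
}

// In-memory stand-in for a remote store (for example a cloud bucket). It only
// records what was "uploaded" so the exporter can be exercised without credentials.
class MemoryStore {
  constructor(baseUrl = 'https://example.invalid/exports/') {
    this.baseUrl = baseUrl;
    this.objects = new Map();
  }
  async put(name, buffer) {
    this.objects.set(name, buffer);
    return this.baseUrl + encodeURIComponent(name);
  }
}

// The exporter depends only on the put() contract, not on a concrete store.
async function saveResult(store, name, buffer) {
  return store.put(name, buffer);
}
```

With this shape, switching from data-URLs to an upload service would mean passing a different store object rather than changing the exporter itself.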
Great!
@jywarren Ok pushing the most basic setup now!
@jywarren Can you please grant me push access to the repository :sweat_smile:
doing so now, thanks!!!
@jywarren One more thing, do you want me to get cracking on the optimizations for sequencer first or deploy the container first?
Let's get the container working first -- but we can also encourage people in IS to tackle some of the optimizations, and point at this container to show why it'll be important!
Okay I'll try to deploy the container with a very basic setup tomorrow, and then I'll raise an issue for the optimizations, maybe I can document some of my ideas over there too! Also on a different note I tried out the app locally and it works like a charm :v:
oh wow!!! very cool.
Check out the various "projects" several of which are optimization related: https://github.com/publiclab/image-sequencer/labels/project
One thing I am concerned about though is, if we do switch to WebAssembly, what parts of the main code would we need to rewrite, or should we just switch to something like OpenCV entirely? I think we can start with making optimizations in JavaScript and then move towards WebAssembly if that gets unmanageable, what do you think?
Yeah i am not sure about it. I think we can follow multiple paths to optimize and should probably discuss that in the IS repo. Switching several modules to openCV would be powerful and flexible. So would webAssembly of pixelManipulation.
I think you are right, also please do have a look at the repository, I have pushed the basic file I wrote earlier today, will be extending this A LOT but I think this gives us a start.
Just a note that Google App Engine has a Standard Environment and a Flexible Environment, and Ruby seems to only be supported on the Flexible Environment, which is significantly more expensive: https://cloud.google.com/appengine/docs/standard/appengine-generation
Oh, but is-app will be pure node.js, so I guess that's not a problem!
Hi @tech4GT @icarito @sashadev-sky and others -- i just uploaded diagrams above in which i tried to very clearly articulate the current and planned export workflows. Please have a look!
However, we should think about, in both cases, what points we should try to report status in a status.json file which could be polled in JavaScript by MapKnitter users as their export runs, to be able to see what stage their work is in.
This, and other aspects such as the parallel running and the image pairing during compositing, make me think we really need to consider a new layer, a mapknitter-exporter-runner that could persist a bit longer, run in a container itself, but could persist a status.json file for the entire export run.
We could even think more broadly and develop it as an image-sequencer-runner which can handle complex branching image sequencer runs. @tech4GT maybe this is where the full express-based image-sequencer-app comes in, since the simpler individual steps seem to be possible using just cloud functions? Love to hear your thoughts on all this.
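A minimal sketch of what persisting a status.json from such a runner could look like is below. The class name `ExportStatus`, the stage names, and the JSON layout are all assumptions for illustration. Nothing here reflects an existing MapKnitter format.

```javascript
const fs = require('fs');

// Hypothetical status tracker for one export run.
class ExportStatus {
  constructor(file, stages) {
    this.file = file;
    this.state = {
      status: 'running',
      stages: stages.map(name => ({ name, done: false })),
    };
    this.write();
  }

  // Mark one stage as finished and persist the new state.
  complete(stageName) {
    const stage = this.state.stages.find(s => s.name === stageName);
    if (!stage) throw new Error(`unknown stage: ${stageName}`);
    stage.done = true;
    if (this.state.stages.every(s => s.done)) this.state.status = 'complete';
    this.write();
  }

  write() {
    // Write to a temp file and rename, so a poller never reads a half-written file.
    const tmp = this.file + '.tmp';
    fs.writeFileSync(tmp, JSON.stringify(this.state, null, 2));
    fs.renameSync(tmp, this.file);
  }
}
```

A client could then fetch the file on an interval and show which stage the export has reached, without the runner needing to keep a connection open.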
Haha awesome label @sashadev-sky -- i'll respond more completely later today i hope!
Just noting that @icarito has created a Dockerfile for the GDAL/ImageMagick container track: https://github.com/publiclab/mapknitter/pull/349
Hi @jywarren I was thinking about the is-runner and maybe we can base it on the nodejs cluster API? https://nodejs.org/api/cluster.html Also, how do we want to divide up the work inside these processes exactly? I mean, is it specified by the user, or do we want some kind of algorithm to decide?
Hi Varun!!! Interesting. I guess we could start by noting, for each step, what previous steps must be complete for it to run, and we could point each at prior step references. We might track their state, but also trigger a re-assessment of whether all are complete, using an event listener? I think it might be worth writing this out step by step. Like:
- receive multiple image URLs and coordinates
- step one: run one process for each image, /concurrently/
- once all images are done with step one, begin next step
- go through images one by one and add them /sequentially/ to the previous image (using coordinate offsets) to make a big combined image
- ...
at some of these steps, we would need to know a) what triggers the step to check if it can begin, and b) what conditions must be met for it to start, right?
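The concurrent-then-sequential flow described above could be sketched roughly like this, using plain promises. `processImage` and `addToComposite` are hypothetical stand-ins for the real per-image and compositing work, and `runExport` here is not the existing Ruby method of the same name.

```javascript
// Stand-in for the per-image work (warp, embed EXIF, save). Hypothetical.
async function processImage(image) {
  return { ...image, warped: true };
}

// Stand-in for compositing: adds one warped image onto the running canvas.
async function addToComposite(canvas, image) {
  return { layers: [...canvas.layers, image.url] };
}

async function runExport(images) {
  // Per-image step: every image is processed concurrently.
  // Promise.all resolves only once all of them are done, which acts as
  // both the trigger and the start condition for the next step.
  const warped = await Promise.all(images.map(processImage));

  // Compositing step: images are added strictly one after another,
  // each onto the result of the previous addition.
  let canvas = { layers: [] };
  for (const image of warped) {
    canvas = await addToComposite(canvas, image);
  }
  return canvas;
}
```

In this simple form no event listener is needed, because awaiting `Promise.all` covers the "are all images complete?" check. A more complex branching run might still want explicit state tracking.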
This makes sense Jeff, let me write up some code and see if it works, let's build this into is-app.
That's great. You could even start with a sequence that doesn't yet do distortion, but something simpler that already works. Then we'll have the shell of the system in the right format and can focus on just getting the internal modules to work.
Thanks!
Yeah that's what I was thinking, we can plug in the distortion part later since there are a couple of options I need to explore there and I don't want to slow this down because of that!
@jywarren Is there any is-module which stitches the images together? Or do I need to write one?