radiocells-scanner-android
Re-add download of full cell/wifi catalog
What steps will reproduce the problem? Go to the web site, navigate to Downloads > Cell and WiFi data and choose Worldwide dataset.
What is the expected output? What do you see instead? I'd expect the full cell/wifi catalog as it used to be. Instead, I get a 404 Not Found.
What version are you using? On what operating system? Does not apply.
Please provide any additional information below. Similar issues arise when trying to download the wifi catalog from Radiobeacon or the NLP backend – only individual countries are selectable but there's no way to download the full set.
I understand that this change was probably introduced because of bandwidth issues caused by many users downloading the whole thing over and over again, but limiting downloads to one country comes with other side effects. For example, I need to download multiple catalogs and switch manually whenever I cross a border.
I see three possible approaches to tackle this:
1. Simply restore the option to download the full catalog. What probably contributed to the bandwidth issues was that originally one needed to download the whole catalog or nothing at all, because nothing else was available. Now that users can limit their downloads to particular areas of interest, this may be less of an issue.
2. Host the catalogs on a public cloud service. Maybe Google Drive, maybe even GitHub, if their terms of service permit that. Anything that permits sharing (somewhat) arbitrary files over 1 GB in size and allows downloads by anonymous users would work.
3. Implement differential updates. Abandon the idea of the catalog being a file of which clients periodically download a new version. Rather, start with a blank database on the client into which the client loads the data it needs. (This can still be in the form of .sqlite databases, of course.) Downloading the whole set of data is only necessary the first time. After that, clients/users just fetch those records which were added or changed since the last update. And since the catalog already has a "last updated" timestamp column, clients have all the information they need to merge two databases; it would just need to be implemented.
On the server side, this could be implemented in the following way:
- The full catalog (as well as any per-country catalogs) remains available as it is.
- Additionally, diffs are offered, for the whole world and optionally for individual countries. The structure could be something like:
  - Yearly diffs since the start of the project, until the end of last year
  - Monthly diffs for the current year (all that's not in the last yearly diff), until the end of last month
  - Weekly diffs for the current month (all that's not in the last monthly diff), until the end of last week
  - Daily diffs for the current week
- Each diff is a .sqlite file, just like a full catalog – the only difference being the records it contains.
- Timestamps for a catalog entry must increase with each update.
- To make generating diffs easier, timestamps should reflect the time the entry was added to/updated in the central catalog. If I've scanned a few wifis on Monday but don't upload them until Wednesday, they'll have Wednesday's timestamp. If this is satisfied, diffs can be generated from the full catalog any time.
- Diffs could either be pre-generated and stored on the server, or generated on the fly. That's basically a trade-off between storage vs. processing power on the server.
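To illustrate how a diff could be cut from the full catalog by timestamp alone, here is a minimal sketch in Python. The table name `wifi_zone`, its columns, and the function `write_diff` are assumptions made up for this example; the real catalog schema may well differ.

```python
import sqlite3

# Hypothetical schema: the real catalog's table and column names may differ.
SCHEMA = """CREATE TABLE IF NOT EXISTS wifi_zone (
    bssid TEXT PRIMARY KEY,
    latitude REAL,
    longitude REAL,
    last_updated INTEGER  -- time the central catalog changed the row (Unix seconds)
)"""


def write_diff(full_db, diff_db, since, until):
    """Copy rows with since <= last_updated < until from full_db into diff_db.

    Returns the number of rows written to the diff file.
    """
    # isolation_level=None means autocommit, so ATTACH never runs inside a transaction.
    con = sqlite3.connect(diff_db, isolation_level=None)
    try:
        con.execute(SCHEMA)
        con.execute("ATTACH DATABASE ? AS src", (full_db,))
        cur = con.execute(
            "INSERT OR REPLACE INTO main.wifi_zone "
            "SELECT bssid, latitude, longitude, last_updated "
            "FROM src.wifi_zone "
            "WHERE last_updated >= ? AND last_updated < ?",
            (since, until),
        )
        return cur.rowcount
    finally:
        con.close()
```

A yearly, monthly, weekly or daily diff would then just be a call with the matching time window, either run ahead of time or on request.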
With client-side support, databases (full and diffs) could then be imported in any order. It comes with some coding but is by far the most elegant solution – and the most efficient one in terms of bandwidth.
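A rough sketch of what the client-side merge could look like, again in Python and again with an assumed schema (`wifi_zone`, `bssid`, `last_updated` are placeholders, not the actual catalog layout). It relies on SQLite's upsert syntax, which needs SQLite 3.24 or later. Because a row is only overwritten when the incoming timestamp is strictly newer, full catalogs and diffs can be applied in any order.

```python
import sqlite3

# Hypothetical schema: the real catalog's table and column names may differ.
SCHEMA = """CREATE TABLE IF NOT EXISTS wifi_zone (
    bssid TEXT PRIMARY KEY,
    latitude REAL,
    longitude REAL,
    last_updated INTEGER  -- Unix seconds
)"""


def merge_catalog(local_db, incoming_db):
    """Merge incoming_db (full catalog or diff) into local_db.

    New records are inserted; existing ones are replaced only if the
    incoming record carries a newer last_updated timestamp.
    """
    # Autocommit mode, so ATTACH is never issued inside an open transaction.
    con = sqlite3.connect(local_db, isolation_level=None)
    try:
        con.execute(SCHEMA)
        con.execute("ATTACH DATABASE ? AS incoming", (incoming_db,))
        con.execute(
            "INSERT INTO main.wifi_zone (bssid, latitude, longitude, last_updated) "
            "SELECT bssid, latitude, longitude, last_updated "
            "FROM incoming.wifi_zone WHERE true "  # 'WHERE true' avoids a parsing ambiguity
            "ON CONFLICT(bssid) DO UPDATE SET "
            "latitude = excluded.latitude, "
            "longitude = excluded.longitude, "
            "last_updated = excluded.last_updated "
            "WHERE excluded.last_updated > wifi_zone.last_updated"
        )
    finally:
        con.close()
```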
Hey Michael,
thanks for reporting. Quick fix: the 404 was just due to a typo; worldwide data has been re-enabled.
Host the catalogs on a public cloud service
Google Drive cancelled our free plan and GitHub is limited to 25MB per file :-( Nevertheless, bandwidth is currently not an issue, so we might continue to self-host for a while.
Implement differential updates
Database design is currently under investigation: mapsforge recently introduced a blazing fast, SpatiaLite-enabled POI database extension (https://github.com/mapsforge/mapsforge/blob/master/docs/POI.md). Experiments with using the mapsforge SpatiaLite format in Radiobeacon look very promising. Using their format, we might be able to solve the performance issues on long tracks (#92).
Thanks, looks like the DB is up again. There used to be a JSON with version information at http://radiocells.org/default/database_version.json – where did that end up?
It's now available on the 'country' level at https://radiocells.org/downloads/catalog_downloads.json. Maybe I can somehow hack 'global' version info into that list too.
Ah... my use case was to check if a new version of the catalog is available as part of a shell script, which I do via a simple diff (if the remote JSON differs from the last one I got, it means there are changes). For this I could simply use the new JSON, since changes to the full catalog would mean that at least one country extract got an update. So in fact I can work around the missing "old" JSON.
Though, on the other hand, being able to download the full catalog from the backend would be useful...
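For reference, the "has the remote JSON changed since last time" check described above could look roughly like this in Python instead of a shell script. The function names and the cache file are invented for this sketch; only the URL comes from this thread. Note that the very first run always reports a change, because there is no cached copy yet.

```python
import urllib.request
from pathlib import Path

CATALOG_JSON = "https://radiocells.org/downloads/catalog_downloads.json"


def update_cache(new_bytes, cache_file):
    """Return True (and overwrite cache_file) if new_bytes differs from the cached copy."""
    cache = Path(cache_file)
    if cache.exists() and cache.read_bytes() == new_bytes:
        return False
    cache.write_bytes(new_bytes)
    return True


def catalog_changed(cache_file, url=CATALOG_JSON):
    """Fetch the version JSON and report whether it changed since the last call."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return update_cache(resp.read(), cache_file)
```

A caller would run `catalog_changed("catalog_downloads.json")` and only start a catalog download when it returns `True`.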