Add DPG: UN Biodiversity Lab

Open dpgabot opened this issue 2 years ago • 7 comments

Automatic addition of a new digital public good submitted through the online form available at https://digitalpublicgoods.net/submission

dpgabot avatar Mar 18 '22 13:03 dpgabot

Checklist for conducting technical review against DPG Standard:

  • [x] Clear Ownership
  • [x] Platform Independence
  • [x] Documentation
  • [x] Mechanism For Extracting Data
  • [x] Do No Harm By Design
  • [x] Data Privacy & Security
  • [x] Adherence to Standards & Best Practices
  • [x] Adherence to Privacy and Applicable Laws
  • [x] Inappropriate & Illegal Content
  • [x] Protection From Harassment
  • [x] Development & deployment countries

nathanbaleeta avatar May 12 '22 07:05 nathanbaleeta

UNBL has mandatory dependencies that create more restrictions than the product itself: the software depends on proprietary components, and no open alternative is provided. UNBL uses the following proprietary components, as illustrated in the system architecture diagram:

  • AWS Lambda workers
  • Amazon Simple Notification Service
  • Amazon API Gateway
  • Google Earth Engine
  • Cloud Storage

[Image: marapp-architecture-v3 (system architecture diagram)]

Until UNBL is able to demonstrate independence from the closed component(s) or provide FUNCTIONAL, open alternatives that can be used without significant changes to the core product, this submission still fails indicator 4 (Platform Independence) of the DPG Standard.

nathanbaleeta avatar May 24 '22 06:05 nathanbaleeta

The most recent release of UNBL has replaced our AWS and Google services with Azure services. However, this code still needs to be cleaned up before we release it publicly (planned release: July/August 2022). This addresses many of the dependencies highlighted above. Changes include:

  • GEE Storage and Tiling have been replaced with STAC + any file store (we use Azure Storage). This was a very large and challenging step forward.
  • GEE metrics have been replaced with python/gdal/rasterio/numpy which are all open. However, these services are run on a “cloud scale” using Azure Batch, which is not.
  • GEE Tiler has been replaced with Titiler, which is open. However, this service is run on a “cloud scale” using Azure Batch, which is not.
  • AWS Lambda was replaced with Azure Functions (both not open).
  • AWS SNS/SQS was replaced with Azure Batch (both not open).
  • We also have a (fairly light) dependency on Mapbox which is not documented above.

We understand that a couple of the components referenced above are cloud-platform specific. We deliberately designed them to be very loosely coupled: any event handler, such as a generic Kafka service, could provide the needed functionality. Cloud providers offer their own packaged versions of these real-time event-handling services, which significantly reduce development time and operating costs.

  • Making a SaaS platform as large and feature rich as the UNBL platform completely cloud agnostic would be excessively expensive and less reliable, two things that are extremely important in this case.
  • We suggest that this is something that the DPGA should consider in the DPG standards, as this is a remarkably different case from a desktop-run software package. All of the key dependencies above could, theoretically, be replaced with something like a Kafka event handler and custom consumer/producer functions. However, this would require approximately 6-9 months of new development time, and we could not guarantee system reliability and operating cost the same way that a commercial cloud can.
  • We agree that it would be excellent to have this type of open source technology for the cloud but this is not something that is available at this point in time -- this is truly testing the bounds of the open source community.
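To make the "loosely coupled" claim concrete, here is a minimal sketch of what such a boundary could look like. It is not taken from the UNBL code base: the `EventBus` interface, the `InMemoryEventBus` backend, the `request_metrics` function, and the topic name are all hypothetical. The idea is that application code depends only on the interface, so a Kafka-backed or cloud-specific backend could be swapped in behind it.

```python
from collections import defaultdict
from typing import Callable, Protocol

Handler = Callable[[dict], None]


class EventBus(Protocol):
    """Hypothetical interface; the only thing application code depends on."""

    def publish(self, topic: str, event: dict) -> None: ...

    def subscribe(self, topic: str, handler: Handler) -> None: ...


class InMemoryEventBus:
    """Reference backend for local runs and tests; needs no cloud service."""

    def __init__(self) -> None:
        self._handlers = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        # Deliver synchronously to every handler registered for the topic.
        for handler in list(self._handlers[topic]):
            handler(event)


def request_metrics(bus: EventBus, layer_id: str) -> None:
    """Application code: it never names a concrete backend."""
    bus.publish("metrics.requested", {"layer_id": layer_id})


bus = InMemoryEventBus()
received = []
bus.subscribe("metrics.requested", received.append)
request_metrics(bus, "layer-123")
print(received)
```

A backend wrapping Kafka or a cloud queue would implement the same two methods. Whether the real services can be isolated this cleanly can only be judged from the source code.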

Cc: @iperdomo

nathanbaleeta avatar Jun 17 '22 11:06 nathanbaleeta

  • GEE Storage and Tiling have been replaced with STAC + any file store (we use Azure Storage). This was a very large and challenging step forward.

STAC is an open specification. The fact that the files are stored in Azure Storage is not relevant in this case; those files are JSON blobs accessed via HTTP.
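A small sketch of why the storage host does not matter: a STAC Item is plain JSON whose assets are referenced by `href`, so a client only needs a JSON parser and an HTTP URL. The item below is made up for illustration (the id, the URL, and the asset key are assumptions, not UNBL data).

```python
import json
from urllib.parse import urlparse

# Hypothetical STAC Item; id, URL, and asset key are invented for illustration.
ITEM_JSON = """
{
  "type": "Feature",
  "stac_version": "1.0.0",
  "id": "example-layer-2020",
  "geometry": null,
  "properties": {"datetime": "2020-01-01T00:00:00Z"},
  "links": [],
  "assets": {
    "data": {
      "href": "https://storage.example.org/layers/example-layer-2020.tif",
      "type": "image/tiff; application=geotiff; profile=cloud-optimized"
    }
  }
}
"""


def asset_hrefs(item: dict) -> dict:
    """Map each asset key to its href, whatever host serves the file."""
    return {key: asset["href"] for key, asset in item.get("assets", {}).items()}


item = json.loads(ITEM_JSON)
hrefs = asset_hrefs(item)
host = urlparse(hrefs["data"]).netloc
print(hrefs["data"], host)
```

Moving the files to another store would change only the `href` values; the parsing code stays the same.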

  • GEE metrics have been replaced with python/gdal/rasterio/numpy which are all open. However, these services are run on a “cloud scale” using Azure Batch, which is not.

Using open source components hosted on a cloud provider is OK. The dependency is on python/gdal/rasterio/numpy, not Azure Batch.

  • GEE Tiler has been replaced with Titiler, which is open. However, this service is run on a “cloud scale” using Azure Batch, which is not.

Titiler depends on FastAPI. The dependency is on an open source component.

  • AWS Lambda was replaced with Azure Functions (both not open).
  • AWS SNS/SQS was replaced with Azure Batch (both not open).

Azure Functions & Azure Batch are proprietary closed source dependencies. These dependencies make the project not platform independent.

  • We also have a (fairly light) dependency on Mapbox which is not documented above.

Mapbox requires an account (most of the time used with a free tier) but places more restrictions than the project license.

  • Making a SaaS platform as large and feature rich as the UNBL platform completely cloud agnostic would be excessively expensive and less reliable, two things that are extremely important in this case.

I don't agree with this statement. You can use open source components and just use the cloud provider for orchestration and execution. Depending directly on the cloud provider offerings is the problem that creates more restrictions. We don't have access to the source code yet, so I can't validate my suggestion.

iperdomo avatar Jun 22 '22 16:06 iperdomo

How much does Earth Engine cost? Earth Engine is free for research, education, and nonprofit use. For commercial or operational applications, evaluation of Earth Engine is permitted. Pricing details are not yet available. -- https://earthengine.google.com/faq/

The dependency on Earth Engine is also problematic.

iperdomo avatar Jun 22 '22 16:06 iperdomo

@iperdomo please see source code files here.

nathanbaleeta avatar Jun 23 '22 11:06 nathanbaleeta

The shared code has a hard dependency on Auth0.

Required configuration:

  • MongoDB Atlas
  • Auth0

Auth0 supports OpenID Connect, an open standard authentication protocol. However, the integration with Auth0 is not limited to authentication; it also covers authorization.

  • https://github.com/natgeosociety/marapp-services/blob/7a4127868c8fdcfbba4cbcb79821f53a5dee32bc/src/services/auth0-authz.ts#L20
  • https://github.com/natgeosociety/auth0-authorization#auth0-authorization

Without an open alternative (e.g. Keycloak), the project is not platform independent.
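One possible shape for such an alternative, sketched under assumptions: if authorization decisions go through a small interface, the provider-specific part shrinks to reading roles from already-verified token claims. Nothing here comes from marapp-services. `RoleResolver`, `NamespacedClaimRoleResolver`, `is_allowed`, and the custom claim name are hypothetical; the `realm_access.roles` claim is the place Keycloak puts realm roles in its access tokens. Token signature verification is deliberately out of scope.

```python
from typing import Protocol


class RoleResolver(Protocol):
    """Hypothetical interface: extract roles from verified token claims."""

    def roles(self, claims: dict) -> set: ...


class KeycloakRoleResolver:
    """Reads realm roles from the `realm_access.roles` claim."""

    def roles(self, claims: dict) -> set:
        return set(claims.get("realm_access", {}).get("roles", []))


class NamespacedClaimRoleResolver:
    """Reads roles from a custom namespaced claim (claim name is an assumption)."""

    def __init__(self, claim: str) -> None:
        self._claim = claim

    def roles(self, claims: dict) -> set:
        return set(claims.get(self._claim, []))


def is_allowed(resolver: RoleResolver, claims: dict, required_role: str) -> bool:
    """Provider-agnostic check used by the rest of the application."""
    return required_role in resolver.roles(claims)


# Claims are assumed to be decoded and signature-checked elsewhere.
keycloak_claims = {"sub": "u1", "realm_access": {"roles": ["editor"]}}
custom_claims = {"sub": "u1", "https://example.org/roles": ["editor"]}

keycloak_ok = is_allowed(KeycloakRoleResolver(), keycloak_claims, "editor")
custom_ok = is_allowed(
    NamespacedClaimRoleResolver("https://example.org/roles"), custom_claims, "editor"
)
print(keycloak_ok, custom_ok)
```

The linked `auth0-authz.ts` may do more than role lookup (for example, managing groups through a provider API), in which case the interface would need to grow accordingly.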

@iperdomo would you be in a position to attend the reviewers' check-in meeting this Tuesday to help finalize the review of UNBL? Thanks.

nathanbaleeta avatar Jul 11 '22 08:07 nathanbaleeta