cloud-carbon-footprint
RFC: Persisting estimation cache in PostgreSQL database for Backstage users
Feature Suggestion
Backstage is currently the only integration listed in the Integrations section of the Cloud Carbon Footprint website. The supported database for Backstage is PostgreSQL.
Is it feasible for Cloud Carbon Footprint to leverage PostgreSQL for caching estimation data? This would remove the need to support two database instances when integrating the plugin into a Backstage deployment.

Describe the solution you'd like
Cloud Carbon Footprint should have an option to efficiently store the estimation cache data in a PostgreSQL database.

Additional context
I understand that, due to the nature of the estimation data, the above scenario may be inefficient. I would love to get some comments.
@oconnormj Thanks for opening this issue. You raise a very interesting and valid point about running CCF as a Backstage plugin and how undesirable it would be to manage two separate database services.
For additional context, MongoDB was selected due to its ease of integration; the ability to query with native JavaScript without introducing extra SQL; the flexibility to install local or cloud instances under the same service; and its NoSQL JSON format, which requires less transformation of the aggregated JSON structure of estimates and thus lets us store and query estimates with less intervention.
However, flexibility is the goal with CCF, and I personally think it would be worth providing the option to set up a PostgreSQL database instance to persist estimates. Maintaining parity between CCF and the Backstage plugin would also be ideal. We would just need to spike out what this would look like within the code (e.g. https://node-postgres.com/), as well as how to set it up within the Backstage plugin as a new database on an existing instance.
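To give the spike a starting point, here is a minimal sketch, assuming a hypothetical `ccf_estimates` table that stores each day's estimates as a single JSONB document. The `ServiceEstimate`/`EstimationResult` shapes, the table and column names, and `buildUpsertQuery` are all illustrative assumptions, not CCF's actual schema or code. The function only builds the `{ text, values }` object that node-postgres accepts, so it can be checked without a running database.

```typescript
// Simplified, hypothetical shape of one per-service estimate. Field names
// are illustrative and may not match CCF's actual types.
interface ServiceEstimate {
  cloudProvider: string;
  accountId: string;
  serviceName: string;
  region: string;
  kilowattHours: number;
  co2e: number;
}

// Hypothetical shape of one day's cached estimation result.
interface EstimationResult {
  timestamp: Date;
  serviceEstimates: ServiceEstimate[];
}

// Mirrors the { text, values } object node-postgres accepts in client.query().
interface QueryConfig {
  text: string;
  values: unknown[];
}

// Hypothetical DDL: one row per day, with the nested estimates kept as
// JSONB so they need no relational schema of their own.
const CREATE_TABLE_SQL = `
CREATE TABLE IF NOT EXISTS ccf_estimates (
  estimate_date DATE PRIMARY KEY,
  service_estimates JSONB NOT NULL
)`;

// Builds one parameterized upsert for a batch of results. Assumes at most
// one result per date per batch (PostgreSQL rejects an ON CONFLICT upsert
// that would update the same row twice in one statement).
function buildUpsertQuery(results: EstimationResult[]): QueryConfig {
  if (results.length === 0) {
    throw new Error("buildUpsertQuery needs at least one result");
  }
  const values: unknown[] = [];
  const rows = results.map((result, i) => {
    // UTC calendar date as YYYY-MM-DD, which PostgreSQL casts to DATE.
    values.push(result.timestamp.toISOString().slice(0, 10));
    // Serialize explicitly: node-postgres would otherwise send a JS array
    // as a PostgreSQL array rather than as JSON.
    values.push(JSON.stringify(result.serviceEstimates));
    return `($${2 * i + 1}, $${2 * i + 2}::jsonb)`;
  });
  return {
    text:
      `INSERT INTO ccf_estimates (estimate_date, service_estimates) ` +
      `VALUES ${rows.join(", ")} ` +
      `ON CONFLICT (estimate_date) DO UPDATE ` +
      `SET service_estimates = EXCLUDED.service_estimates`,
    values,
  };
}
```

With node-postgres, the result could be passed straight to `client.query(buildUpsertQuery(results))`. Keeping each day's estimates as one JSONB document leaves the nested JSON structure largely intact, which speaks to the transformation concern mentioned above, at the cost of doing aggregation in SQL.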
@ccasher or @mvaltas, it would be good to know your thoughts on this.
Switching out MongoDB for PostgreSQL is certainly not a simple task. PostgreSQL does offer support for JSON data types (https://www.postgresql.org/docs/16/datatype-json.html), but I'm not familiar enough with them to determine if they can replace MongoDB's aggregation capabilities. That said, I don't believe PostgreSQL, as a relational database, would work as a suitable replacement due to the points Arik mentioned. We deal with flexible data which might not easily conform to a relational schema.
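On the aggregation question, one way to gauge it is to express a typical roll-up directly against JSONB. The sketch below assumes a hypothetical `ccf_estimates(estimate_date DATE, service_estimates JSONB)` table in which `service_estimates` is an array of per-service objects with a numeric `co2e` field. It uses PostgreSQL's `jsonb_array_elements` function and `->>` operator to sum CO2e per group over a date range, roughly what a MongoDB `$unwind` followed by `$group` would do. The field names and `buildCo2eByFieldQuery` are illustrative, not CCF's actual aggregation pipeline.

```typescript
// Mirrors the { text, values } object node-postgres accepts in client.query().
interface QueryConfig {
  text: string;
  values: unknown[];
}

// Hypothetical allowlist of estimate fields that can be grouped on.
// Field names are illustrative assumptions, not CCF's confirmed schema.
const GROUPABLE_FIELDS = ["serviceName", "region", "cloudProvider", "accountId"] as const;
type GroupableField = (typeof GROUPABLE_FIELDS)[number];

// Sums co2e per group for estimates between two ISO dates (inclusive),
// assuming ccf_estimates(estimate_date DATE, service_estimates JSONB)
// where service_estimates holds a JSON array of per-service objects.
function buildCo2eByFieldQuery(
  field: GroupableField,
  startDate: string,
  endDate: string,
): QueryConfig {
  // The field name is spliced into the SQL text, so reject anything outside
  // the allowlist, even if a caller bypasses the type at runtime.
  if (!(GROUPABLE_FIELDS as readonly string[]).includes(field)) {
    throw new Error(`Unsupported group-by field: ${field}`);
  }
  const text = [
    `SELECT elem->>'${field}' AS group_key,`,
    `       SUM((elem->>'co2e')::numeric) AS total_co2e`,
    `FROM ccf_estimates,`,
    // Expands each row's JSON array into one row per service estimate.
    `     jsonb_array_elements(service_estimates) AS elem`,
    `WHERE estimate_date BETWEEN $1 AND $2`,
    `GROUP BY group_key`,
    `ORDER BY total_co2e DESC`,
  ].join("\n");
  // Dates stay as bind parameters rather than being interpolated.
  return { text, values: [startDate, endDate] };
}
```

For example, `buildCo2eByFieldQuery("region", "2023-01-01", "2023-01-31")` yields a query that totals CO2e per region for January. Whether queries like this stay fast enough as the cache grows is exactly the kind of efficiency trade-off a spike would need to measure.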
@mvaltas To clarify, it was my understanding that this would be an added option as an alternative to MongoDB, rather than a complete replacement. But I totally agree with your points.
I think this is definitely worth spiking out. Having another integration option that is already more compatible with Backstage seems appropriate, but given @mvaltas' reservations, part of the decision might entail a trade-off in efficiency or limitations on how estimates can be fetched and stored. @oconnormj, just curious: have you had a chance to explore how this might look without any major updates to the CCF data structure?