
Why not PostgreSQL?

Open gbartolini opened this issue 4 years ago • 14 comments

I see you are using MongoDB as the database backend. I wonder why PostgreSQL was not used, considering also the more favourable licensing model and the fact that you are using Python as a language, which fits very well with Postgres. Thanks and good luck!

gbartolini avatar May 18 '20 21:05 gbartolini

I see you are using MongoDB as the database backend. I wonder why PostgreSQL was not used, considering also the more favourable licensing model and the fact that you are using Python as a language, which fits very well with Postgres. Thanks and good luck!

Hi, MongoDB is also very bad for security, especially if you do not provide details on how you use and configure it. I remember this story: https://www.wired.com/story/email-marketing-company-809-million-records-exposed-online/ This is related to the question of the transparency of the process and the fact that, as COPASIR affirmed, Immuni will put citizens' data outside National territory (clashing with the law-decree): are you going to expose citizens' data in possibly insecure databases to people outside National territory? Can you detail where and how citizens' data are stored? This is a scary National security risk.

vincenzoiovino avatar May 19 '20 06:05 vincenzoiovino

MongoDB is also very bad for security, especially if you do not provide details on how you use and configure it. I remember this story: https://www.wired.com/story/email-marketing-company-809-million-records-exposed-online/

Protecting a database server is a basic security measure, it's not really a reason not to use a particular DBMS IMO. Misconfigurations can happen with any DBMS and primarily depend on human errors. For example, MongoDB listens on localhost only by default. If you expose the database to the world it's because you do that intentionally.
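To make the point about bind addresses concrete, here is a minimal sketch that flags the non-loopback entries in a comma-separated bind list such as the value of `net.bindIp` in a MongoDB configuration. The function name `exposed_addresses` is hypothetical and the check is purely offline; it says nothing about how Immuni's database is actually configured.

```python
import ipaddress


def exposed_addresses(bind_ip: str) -> list[str]:
    """Return the entries of a comma-separated bind list that are not loopback.

    Hypothetical helper for illustration only: it inspects the string,
    it does not contact any server.
    """
    exposed = []
    for entry in (part.strip() for part in bind_ip.split(",")):
        if not entry or entry == "localhost":
            continue
        if entry.startswith("/"):
            # Unix domain socket path: local to the machine, not a network address.
            continue
        try:
            if ipaddress.ip_address(entry).is_loopback:
                continue
        except ValueError:
            # A hostname other than "localhost" cannot be verified offline,
            # so it is conservatively reported as exposed.
            pass
        exposed.append(entry)
    return exposed


if __name__ == "__main__":
    print(exposed_addresses("127.0.0.1"))           # loopback only
    print(exposed_addresses("127.0.0.1,0.0.0.0"))   # also listens on all interfaces
```

An empty result means the list only contains loopback addresses; anything else is an address a reviewer would want justified and firewalled.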

This is related to the question of the transparency of the process and the fact that, as COPASIR affirmed, Immuni will put the citizen's data outside National territory (clashing with the law-decree):

IMHO COPASIR barely has an idea of what they're talking about. They say that the decentralized nature of the architecture necessarily requires the use of a CDN, although it's actually the opposite. Moreover, they say that the high traffic requires a CDN, but a CDN only helps when you can cache the origin content.

In fact, an API that provides dynamic data (as is the case for Immuni) usually cannot leverage caching at a CDN level, so I don't see why a CDN would be needed. (And for certain the CDN is not a storage service...)

The Government in-house companies should have all the hardware capacity to scale the application backend, which is what I understand from the statements of the Ministry of Innovation. I trust them much more than COPASIR's statements, which are contradictory and confusing.

(In case it's not clear, this is a private citizen's opinion, I'm not affiliated with the team.)

matteocontrini avatar May 19 '20 09:05 matteocontrini

MongoDB is also very bad for security, especially if you do not provide details on how you use and configure it. I remember this story: https://www.wired.com/story/email-marketing-company-809-million-records-exposed-online/

Protecting a database server is a basic security measure, it's not really a reason not to use a particular DBMS IMO.

Some databases are known to be more prone to attacks than others and if nobody provides details or rationale behind such choices, how can we judge?

Misconfigurations can happen with any DBMS and primarily depend on human errors. For example, MongoDB listens on localhost only by default. If you expose the database to the world it's because you do that intentionally.

This is related to the question of the transparency of the process and the fact that, as COPASIR affirmed, Immuni will put the citizen's data outside National territory (clashing with the law-decree):

IMHO COPASIR barely has an idea of what they're talking about. They say that the decentralized nature of the architecture necessarily requires the use of a CDN, although it's actually the opposite. Moreover, they say that the high traffic requires a CDN, but a CDN only helps when you can cache the origin content.

The teams should reply to COPASIR and to us with documents containing an in-depth security analysis, and should detail on which physical servers the contents will be stored. Stating that COPASIR has no idea what they are talking about is a serious concern that should be addressed carefully.

In fact, an API that provides dynamic data (as is the case for Immuni) usually cannot leverage caching at a CDN level, so I don't see why a CDN would be needed. (And for certain the CDN is not a storage service...)

The Government in-house companies should have all the hardware capacity to scale the application backend, which is what I understand from the statements of the Ministry of Innovation. I trust them much more than COPASIR's statements, which are contradictory and confusing.

I understand differently. It is written that data should be stored on National territory. Argue why COPASIR's concerns are wrong with proofs. Moreover the "statements" you mention are in a law.

(In case it's not clear, this is a private citizen's opinion, I'm not affiliated with the team.)

If you do not belong to the team(s), through which documents can you know that no data will be stored outside National territory? How can you know whether or not COPASIR made its statements based on documents received from the Ministry, as is easy to guess?

vincenzoiovino avatar May 19 '20 10:05 vincenzoiovino

Argue why COPASIR's concerns are wrong with proofs.

I never said they're wrong, but I find that they're quite confusing. This is what they say:

The decentralized architecture necessarily requires the use of a Content Delivery Network (CDN), the only tool that makes it possible to effectively handle the volume of connections expected for the operation of the app. This technology can today be delivered on national territory; however, since it is not currently available from Italian companies, it will have to be acquired from foreign companies, yet to be identified

Why does the decentralized architecture imply the use of a CDN? Why do they say that a CDN is needed, when there should be alternatives? (And no one ever mentioned the use of a CDN before they did, so I'm skeptical, even if it's of course possible they know more than us.)

Moreover, a CDN is by definition a global network for the delivery of content. Even if there were an Italian company that provided CDN services, data would still transit abroad. You could of course limit the CDN to PoPs in Italy, but this is also possible with all major CDN providers that are out there.

And if the problem is that the provider is not an Italian company, then they should also be worried about the fact that servers, routers, IDS and other equipment in the datacenters will almost surely not be provided by an Italian company.

I think we can agree on the fact that an official reply would save us all this time spent on conjecture :)

matteocontrini avatar May 19 '20 11:05 matteocontrini

Argue why COPASIR's concerns are wrong with proofs.

I never said they're wrong, but I find that they're quite confusing. This is what they say:

The decentralized architecture necessarily requires the use of a Content Delivery Network (CDN), the only tool that makes it possible to effectively handle the volume of connections expected for the operation of the app. This technology can today be delivered on national territory; however, since it is not currently available from Italian companies, it will have to be acquired from foreign companies, yet to be identified

Why does the decentralized architecture imply the use of a CDN? Why do they say that a CDN is needed, when there should be alternatives? (And no one ever mentioned the use of a CDN before they did, so I'm skeptical, even if it's of course possible they know more than us.)

The point is that their analysis might be based on information we do not have; that is, COPASIR could have had access to internal documents discussing CDNs, data stored in other countries, etc. Otherwise, how do you think they prepared the report? At the time the COPASIR report was written, not even this repository had been released.

The law-decree is clear: data have to stay in the National territory.

vincenzoiovino avatar May 19 '20 14:05 vincenzoiovino

these people are doing this for free

~Hmmm, not really. The app development is funded with public money, of course~ edit: I did some research in this issue tracker, and it looks like the company is working for free (as stated by @matteocontrini in the comment below, whom I thank for pointing it out). I'm glad Bending Spoons decided to keep the issue tracker open for this repository, even though there's no source code yet. Concerns about the adopted architecture and technologies are legitimate issues imo, even if they'll have little to no influence on the final product.

I hope these comments will be hidden and flagged as disruptive content, this is no place for aggressive language and pointless personal attacks.

RememberTheAir avatar May 21 '20 06:05 RememberTheAir

Hmmm, not really. The app development is funded with public money, of course.

It's not, Bending Spoons is working for free.

matteocontrini avatar May 21 '20 07:05 matteocontrini

@matteocontrini:

Why does the decentralized architecture imply the use of a CDN? Why do they say that a CDN is needed, when there should be alternatives?

In fact, an API that provides dynamic data (as is the case for Immuni) usually cannot leverage caching at a CDN level, so I don't see why a CDN would be needed.

In the centralised framework, when someone tests positive, all of their contacts are uploaded to the backend and notified via push notification. This keeps the load fairly low, since a) only data about the contacts of positive users is uploaded and b) the backend only needs to notify people.

On the flip side, in the decentralised approach, every mobile phone has to continuously download the list of people who tested positive (saying people for the sake of simplicity, it's anonymous IDs in reality), with a significantly increased and unpredictable load on the backend. Moreover, it's all static data: the perfect use case for a CDN.
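The difference can be sketched in a few lines. This is a deliberately simplified model, assuming plain string IDs and a JSON payload; the names `build_batch` and `check_exposure` are hypothetical, and it is not Immuni's actual protocol. It only shows why the published list is static (every client receives the same bytes, so an edge cache can serve it) while the matching happens on the phone.

```python
import hashlib
import json


def build_batch(positive_ids):
    """Server side: serialise the anonymous IDs into one static payload.

    The payload does not depend on who downloads it, so the same bytes
    (and the same ETag) can be cached and served to every client.
    """
    body = json.dumps(sorted(positive_ids)).encode("utf-8")
    etag = hashlib.sha256(body).hexdigest()
    return body, etag


def check_exposure(batch_body, observed_ids):
    """Client side: match the downloaded batch against locally stored IDs."""
    published = set(json.loads(batch_body))
    return published & set(observed_ids)


if __name__ == "__main__":
    body, etag = build_batch({"id-17", "id-03"})
    # IDs this phone observed nearby; they never leave the device.
    seen_nearby = ["id-03", "id-99"]
    print(etag[:12], check_exposure(body, seen_nearby))
```

In the centralised model the server would instead receive the contact list and decide whom to notify, which is a per-user, dynamic operation that a cache cannot answer.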

gbonfiglio avatar May 25 '20 07:05 gbonfiglio

@gbonfiglio thanks. I assumed that the centralised approach involved uploading all the contacts to build the complete contacts graph. It's true anyway that in that case a CDN wouldn't be really useful...

matteocontrini avatar May 25 '20 08:05 matteocontrini

CDNs are generally always useful and always a good idea, for a number of reasons:

  • your API might be local, but the internet is global. If your API is meant for users in Italy and you host its backend in Italy, you don't want to transport attacks from China/Japan/Canada/Australia all the way down to your servers
  • a CDN comes with some types of implicit DDoS protection: if your endpoint is HTTP and you expose your server IP, then you will have to deal with, say, UDP floods. If you expose the CDN, UDP floods become their problem
  • latency and congestion avoidance: it's always a good idea to take traffic out of eyeball networks as soon as possible

As a matter of fact, practically any large scale public API you can think about is behind a CDN

gbonfiglio avatar May 25 '20 09:05 gbonfiglio

I totally agree @gbonfiglio, but as you have read in the messages in this repository, this project is under a lot of scrutiny and is over-analyzed, so someone would be capable of stirring up yet another overblown and pointless controversy if a CDN were used.

In any case the README, if it is up to date, seems clear:

It only uses public infrastructures located within the national borders. It is exclusively managed by the public company Sogei S.p.A.

matteocontrini avatar May 25 '20 10:05 matteocontrini

I don't know if you noticed, but the answer to the above has come out:

;; ANSWER SECTION:
get.immuni.gov.it.	299	IN	CNAME	immuni.gov.it.edgekey.net.
immuni.gov.it.edgekey.net. 21599 IN	CNAME	e33050.e3.akamaiedge.net.
e33050.e3.akamaiedge.net. 19	IN	A	2.17.197.83
e33050.e3.akamaiedge.net. 19	IN	A	2.17.197.98
;; ANSWER SECTION:
upload.immuni.gov.it.	3599	IN	A	217.175.50.249
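
For readers unfamiliar with this output, the following sketch parses the answer sections quoted above and follows the CNAME chain for each name. The parsing helpers are hypothetical and only handle the simple record layout shown here; the sample text is copied from this comment, not re-queried.

```python
DIG_OUTPUT = """
;; ANSWER SECTION:
get.immuni.gov.it. 299 IN CNAME immuni.gov.it.edgekey.net.
immuni.gov.it.edgekey.net. 21599 IN CNAME e33050.e3.akamaiedge.net.
e33050.e3.akamaiedge.net. 19 IN A 2.17.197.83
e33050.e3.akamaiedge.net. 19 IN A 2.17.197.98
;; ANSWER SECTION:
upload.immuni.gov.it. 3599 IN A 217.175.50.249
"""


def parse_answers(text):
    """Return (name, type, value) tuples, skipping comments and blank lines."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue
        fields = line.split()
        if len(fields) < 5:
            continue
        name, rtype, value = fields[0], fields[3], fields[4]
        records.append((name.rstrip("."), rtype, value.rstrip(".")))
    return records


def cname_chain(records, name):
    """Follow CNAME records starting at `name`, guarding against loops."""
    cnames = {n: v for n, t, v in records if t == "CNAME"}
    chain = [name]
    while chain[-1] in cnames and cnames[chain[-1]] not in chain:
        chain.append(cnames[chain[-1]])
    return chain


if __name__ == "__main__":
    records = parse_answers(DIG_OUTPUT)
    for host in ("get.immuni.gov.it", "upload.immuni.gov.it"):
        print(" -> ".join(cname_chain(records, host)))
```

In the quoted records, the download host is an alias ending in an `akamaiedge.net` name, while the upload host has a direct A record with no alias.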

gbonfiglio avatar May 31 '20 19:05 gbonfiglio

Have you evaluated a graph database like Neo4j for managing relationships between contacts? Thank you.

iorfix avatar Jun 04 '20 09:06 iorfix

@gbartolini I see your perspective/bias about Postgres, but first of all I wonder why a NoSQL db was chosen.

paolodina avatar Jun 24 '20 22:06 paolodina