Data integrity issue under high load

Open profijoeln opened this issue 4 years ago • 1 comment

Tested on "latest" image - nrel/api-umbrella@sha256:094c6dc5e96eff5745ba432437c16579b6cc4b7e6a044de2ad902126e8d21fe1

When the system was overloaded, with a load average exceeding the number of processors for an extended period, we started noticing misspelled attribute names in the data stored by our system. The load average was 10+ on 4 CPUs for periods of 5+ minutes. The errors appear only when the system is overloaded.
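For anyone trying to reproduce this, the overload condition can be checked with a few lines of Python. This is a minimal sketch, assuming a Unix host; `is_overloaded` and `current_overload` are illustrative helper names, not part of API Umbrella.

```python
import os


def is_overloaded(load_avg, cpu_count):
    """Return True when the load average exceeds the number of CPUs."""
    return load_avg > cpu_count


def current_overload():
    """Check the 5-minute load average of this host (Unix only)."""
    _one_min, five_min, _fifteen_min = os.getloadavg()
    cpus = os.cpu_count() or 1  # cpu_count() may return None
    return is_overloaded(five_min, cpus), five_min / cpus
```

With the figures from this report, `is_overloaded(10.0, 4)` is true: the load is 2.5 times the CPU count.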

In the load test scenario we send an average of 400-600 messages/s to Orion Context-Broker (https://fiware-orion.readthedocs.io/en/2.4.2/) through Umbrella. The payloads sent by the load test were recorded to make sure the error was not in the sent data.

The sent payload has 2 attributes, "precipitation" and "relativehumidity", but in the databases we find the following attributes:

pecipitation		LONG
pprecipitation		LONG
prcipitation		LONG
preccipitation		LONG
preciitation		LONG
precipiation		LONG
precipiitation		LONG
precipipitation		LONG
precipitaation		LONG
precipitaion		LONG
precipitatiion		LONG
precipitatin		LONG
precipitatio		LONG
precipitation		FLOAT
precipitationn		LONG
precipitatioon		LONG
precipitaton		LONG
precipitattion		LONG
precipition		LONG
precipititation		LONG
precipittation		LONG
preciptation		LONG
precitation		LONG
precpitation		LONG
prprecipitation		LONG
prrecipitation		LONG
recipitation		LONG
relativehumiditty	LONG
relativehumidity	FLOAT
relativemidity		LONG
rellativehumidity	LONG
rlativehumidity		LONG
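All of the unexpected names above look like short insertions or deletions relative to the two expected names. Below is a minimal sketch of how stored attribute names could be checked against the expected set, using only the Python standard library. `EXPECTED`, `classify_attributes` and the 0.8 similarity cutoff are illustrative assumptions, not part of API Umbrella or Orion.

```python
import difflib

# The two attribute names present in the sent payloads.
EXPECTED = ("precipitation", "relativehumidity")


def classify_attributes(observed, expected=EXPECTED, cutoff=0.8):
    """Map each unexpected attribute name to its closest expected name.

    Names that are not similar to any expected name map to None.
    """
    corrupted = {}
    for name in observed:
        if name in expected:
            continue  # intact attribute name
        match = difflib.get_close_matches(name, expected, n=1, cutoff=cutoff)
        corrupted[name] = match[0] if match else None
    return corrupted


if __name__ == "__main__":
    sample = ["precipitation", "pecipitation", "relativemidity"]
    for bad, original in classify_attributes(sample).items():
        print(f"{bad} looks like a corrupted {original}")
```

Running a check like this against the attribute list of each database would show how many distinct corrupted names exist per expected attribute.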

We tested the same system with Nginx as the reverse proxy and with an increased load, but we were not able to reproduce this error with Nginx.

EDIT: When these errors appeared on the system, we also recorded some 502 errors in response to the sent messages.

Testing was done in a Kubernetes environment on Debian. The error is present with and without SSL termination. No system or component crashed during testing.

We expected massive data loss, but we also experienced a loss of data integrity.

profijoeln, Jan 11 '21 10:01

@profijoeln is there any update on this? Do you have logs or a sample payload?

ccsr, Aug 11 '21 07:08