api-umbrella
Data integrity issue under high load
Tested on "latest" image - nrel/api-umbrella@sha256:094c6dc5e96eff5745ba432437c16579b6cc4b7e6a044de2ad902126e8d21fe1
When the system was overloaded, with the load average exceeding the number of processors for a prolonged period, we started noticing misspelled data in our system. The load average was 10+ on 4 CPUs for periods of 5+ minutes. The errors appear only when the system is overloaded.
In the load test scenario we send an average of 400-600 messages/s to Orion Context-Broker (https://fiware-orion.readthedocs.io/en/2.4.2/) through Umbrella. The payloads sent by the load test were recorded to make sure the error was not in the sent data.
The sent payload has 2 attributes, "precipitation" and "relativehumidity", but in the databases we find the following attributes:
pecipitation LONG
pprecipitation LONG
prcipitation LONG
preccipitation LONG
preciitation LONG
precipiation LONG
precipiitation LONG
precipipitation LONG
precipitaation LONG
precipitaion LONG
precipitatiion LONG
precipitatin LONG
precipitatio LONG
precipitation FLOAT
precipitationn LONG
precipitatioon LONG
precipitaton LONG
precipitattion LONG
precipition LONG
precipititation LONG
precipittation LONG
preciptation LONG
precitation LONG
precpitation LONG
prprecipitation LONG
prrecipitation LONG
recipitation LONG
relativehumiditty LONG
relativehumidity FLOAT
relativemidity LONG
rellativehumidity LONG
rlativehumidity LONG
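To make the pattern in the list above easier to see, here is a minimal sketch that compares each stored attribute name against the two names we actually sent and reports how far off it is. The helper names (`edit_distance`, `classify`) and the sample list are illustrative only; this is not part of our load test tooling.

```python
# Illustrative sketch: helper names and the sample list are assumptions,
# only the two expected attribute names come from the report itself.
EXPECTED = ("precipitation", "relativehumidity")


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete a character from a
                cur[j - 1] + 1,            # insert a character into a
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def classify(observed):
    """Map each unexpected name to (closest expected name, distance)."""
    report = {}
    for name in observed:
        if name in EXPECTED:
            continue  # correctly spelled, nothing to report
        closest = min(EXPECTED, key=lambda e: edit_distance(name, e))
        report[name] = (closest, edit_distance(name, closest))
    return report


if __name__ == "__main__":
    sample = ["precipitation", "pecipitation", "prprecipitation",
              "relativemidity", "relativehumidity"]
    for name, (closest, dist) in classify(sample).items():
        print(f"{name}: closest to {closest!r}, edit distance {dist}")
```

Every name in the list above differs from a sent name only by a few dropped or repeated characters, which looks more like bytes being lost or duplicated in transit than like bad input data.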
We tested the same system with Nginx as the reverse proxy, with an increased load sent to the system, but we were not able to reproduce this error with Nginx.
EDIT: When these errors appeared on the system, we also recorded some 502 errors in response to the sent messages.
Testing was done in a Kubernetes environment on Debian. The error is present with and without SSL termination. No system or component crashed during testing.
We expected massive data loss, but we also experienced a loss of data integrity.
@profijoeln is there any update on this? Do you have logs or a sample payload?