SOS icon indicating copy to clipboard operation
SOS copied to clipboard

InsertResult / GetObservation reponse time

Open EnocMartinez opened this issue 4 years ago • 2 comments

Hi,

I'm converting data from a set of CSV files to O&M and Inserting them with InsertResult (using ComplexObservation). I've found that it takes a huge amount of time to insert them to the database. For instance adding 45 minutes of data (about 2700 observations with timestamp, lat, long and 6 other values) takes approximately 2 minutes using a SOS deployed in my desktop computer with an SSD.

GetObservation response time for the same amount of data is also slow, about 12 seconds to retrieve 2700 observations (also using an SSD). If I query directly the database it takes only 8 milliseconds.

If I try to add a day worth of data it will take more than one hour to inject it and about 6.5 minutes to retrieve it. If I need to add / query a month of data it's simply impossible. Is there any way to speed-up injection/retrieval processes? Is there some bug causing slow responses for large time periods?

I've also tried using 6 separated OM_Measurement instead of a ComplexObservation and the results are more or less the same. I'm using version SOS version 4.4.14, postgresql 10.

Attached you can find the files to replicate this issue (InsertSensor, InsertResultTemplate, InsertResult and getObservation) . transactions.zip

EnocMartinez avatar May 25 '20 10:05 EnocMartinez

Hi,

in the SOS 4.4.x several extensions resulted in a SQL query being sent to the db for each inserted observation to prevent duplicate data. But this "check" can be disabled via the setting "Should this SOS check for duplicated observations in the REQUEST?" located in the "service" tab. If you can guarantee that no duplicate observation would be insertedm you can disable the flag. In the new SOS 5.x database model this check is done again by the database constraints.

Querying directly the database is one part of a GetObservation request,therefore this cannot be compared. In the SOS we use the Hibernate framwork to support several database managment systems and the queries (with required joins) are generated under the hood. This leads to other queries that can be a bit more complex as if I query the observation table directly. Furthermore the data from the database must be processed and formatted to the response format (e.g. XML). Depending on the used client/browser to send the GetObservation, the response would be "pretty printed" which also sometimes takes a while.

The injection process can be speed by disabling the flag I mentioned before. For the retrieval of large amounts of data we recommend to use the GetResult operation because this reduces the XML processing of the response and the responses are much smaller.

CarstenHollmann avatar May 27 '20 12:05 CarstenHollmann

Hi Carsten, I've done some more tests with and without the "Should this SOS check for duplicated observations in the REQUEST?" flag. I've tested the versions 4.4.5, 4.4.14 and 5.1.0. All tests are in the same desktop pc with an ssd, transactions are in localhost and each webapp in tomcat has up to 2 GB of RAM (when using a VM it's a lot slower).

SOS Version: 4.4.5 Check Duplicated: ON InsertResult: 44.32 s GetObservation: 2.54 s GetResult: 1.198 s

SOS Version: 4.4.5 Check Duplicated: OFF InsertResult: 42.87 s GetObservation: 3.47 s GetResult: 1.271 s

SOS Version: 4.4.14 Check Duplicated: ON InsertResult: 1m 6.3 s GetObservation: 3.47 s GetResult: 1.2 s

SOS Version: 4.4.14 Check Duplicated: OFF InsertResult: 41.65 s GetObservation: 3.35 s GetResult: 1.208 s

SOS Version: 5.1.0 Check Duplicated: n/a InsertResult: 9m 0.72 s GetObservation: 11.16s GetResult: 6.16 s

For us this is a blocking issue since we are working with datasets that can be several months long (the test datasets is only 25 minutes long). The SOS webapp takes days to inject them to the SOS database and hours to retrieve each dataset, even if we split it into smaller requests.

Is there any other tweak to speed it up? Is it possible to bypass the hibernate framework? Maybe some postgresql settings may be adjusted to squeeze more performance from the database?

Thanks

EnocMartinez avatar Jun 09 '20 14:06 EnocMartinez