dsp-api

GitHub CI problems

Open • benjamingeer opened this issue on Nov 28, 2019 • 19 comments

Build failed on develop:

[ERROR] HttpTriplestoreConnector(akka://org-knora-webapi-e2e-v2-OntologyV2R2RSpec) -
Triplestore responded with HTTP code 500: Query evaluation error:
com.ontotext.trree.util.NotEnoughMemoryForDistinctGroupBy:
Insufficient free Heap Memory 201Mb for group by and distinct, threshold:250Mb, reached 0Mb,SPARQL query was:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX knora-base: <http://www.knora.org/ontology/knora-base#>

CONSTRUCT {
    ?class ?classPred ?classObj ;
        rdfs:subClassOf ?bnode .

    ?bnode rdf:type owl:Restriction ;
        ?cardinalityPred ?cardinalityValue .
}
FROM <http://www.ontotext.com/explicit>
WHERE {
    BIND(IRI("http://www.knora.org/ontology/0001/anything#Nothing") AS ?class)
    
    ?class rdf:type owl:Class .
    
    {
        ?class ?classPred ?classObj .
    } UNION {
        ?class rdfs:subClassOf ?bnode .
        ?bnode rdf:type owl:Restriction ;
            ?cardinalityPred ?cardinalityValue .
    }
}

The error message seems to be incorrect, because there is no GROUP BY or DISTINCT in the query.

benjamingeer, Nov 28 '19

I also get socket timeouts like this for no apparent reason:

https://github.com/dasch-swiss/knora-api/pull/1403/checks?check_run_id=342255108#step:6:2154

benjamingeer, Dec 11 '19

I also get the GROUP BY and DISTINCT error when running the tests locally. It always happens in SearchRouteV2R2RSpec, and after a few reruns it suddenly works.

I'm running GraphDB with the same memory restriction, a 5 GB heap.

Yes, running GraphDB with more memory will "solve" the problem for the tests. But more importantly, will it actually work as expected in production? There we will hopefully have more than one user running these kinds of queries at the same time, so how much memory does Gravsearch require to work under those circumstances?

It would be great if you could investigate a bit.

subotic, Dec 16 '19

I'll try to reproduce it locally.

benjamingeer, Dec 16 '19

More errors on GitHub CI that don't seem to have anything to do with memory:

[Screenshot 2019-12-17 at 09 45 34]

[Screenshot 2019-12-17 at 09 46 56]

https://github.com/dasch-swiss/knora-api/runs/352081009

benjamingeer, Dec 17 '19

> how much memory does Gravsearch require to work under those circumstances?

The queries that run out of memory on GitHub CI aren't only Gravsearch queries. These errors happen with all sorts of queries, including queries that read just one resource (/v2/resources) and queries that read one resource class (/v2/ontologies/classes), as in the first example in this issue.

benjamingeer, Dec 17 '19

Repeated socket timeouts when trying to connect to GraphDB, e.g. for this query to get default permissions:

prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix knora-admin: <http://www.knora.org/ontology/knora-admin#>

SELECT ?s ?p ?o
FROM <http://www.ontotext.com/explicit>
WHERE {
    ?s rdf:type knora-admin:DefaultObjectAccessPermission .
    ?s knora-admin:forProject <http://rdfh.ch/projects/0001> .
    ?s knora-admin:forResourceClass <http://www.knora.org/ontology/0001/anything#Thing> .
    ?s knora-admin:forProperty <http://www.knora.org/ontology/0001/anything#hasGeometry> .
    ?s ?p ?o .
}

https://github.com/dasch-swiss/knora-api/runs/370753093#step:6:1948

benjamingeer, Jan 02 '20

It's almost as if GitHub CI just randomly refuses to allocate any CPU cycles to GraphDB.

benjamingeer, Jan 02 '20

Here are some test results on my 2012 iMac, running the following:

  • webapi / reStart in sbt (not using Docker)
  • GraphDB 9.0.0 with -Xmx5G, started with the bin/graphdb script (not using Docker)
  • Homebrew's redis-server (not using Docker)
  • Sipi started with docker-compose up sipi

Getting a single resource with standoff markup, using wrk, with 8 threads and 64 concurrent connections:

% wrk -t8 -c64 -d30s --timeout=15s http://0.0.0.0:3333/v2/resources/http%3A%2F%2Frdfh.ch%2F0001%2FqN1igiDRSAemBBktbRHn6g
Running 30s test @ http://0.0.0.0:3333/v2/resources/http%3A%2F%2Frdfh.ch%2F0001%2FqN1igiDRSAemBBktbRHn6g
  8 threads and 64 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   641.11ms   60.70ms 828.98ms   87.40%
    Req/Sec    15.87     11.38    60.00     60.99%
  2969 requests in 30.09s, 7.77MB read
Requests/sec:     98.66
Transfer/sec:    264.47KB
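As a rough consistency check on these figures, Little's law (mean number of requests in flight = throughput × mean latency) says that 64 connections, each waiting on average 641 ms per request, should produce a throughput of about

$$\lambda \approx \frac{N}{W} = \frac{64}{0.641\ \mathrm{s}} \approx 99.8\ \text{requests/s},$$

which is close to the reported 98.66 requests/s (2969 requests in 30.09 s).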

A similar test, doing a Gravsearch query:

% wrk -t8 -c64 -d30s --timeout=15s 'http://0.0.0.0:3333/v2/searchextended/PREFIX%20knora-api%3A%20%3Chttp%3A%2F%2Fapi.knora.org%2Fontology%2Fknora-api%2Fv2%23%3E%0APREFIX%20standoff%3A%20%3Chttp%3A%2F%2Fapi.knora.org%2Fontology%2Fstandoff%2Fv2%23%3E%0APREFIX%20anything%3A%20%3Chttp%3A%2F%2F0.0.0.0%3A3333%2Fontology%2F0001%2Fanything%2Fv2%23%3E%0A%0ACONSTRUCT%20%7B%0A%20%20%20%20%3Fthing%20knora-api%3AisMainResource%20true%20.%0A%20%20%20%20%3Fthing%20anything%3AhasRichtext%20%3Ftext%20.%0A%7D%20WHERE%20%7B%0A%20%20%20%20%3Fthing%20a%20anything%3AThing%20.%0A%20%20%20%20%3Fthing%20anything%3AhasRichtext%20%3Ftext%20.%0A%20%20%20%20%3Ftext%20knora-api%3AvalueAsString%20%3FtextStr%20.%0A%20%20%20%20%3Ftext%20knora-api%3AtextValueHasStandoff%20%3FstandoffTag%20.%0A%20%20%20%20%3FstandoffTag%20a%20standoff%3AStandoffItalicTag%20.%0A%20%20%20%20FILTER%20knora-api%3AmatchInStandoff(%3FtextStr%2C%20%3FstandoffTag%2C%20%22interesting%20text%22)%0A%7D'
Running 30s test @ http://0.0.0.0:3333/v2/searchextended/PREFIX%20knora-api%3A%20%3Chttp%3A%2F%2Fapi.knora.org%2Fontology%2Fknora-api%2Fv2%23%3E%0APREFIX%20standoff%3A%20%3Chttp%3A%2F%2Fapi.knora.org%2Fontology%2Fstandoff%2Fv2%23%3E%0APREFIX%20anything%3A%20%3Chttp%3A%2F%2F0.0.0.0%3A3333%2Fontology%2F0001%2Fanything%2Fv2%23%3E%0A%0ACONSTRUCT%20%7B%0A%20%20%20%20%3Fthing%20knora-api%3AisMainResource%20true%20.%0A%20%20%20%20%3Fthing%20anything%3AhasRichtext%20%3Ftext%20.%0A%7D%20WHERE%20%7B%0A%20%20%20%20%3Fthing%20a%20anything%3AThing%20.%0A%20%20%20%20%3Fthing%20anything%3AhasRichtext%20%3Ftext%20.%0A%20%20%20%20%3Ftext%20knora-api%3AvalueAsString%20%3FtextStr%20.%0A%20%20%20%20%3Ftext%20knora-api%3AtextValueHasStandoff%20%3FstandoffTag%20.%0A%20%20%20%20%3FstandoffTag%20a%20standoff%3AStandoffItalicTag%20.%0A%20%20%20%20FILTER%20knora-api%3AmatchInStandoff(%3FtextStr%2C%20%3FstandoffTag%2C%20%22interesting%20text%22)%0A%7D
  8 threads and 64 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   865.06ms  809.03ms   3.10s    83.76%
    Req/Sec    20.43     12.00    60.00     79.45%
  2605 requests in 30.10s, 6.70MB read
Requests/sec:     86.56
Transfer/sec:    227.98KB
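The long URL in the command above is just the Gravsearch query percent-encoded into the path. A small bash helper like the following (a sketch, not part of Knora's tooling) could build that URL from a readable query instead of encoding it by hand. It leaves only RFC 3986 unreserved characters as-is, so it also escapes a few characters (such as parentheses) that the URL above leaves unencoded; both forms decode to the same query. It assumes ASCII input.

```bash
#!/usr/bin/env bash
# Percent-encode a string for use as a single URL path segment or query value.
# Only RFC 3986 unreserved characters (A-Z a-z 0-9 - . _ ~) are left as-is.
# Assumes ASCII input; bytes above 127 may not encode correctly in every bash version.
urlencode() {
  local LC_ALL=C            # walk the string byte by byte
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [A-Za-z0-9.~_-]) out+="$c" ;;
      # printf "'X" yields the numeric code of X; %02X prints it as two hex digits
      *) out+=$(printf '%%%02X' "'$c") ;;
    esac
  done
  printf '%s\n' "$out"
}
```

With the query saved in a file (gravsearch.rq is a hypothetical name), the benchmark could then be run as `wrk -t8 -c64 -d30s --timeout=15s "http://0.0.0.0:3333/v2/searchextended/$(urlencode "$(cat gravsearch.rq)")"`.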

If I don't get any errors with 64 concurrent requests, why does GitHub CI get errors with one concurrent request?

benjamingeer, Jan 02 '20

@subotic What do you think might explain this error on GitHub CI?

[Screenshot 2020-02-26 at 11 48 58]

https://github.com/dasch-swiss/knora-api/pull/1602/checks?check_run_id=467290209

I don't think it could be GraphDB's fault. How could GraphDB prevent Knora from binding to port 3333?

benjamingeer, Feb 26 '20

And again:

[Screenshot 2020-02-26 at 18 50 59]

https://github.com/dasch-swiss/knora-api/pull/1602/checks?check_run_id=470308761

benjamingeer, Feb 26 '20

I think Akka isn't shutting down fast enough when resources are scarce, and the next test is already trying to start the server again.
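If that's the cause, the remedy amounts to not binding port 3333 again until the previous server has let go of it. The idea can be sketched in bash for a CI step or local script that restarts knora-api between runs; this is only an illustration, since within a single sbt run the equivalent wait would have to live in the test code. It assumes bash was built with `/dev/tcp` support, and it only detects a socket that still accepts connections, not one lingering in TIME_WAIT.

```bash
#!/usr/bin/env bash
# Succeeds if something on host:port accepts a TCP connection.
# Uses bash's /dev/tcp pseudo-device; the subshell closes the socket on exit.
port_in_use() {
  local host="$1" port="$2"
  (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null
}

# Polls once per second until nothing accepts connections on host:port,
# or gives up after timeout seconds (default 30).
wait_for_port_free() {
  local host="$1" port="$2" timeout="${3:-30}" waited=0
  while port_in_use "$host" "$port"; do
    if (( waited >= timeout )); then
      echo "port $port on $host is still in use after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

For example, `wait_for_port_free 127.0.0.1 3333 60` would run before starting knora-api again.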

subotic, Feb 27 '20

For some time now I've had the following design for running tests in mind: run the "E2E" and "Integration" tests against an already-running knora-stack. Before these tests run, the knora-stack would be started with something like make stack-up-ci (or each part of the stack by hand).

This would let us keep using these tests as we do now, but also point them at any knora-stack installation and check it thoroughly (which we need but currently lack). It would also solve the BindException problem and make the tests run a bit faster.

What do you think?
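A CI job following this design might start the stack, wait until it answers HTTP requests, and only then run the tests. A minimal bash sketch: `make stack-up-ci` is the target mentioned above, while the readiness URL, the `KNORA_READY_URL` variable and the sbt command are placeholders (assumptions), not existing Knora settings.

```bash
#!/usr/bin/env bash
# Polls an HTTP URL until curl reports success (2xx or unfollowed 3xx),
# or gives up after timeout seconds (default 120). curl -f fails on HTTP >= 400.
wait_for_http() {
  local url="$1" timeout="${2:-120}" waited=0
  until curl -fsS -o /dev/null --max-time 5 "$url" 2>/dev/null; do
    if (( waited >= timeout )); then
      echo "gave up waiting for $url after ${timeout}s" >&2
      return 1
    fi
    sleep 2
    waited=$((waited + 2))
  done
}

# Hypothetical CI step: start the stack, wait for it, then run the tests
# that currently start knora-api themselves.
run_e2e_against_stack() {
  make stack-up-ci || return 1
  # Placeholder readiness URL: substitute an endpoint the running stack really serves.
  wait_for_http "${KNORA_READY_URL:-http://0.0.0.0:3333/health}" || return 1
  # Placeholder test command.
  sbt "webapi/it:test"
}
```

Keeping the wait separate from the test command means the same tests could be pointed at any already-running installation by skipping the `make` step.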

subotic, Feb 27 '20

Yes, I think that's a good idea. We would just have to make sure to empty the triplestore and reload the test data before each test.

> also to point them to any kind of knora-stack installation, and have it thoroughly checked (which we need but are currently missing)

OK, but keep in mind that many tests change the contents of the repository.
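Emptying and reloading the triplestore could be done directly against GraphDB through the RDF4J REST protocol, which GraphDB implements: `DELETE /repositories/{id}/statements` removes all statements, and `POST /repositories/{id}/statements?context=<graph>` adds an RDF file to a named graph. A minimal bash sketch; the GraphDB URL, the repository name `knora-test`, and the file/graph pairs are assumptions for illustration, and `DRY_RUN=1` prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Sketch: reset a GraphDB repository between tests via the RDF4J REST protocol.
# GRAPHDB_URL, REPO and the file/graph pairs below are assumptions, not Knora's config.
GRAPHDB_URL="${GRAPHDB_URL:-http://localhost:7200}"
REPO="${REPO:-knora-test}"

# With DRY_RUN=1, print each command instead of running it.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# DELETE .../statements removes every statement in every graph of the repository.
clear_repository() {
  run curl -fsS -X DELETE "$GRAPHDB_URL/repositories/$REPO/statements"
}

# POST .../statements?context=<graph> adds the Turtle file to that named graph.
# The angle brackets are percent-encoded; this assumes the graph IRI itself
# contains no characters that need escaping in a query string (e.g. '#', '&').
load_turtle() {
  local file="$1" graph="$2"
  run curl -fsS -X POST -H 'Content-Type: text/turtle' \
    --data-binary "@$file" \
    "$GRAPHDB_URL/repositories/$REPO/statements?context=%3C$graph%3E"
}

# Hypothetical test data files and graph IRIs.
reset_triplestore() {
  clear_repository || return 1
  load_turtle knora-ontologies/knora-base.ttl http://www.knora.org/ontology/knora-base || return 1
  load_turtle test_data/all_data/anything-data.ttl http://www.knora.org/data/0001/anything
}
```

This only covers the triplestore side; if knora-api keeps ontology data cached in memory, it would presumably also need to reload it after such a reset.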

benjamingeer, Feb 27 '20

But it wouldn't solve the BindException problem in the unit tests. This time I got it in JsonLDUtilSpec.

benjamingeer, Feb 27 '20

Why not run the unit tests the same way?

benjamingeer, Feb 27 '20

Or would it be possible to run the unit tests the same way?

benjamingeer, Feb 27 '20

I guess, then they are not unit tests ;-)

Let's rephrase it, then: any tests that currently start knora-api should instead run against an externally started knora-stack.

subotic, Feb 27 '20

I've added #1611

subotic, Feb 27 '20

> I guess, then they are not unit tests ;-)

We have a lot of tests that test responders, and these need a triplestore, but they're not E2E tests. I think all these tests extend CoreSpec, which starts the actor system, the responder manager, etc.

I've just noticed that I can change JsonLDUtilSpec so it doesn't extend CoreSpec. Maybe there are some other unit tests that can be changed that way, too.
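To look for other candidates, one could start from the test classes that extend CoreSpec directly. A rough bash sketch; the default directory `webapi/src/test` is an assumption, and classes that inherit CoreSpec only indirectly (through another base spec) will not be found.

```bash
#!/usr/bin/env bash
# Lists Scala test files that declare a class extending CoreSpec directly.
# Indirect subclasses (extending another spec that extends CoreSpec) are missed.
list_core_specs() {
  local dir="${1:-webapi/src/test}"   # assumed test source root
  grep -rlE --include='*.scala' 'extends[[:space:]]+CoreSpec([^[:alnum:]_]|$)' "$dir" | sort
}
```

Each file in the list would still need a manual look to see whether it really uses the actor system or the triplestore.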

benjamingeer, Feb 27 '20