RTX Unusual pause in Expansion?

Marc Duby reported an issue where the traffic light table was showing some long KP query times:

You'll notice that all KPs including RTX-KG2 took at least 60 seconds.

BUT, the weird thing is here:

Although I'm not completely certain, I think the corresponding KG2 query is this:

The first odd thing is that it is only 4 seconds. but the second is that it is only 30 results instead of 14692.

It may be that I'm the one who's confused. https://arax.ncats.io/?r=30835

{
  "edges": {
    "e00": {
      "subject": "n00",
      "object": "n01"
    },
    "e01": {
      "subject": "n01",
      "object": "n02"
    }
  },
  "nodes": {
    "n00": {
      "ids": [
        "DOID:14330"
      ],
      "is_set": false,
      "categories": [
        "biolink:Disease"
      ],
      "name": "DOID:14330"
    },
    "n01": {
      "is_set": false,
      "categories": [
        "biolink:NamedThing"
      ]
    },
    "n02": {
      "ids": [
        "DOID:8778"
      ],
      "is_set": false,
      "categories": [
        "biolink:Disease"
      ],
      "name": "DOID:8778"
    }
  }
}

Hmm, so I just ran it again and the same thing happened with the above query.

ARAX didn't send the query for some 80 seconds. And then after sending it, it only took 5 seconds and returned 30 results. but the traffic light says:

ah, I suppose there's a distinction between results and edges, but still, that's a pretty big different.

Something weird is going on, but I'm not quite sure what.

Oct 26 '21 20:10 edeutsch

the number of results is usually tiny when ARAX queries KG2 because ARAX hacks the system by setting is_set=true on most qnodes since it doesn't care about the results (only the knowledge graph), and that makes processing much faster. so this is expected.

and as for the 76 seconds - I think in this case DTD's time was effectively added to KG2's (and the other KPs') times, because DTD doesn't contain any non-blocking code (so as soon as the system starts working on DTD, it won't go back to any other KP threads until DTD is done, which took 65 seconds). it's a bummer that this means the other KPs' times aren't accurate, but not sure of a way around that..

Oct 26 '21 20:10 amykglen

ah okay, that makes sense, thanks. And it might either do DTD before or after KG2? That would be consistent with something else: in the first example above, KG2 launched very quickly but still claimed to take a long time, while in the second case, it didn't launch for over 60 seconds.

great, thanks, I think I understand.

It seems like the way to solve this is to do the DTD work in a separate thread with something like this:

        dtd_thread = threading.Thread(target=self.do_dtd_stuff(), args=(query,mode,))
        # the extra thread wanders off into self.do_dtd_stuff() but then afterwards ends up back here and keeps going
        dtd_thread.start()

        # Need a way to let the main thread get in here to do more stuff, but prevent the extra thread from also wandering
        # in here some time later when it's done with self.do_dtd_stuff(). both threads share all memory.
	if you're not the dtd thread:
		do all the asyncio stuff

        dtd_thread.join()

The part that I always find tricky is the "if you're not the dtd thread" part. After the extra thread completes it continues at dtd_thread.start() and you have to make sure that it doesn't also do the "do all the asyncio stuff". I think I usually resort to some brittle state setting in self that lets the main thread enter the "if" quickly, but then prevents the extra thread from getting in there. The extra thread maybe needs to have an extra delay to make sure that the main thread safely gets into its "if" before the extra thread gets there. I'm not very good at this stuff. But if you can solve that little problem, everything else seems to work well and easily.

Oct 26 '21 21:10 edeutsch

@edeutsch @amykglen is this still relevant or good to close?

Feb 02 '22 06:02 finnagin

sufficiently stale that I think this is good to close.

Jun 26 '24 06:06 edeutsch