RTX icon indicating copy to clipboard operation
RTX copied to clipboard

lots of RTX-KG2 test triples that are rejected by the SRI testing system

Open saramsey opened this issue 3 years ago • 24 comments

The SRI team is requesting that the file https://raw.githubusercontent.com/RTXteam/RTX/production/code/ARAX/KnowledgeSources/RTX_KG2c_test_triples.json

not contain any biolink-to-biolink edges. I think we will probably want to include a filter when we generate this file, or post-filter it.

saramsey avatar Sep 15 '22 15:09 saramsey

OK, it appears that the remedy is to add a bit of logic to the module

create_csv_of_kp_predicate_triples.py

saramsey avatar Sep 15 '22 15:09 saramsey

I am testing create_csv_of_kp_predicate_triples.py, to see how long it takes to run.

saramsey avatar Sep 15 '22 15:09 saramsey

I am now testing the script create_csv_of_kp_predicate_triples.py on kg2-7-6c.rtx.ai

saramsey avatar Sep 15 '22 15:09 saramsey

Hi @amykglen do you know what the typical running time of this script is?

saramsey avatar Sep 15 '22 16:09 saramsey

no, but I think @acevedol does - she's been taking care of running that for new KG2 versions.

amykglen avatar Sep 15 '22 16:09 amykglen

also note that in line 116 of that script, we need to change v1.2 --> v1.3. I think we forgot to do this after we made TRAPI 1.3 production. https://github.com/RTXteam/RTX/blob/b2aea971a4834b43fe9fd6315bea4f23aecceb46/code/ARAX/KnowledgeSources/create_csv_of_kp_predicate_triples.py#L116

amykglen avatar Sep 15 '22 16:09 amykglen

OK will do.

saramsey avatar Sep 15 '22 16:09 saramsey

also note that in line 116 of that script, we need to change v1.2 --> v1.3. I think we forgot to do this after we made TRAPI 1.3 production.

https://github.com/RTXteam/RTX/blob/b2aea971a4834b43fe9fd6315bea4f23aecceb46/code/ARAX/KnowledgeSources/create_csv_of_kp_predicate_triples.py#L116

OK, I've coded this change on a local repo on kg2-7-6c.rtx.ai. Testing it now.

saramsey avatar Sep 15 '22 23:09 saramsey

Hi @amykglen should this line of code also be converted to v1.3?

https://github.com/RTXteam/RTX/blob/bec2a38bf24b4b35c27ab03ac0a05fcc604ab1ed/code/ARAX/KnowledgeSources/create_csv_of_kp_predicate_triples.py#L85

saramsey avatar Sep 16 '22 15:09 saramsey

that's probably a good idea! I think the main() function doesn't currently run the function that is in, but we may as well fix it.

amykglen avatar Sep 16 '22 15:09 amykglen

So, Richard B. has also advised (via Slack) that there is a second problem, namely, tuples in KG2c_test_triples.json that have either a subject category or object category that is too abstract, like biolink:NamedThing, or a mixin (which is now allowed). He wrote:

Screen Shot 2022-09-16 at 10 31 06 AM

saramsey avatar Sep 16 '22 17:09 saramsey

Hmm, the test triples file still contains nodes whose categories are biolink:PhysicalEssenceOrOccurrent which is evidently now a mixin. Need to filter out....

saramsey avatar Sep 21 '22 22:09 saramsey

OK, in the module create_csv_of_kp_predicate_triples.py, in the function get_kg2c_predicate_triple_examples, there is a rather complex Cypher query,

match (n)-[r]->(m) 
with distinct labels(n) as n1s, type(r) as rel, labels(m) as n2s 
unwind n1s as n1 unwind n2s as n2 
with distinct n1 as node1, rel as relationship, n2 as node2 
where node1 <> "Base" and node2 <> "Base" 
CALL apoc.cypher.run('match (n2:`' + node1 + '`)-[r2:`' + relationship + '`]->(m2:`' + node2 + '`) 
return n2.id as subject, m2.id as object limit 1', {}) 
YIELD value 
return node1, relationship, node2, value.subject as subject, value.object as object

which I think can (and should) be revised to this simpler query:

match (n)-[r]->(m) 
where not (n.id starts with 'biolink:') and not (m.id starts with 'biolink:') 
with distinct n.category as node1, r.predicate as relationship, m.category as node2, 
        min(n.id + '---' + m.id) as mk 
with *, split(mk, '---') as s 
return node1, relationship, node2, head(s) as subject, head(tail(s)) as object

which also contains a WHERE clause to omit triples for which the subject or object ID starts with the biolink: prefix (since such triples would be rejected by the SRI test harness).

saramsey avatar Oct 20 '22 20:10 saramsey

Information about how the URL (for the test triples JSON document ) is provided to the SRI testing harness, can be found in this issue: https://github.com/RTXteam/RTX/issues/1876

basically it is via the ARAX (or RTX-KG2 KP) SmartAPI registration.

saramsey avatar Nov 30 '22 21:11 saramsey

Now, when we run

pytest -s -vvv test_onehops.py --kp_id=rtx-kg2 --one \
           --x_maturity="development" --ara_id=None --teststyle=by_subject

we see this error:

TypeError: TRAPIResponseValidator.check_compliance_of_trapi_response() got an unexpected keyword argument 'message'

../../translator/trapi/__init__.py:275: TypeError
--------------------------------------------------------------- Captured log setup ----------------------------------------------------------------
DEBUG    asyncio:selector_events.py:54 Using selector: KqueueSelector
DEBUG    asyncio:selector_events.py:54 Using selector: KqueueSelector
---------------------------------------------------------------- Captured log call ----------------------------------------------------------------
DEBUG    urllib3.connectionpool:connectionpool.py:1003 Starting new HTTPS connection (1): raw.githubusercontent.com:443
DEBUG    urllib3.connectionpool:connectionpool.py:456 https://raw.githubusercontent.com:443 "GET /NCATSTranslator/ReasonerAPI/v1.3.0/TranslatorReasonerAPI.yaml HTTP/1.1" 200 10181
DEBUG    urllib3.connectionpool:connectionpool.py:1003 Starting new HTTPS connection (1): arax.ncats.io:443
DEBUG    urllib3.connectionpool:connectionpool.py:456 https://arax.ncats.io:443 "POST /beta/api/rtxkg2/v1.3/query HTTP/1.1" 200 23838
-------------------------------------------------------------- Captured log teardown --------------------------------------------------------------
DEBUG    asyncio:selector_events.py:54 Using selector: KqueueSelector
============================================================= short test summary info =============================================================
FAILED test_onehops.py::test_trapi_kps[rtx-kg2#0-by_subject] - TypeError: TRAPIResponseValidator.check_compliance_of_trapi_response() got an unexpected keyword argument 'message'

saramsey avatar Jan 18 '23 19:01 saramsey

As noted in the call, there was a time that some versions of the TRAPIResponseValidator required a message object as input (despite the name of the program/class). Richard has since fixed this in the very latest versions of the validator.

Passing a Response object (containing a message) when it was expecting a message object might lead to this behavior.

edeutsch avatar Jan 18 '23 19:01 edeutsch

OK, this was apparently due to a bug in SRI_testing/translator/trapi/__init__.py. I've sent a proposed bugfix to Richard and Lili.

saramsey avatar Jan 18 '23 19:01 saramsey

@edeutsch thank you, you diagnosed the issue exactly

saramsey avatar Jan 18 '23 20:01 saramsey

For the latest SRI_testing harness to run, we need to

  • make a local one-line modification of the source file SRI_testing/translator/sri/testing/processor.py at L50, namely, so that it looks like: _test_report_database: TestReportDatabase = get_test_report_database(True)
  • Change init.py in SRI_Testing/translator/trapi L279 to validator.check_compliance_of_trapi_response(response=trapi_response['response_json'])

acevedol avatar Jan 18 '23 20:01 acevedol

In order to only test the kg2c test triples file in kg2integration, we use pytest -s -vvv tests/onehop/test_onehops.py --kp_id=rtx-kg2 --x_maturity="development" --ara_id=None

acevedol avatar Jan 18 '23 20:01 acevedol

Failure on raise_subject_entity:

======================================== FAILURES ========================================
_____________________ test_trapi_kps[rtx-kg2#0-raise_subject_entity] _____________________

kp_trapi_case = {'biolink_version': '2.2.11', 'idx': 0, 'kp_id': 'infores:rtx-kg2', 'kp_source': 'infores:rtx-kg2', ...}
trapi_creator = <function raise_subject_entity at 0x111bfe0d0>
results_bag = ResultsBag:
{'location': 'https://raw.githubusercontent.com/RTXteam/RTX/kg2integration/code/ARAX/KnowledgeSources/RTX_...es:rtx-kg2', 'kp_source_type': 'primary'}, 'unit_test_report': <translator.trapi.UnitTestReport object at 0x122c11550>}

    @pytest.mark.asyncio
    async def test_trapi_kps(kp_trapi_case, trapi_creator, results_bag):
        """Generic Test for TRAPI KPs. The kp_trapi_case fixture is created in conftest.py by looking at KP test triples
        These get successively fed into test_trapi_kps.  This function is further parameterized by trapi_creator, which
        knows how to take an input edge and create some kind of TRAPI query from it.  For instance, by_subject removes
        the object, while raise_object_by_subject removes the object and replaces the object category with its
        biolink parent. This approach will need modification if there turn out to be particular elements we want
        to test for different creators.
        """
        results_bag.location = kp_trapi_case['ks_test_data_location']
        results_bag.case = kp_trapi_case
        results_bag.unit_test_report = UnitTestReport(
            test_case=kp_trapi_case,
            test_name=trapi_creator.__name__,
            trapi_version=kp_trapi_case['trapi_version'],
            biolink_version=kp_trapi_case['biolink_version'],
            sources={
                    "kp_source": kp_trapi_case['kp_source'],
                    "kp_source_type": kp_trapi_case['kp_source_type']
            }
        )

        if in_excluded_tests(test=trapi_creator, test_case=kp_trapi_case):
            _report_and_skip_edge(
                "KP",
                test=trapi_creator,
                test_case=kp_trapi_case,
                test_report=results_bag.unit_test_report,
                excluded_test=True
            )
        elif UnitTestReport.has_validation_errors("pre-validation", kp_trapi_case):
            _report_and_skip_edge(
                "KP",
                test=trapi_creator,
                test_case=kp_trapi_case,
                test_report=results_bag.unit_test_report
            )
        else:
>           await execute_trapi_lookup(
                case=kp_trapi_case,
                creator=trapi_creator,
                rbag=results_bag,
                test_report=results_bag.unit_test_report
            )

tests/onehop/test_onehops.py:99:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
translator/trapi/__init__.py:227: in execute_trapi_lookup
    trapi_request, output_element, output_node_binding = creator(case)
tests/onehop/util.py:132: in wrapper
    result = fn(*args, **kwargs)
tests/onehop/util.py:245: in raise_subject_entity
    parent_subject = ontology_kp.get_parent(subject, subject_cat, biolink_version=request['biolink_version'])
translator/sri/testing/util/ontology_kp.py:135: in get_parent
    query_entity = convert_to_preferred(curie, preferred_prefixes)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

curie = 'FMA:72647', allowed_list = ['NLMID']

    def convert_to_preferred(curie, allowed_list):
        """
        :param curie
        :param allowed_list
        """
        j = {'curies': [curie]}
        result = post(NODE_NORMALIZER_SERVER, j)
        if not result:
            return None
>       new_ids = [v['identifier'] for v in result[curie]['equivalent_identifiers']]
E       TypeError: 'NoneType' object is not subscriptable

translator/sri/testing/util/ontology_kp.py:36: TypeError
----------------------------------- Captured log setup -----------------------------------
DEBUG    asyncio:selector_events.py:54 Using selector: KqueueSelector
----------------------------------- Captured log call ------------------------------------
DEBUG    urllib3.connectionpool:connectionpool.py:1003 Starting new HTTPS connection (1): nodenormalization-sri.renci.org:443
DEBUG    urllib3.connectionpool:connectionpool.py:456 https://nodenormalization-sri.renci.org:443 "POST /get_normalized_nodes HTTP/1.1" 200 18
--------------------------------- Captured log teardown ----------------------------------
DEBUG    asyncio:selector_events.py:54 Using selector: KqueueSelector
================================ short test summary info =================================
FAILED tests/onehop/test_onehops.py::test_trapi_kps[rtx-kg2#0-raise_subject_entity] - TypeError: 'NoneType' object is not subscriptable

acevedol avatar Jan 18 '23 20:01 acevedol

When testing one edge, all the tests pass except raise_subject_entity above. It has an issue with the curie FMA:72647 not being allowed and (I think) can't convert it to a preferred curie for some reason.

acevedol avatar Jan 19 '23 17:01 acevedol

one thing to note is that many raise_subject_entity tests might not pass currently because Plover still only does concept subclass reasoning for diseases and drugs, and not other node types. I'm going to be working on expanding that over the next week or two

amykglen avatar Jan 20 '23 17:01 amykglen

Can we close out this issue?

saramsey avatar Nov 30 '23 19:11 saramsey