RTX icon indicating copy to clipboard operation
RTX copied to clipboard

Validator is working again and ARAX results are not valid

Open edeutsch opened this issue 2 years ago • 20 comments

The TRAPI validator is on-line again and showing all the TRAPI problems everyone has. We should address these as soon as we can so the validator is showing green for us. Example: https://arax.ci.transltr.io/?r=140793

"error.knowledge_graph.edge.attribute.type_id.unknown": { "biolink:original_edge_information": [ I think this means that we are adorning our edges with an attribute "biolink:original_edge_information" but this is not a biolink concept.

  1.   "biolink:support_graphs": [
    

Not much we can do about this. I will inquire about this

  1. "error.knowledge_graph.edge.predicate.unknown": { "biolink:decreases_activity_of": [ I don't think this is a valid predicate any more?

  2. "biolink:directly_interacts_with": [ { "edge_id": "NCBIGene:2353--biolink:directly_interacts_with->UniProtKB:P0DPQ6" } ],

  3.  "biolink:entity_negatively_regulates_entity": [
     {
       "edge_id": "FMA:84050--biolink:entity_negatively_regulates_entity->FMA:264827"
     },
     {
       "edge_id": "MESH:D014150--biolink:entity_negatively_regulates_entity->UMLS:C0597357"
     }
    

    ],

  4.  "biolink:entity_positively_regulates_entity": [
    
  5. "biolink:increases_activity_of": [

  6. "error.knowledge_graph.node.categories.not_concrete": { "FMA:82762": [ { "categories": "['biolink:BiologicalEntity']" I think we're not supposed to use abstract classes?

Can we make a plan to address these 8 items?

edeutsch avatar Jun 01 '23 15:06 edeutsch

I think 3 - 7 will be fixed with the XDTD for KG2.8.0.1c (the current XDTD is still running an old KG2 version).

For 1, we should try to pick out something better than biolink:original_edge_information. This attribute captures the KG2pre edge IDs that a given KG2c edge was created from (for debugging purposes)

I don't understand 8... what's the definition of an abstract class? BiologicalEntity is at depth 2 in the Biolink category tree...

amykglen avatar Jun 05 '23 12:06 amykglen

How about instead of biolink:original_edge_information, we change it to biolink:original_predicate (https://github.com/biolink/biolink-model/blob/ea800f98f41f6e42134011573a4ce60cd39a9151/biolink-model.yaml#L4989)

It is not quite the right concept, but maybe close enough? and will quiet the error?

edeutsch avatar Jun 05 '23 14:06 edeutsch

sounds good to me! I just changed that in master

amykglen avatar Jun 05 '23 15:06 amykglen

thanks!

@chunyuma can you confirm that you are on a path to solving items 3-7?

edeutsch avatar Jun 06 '23 02:06 edeutsch

Regarding 8: BiologicalEntity is a abstract: true https://github.com/biolink/biolink-model/blob/ea800f98f41f6e42134011573a4ce60cd39a9151/biolink-model.yaml#L6770

and I am thinking that this means that there should (must?) not be instances of it.

So I suppose the solution is to update KG2 to not use this category any more. Should we do that? Or ask for more clarification from Sierra et al? Or raise a stink?

edeutsch avatar Jun 06 '23 02:06 edeutsch

Steve will create an issue over in KG2 repo

edeutsch avatar Jun 14 '23 18:06 edeutsch

https://github.com/RTXteam/RTX-KG2/issues/286

saramsey avatar Jun 14 '23 18:06 saramsey

Yes, the new version of xDTD can solve items 3-7 if KG2c v2.8.0.1 has updated those predicates.

chunyuma avatar Jun 14 '23 18:06 chunyuma

great, thanks, let's leave this open until we confirm that Which Drugs May Treat Disease X from UI is showing ARAX results that pass all validation, hopefully as soon as we roll out the next xDTD

edeutsch avatar Jun 14 '23 19:06 edeutsch

Eric where in the code base is the TRAPI validator?

saramsey avatar Jul 10 '23 03:07 saramsey

Regarding 8: BiologicalEntity is a abstract: true https://github.com/biolink/biolink-model/blob/ea800f98f41f6e42134011573a4ce60cd39a9151/biolink-model.yaml#L6770

and I am thinking that this means that there should (must?) not be instances of it.

So I suppose the solution is to update KG2 to not use this category any more. Should we do that? Or ask for more clarification from Sierra et al? Or raise a stink?

Yes, we are in discussions with Sierra about whether it can be made abstract: false or whatever.

saramsey avatar Jul 10 '23 03:07 saramsey

I think it will not be easy to eliminate use of biolink:BiologicalEntity without having a lot of nodes wind up with category biolink:NamedThing. I suppose with enough curation effort we could eliminate all the resulting NamedThing-ness, but it would take a lot of effort.

saramsey avatar Jul 10 '23 03:07 saramsey

Eric, since Sierra has agreed that the use of biolink:BiologicalEntity in RTX-KG2 is reasonable and suggested that maybe we could continue using it (while changing the resulting validator error to a warning), perhaps in the meantime, until Biolink is changed or the reasoner validator is changed, we could add some code around these two lines of code:

https://github.com/RTXteam/RTX/blob/7e066415b36893f241e125bad3551a268e1eb410/code/ARAX/ResponseCache/response_cache.py#L317-L318

to catch any validator error corresponding to BiologicalEntity and turn it into a warning? @ecwood and I could do the work. (at this point we are just brainstorming ideas; we could code it in a branch and test it out, with the understanding that the workaround could be chucked if you decide it is not the right approach).

saramsey avatar Jul 10 '23 03:07 saramsey

I am thinking that editing the output of the validator dumps() and also the json output would be quite an extensive hack. I think it would be much better to add that exception to the validator itself.

Is this only for KG2, or is Sierra's thinking that using BiologicalEntity should be fine for anyone?

I think it would be best to start a discussion with Sierra and Richard and ask Sierra if she would approve of the validator being updated to include this exception.

What do you think?

edeutsch avatar Jul 10 '23 04:07 edeutsch

(part of the reason that it would be an extensive hack is that editing validation_messages_text would be one tricky thing, but then line 319 gets the json version of the validator output and then decides on PASS, ERROR, FAIL based on the contents of the json, so that would need to be hacked, too. although that would be easier to hack than the text)

edeutsch avatar Jul 10 '23 05:07 edeutsch

Is this only for KG2, or is Sierra's thinking that using BiologicalEntity should be fine for anyone? I think it would be best to start a discussion with Sierra and Richard and ask Sierra if she would approve of the validator being updated to include this exception.

Here is a snippet of what Sierra told Lili: https://github.com/RTXteam/RTX-KG2/issues/286#issuecomment-1623989372

the long answer to your question is that no (from a "best practice" approach - we should really consider loosening the validation error in this case for September), using abstract or mixin classes as node categories is not advised.

ecwood avatar Jul 10 '23 16:07 ecwood

are there any objections to asking Sierra & Richard about adding a formal exception to the validator?

edeutsch avatar Jul 10 '23 16:07 edeutsch

I don't object.

ecwood avatar Jul 10 '23 16:07 ecwood

@edeutsch understood. Thank you. We will take up the issue with Sierra and Richard. I think there is a good chance we can eventually get an exception in there.

In the meantime, please let us know if at some point the errors become sufficiently burdensome/problematic that immediate workarounds (e.g., hacking response_cache or using NamedThing) are on the table.

saramsey avatar Jul 10 '23 18:07 saramsey

great, thanks. Updating the validator can often be quite rapid when there is a clear path forward. I suggest we wait for that until such a time as it becomes clear that it won't happen quickly. Only then should we contemplate hacking the validator output, I feel.

edeutsch avatar Jul 10 '23 21:07 edeutsch