Extension support for primitive elements

Open johngrimes opened this issue 3 years ago • 14 comments

Our current implementation of extension support does not include extensions on primitive elements.

This issue covers adding that support and assessing any impact on query performance.

johngrimes avatar Feb 24 '22 00:02 johngrimes

I feel as though we should deprioritise this until such time as we find a solid use case in need of it.

johngrimes avatar Mar 08 '22 06:03 johngrimes

Dear @johngrimes, I am currently trying to work with extensions. Is this what you mean by "primitive elements"? [image] (taken from https://simplifier.net/oncology/operation) I was expecting to be able to extract the value of the codeable concept, but the "extension_url" is as far as I can get (extension_enabled = True).

jasminziegler avatar May 23 '23 17:05 jasminziegler

Hi @jasminziegler,

Using this example resource: https://simplifier.net/FirstProfile3/Procedure-Operation-example-1/~json

This expression works for me:

extension('http://dktk.dkfz.de/fhir/StructureDefinition/onco-core-Extension-OPIntention').valueCodeableConcept.coding.display

Returns one row with "palliativ".

johngrimes avatar May 23 '23 19:05 johngrimes

Hi @johngrimes, thanks for your quick reply! I might have confused the pathling fhir-server implementation and the pathling python api - is this functionality also available in the pathling python api?

jasminziegler avatar May 24 '23 07:05 jasminziegler

As of this morning, yes! 🙂

Here is the newly minted documentation on how to run FHIRPath queries using the library: https://pathling.csiro.au/docs/libraries/fhirpath-query

I'd love to hear any feedback you might have!

johngrimes avatar May 24 '23 07:05 johngrimes

Awesome, thank you @johngrimes! (We are waiting for the maven artifact and are ready for testing the new and exciting features :) ) edit: @chgl found version 6.2.1 :)

jasminziegler avatar May 24 '23 10:05 jasminziegler

You can use this one: https://central.sonatype.com/artifact/au.csiro.pathling/library-api/6.2.1

johngrimes avatar May 24 '23 10:05 johngrimes

Got it upgraded and installed! Nevertheless, I am getting an AttributeError: 'DataFrame' object has no attribute 'extract'. I am not sure what I am missing here, because according to this example https://pathling.csiro.au/docs/libraries/fhirpath-query, reading in data with "pc.read..." will produce a pyspark DataFrame.

'PATHLING_VERSION': '6.2.1', 'APACHE_SPARK_VERSION': '3.3.2', Python 3.10.10, Scala version 2.12.15

Could you please provide your example that you tested with my sample resource?

jasminziegler avatar May 24 '23 14:05 jasminziegler

Here's the code I used:

from pathling import PathlingContext

pc = PathlingContext.create(enable_extensions=True)

data = pc.read.ndjson("/Users/gri306/Desktop")

result = data.extract("Procedure", columns=[
    "extension('http://dktk.dkfz.de/fhir/StructureDefinition/onco-core-Extension-OPIntention')"
    ".valueCodeableConcept.coding.display"
])

result.show(truncate=False)

The code pc.read.ndjson(...) should return a DataSource. The ndjson method is only one of a number of data source builder methods.

The extract method should return a DataFrame.

johngrimes avatar May 24 '23 21:05 johngrimes

Hi @jasminziegler, just checking back to see if you got it all working.

johngrimes avatar Jun 06 '23 21:06 johngrimes

Hi @johngrimes , thanks for checking back!

Right now we are in a hurry to get all our previous operations (without the ones newly added in v6.2.1) working with real data from our clinical systems. Due to the huge amount of data, we are running into resource issues (it requires a lot of RAM), which is a Spark issue rather than an issue on your side. Since we are performing many operations, we are creating very large tasks. Our next attempt will be to save intermediate tables and see if that improves performance, because we suspect that the task graph is being reconstructed from scratch each time we call a Spark action, which results in ever-growing task graphs. Happy to hear any ideas on this from your experience.

After we get this up and running, we will get back to upgrading and testing your new features, which we are still excited about, and we will be happy to provide feedback as soon as possible.

jasminziegler avatar Jun 07 '23 08:06 jasminziegler

Hi @jasminziegler,

From the sounds of it, your partitions might be too large.

Would you be able to share your query plan?

df.explain(True)

johngrimes avatar Jun 12 '23 01:06 johngrimes
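Editorial sketch, not part of the original comment: df.explain(True) prints the plan rather than returning it, which makes a very long plan awkward to share. The helper below captures the printed plan as a string. The name plan_as_text is hypothetical, and the approach assumes that PySpark's explain writes through Python's standard output, as it does in the 3.x releases I am aware of.

```python
import contextlib
import io


def plan_as_text(df, extended: bool = True) -> str:
    """Return what df.explain(extended) would print, as a string.

    `df` is expected to be a pyspark.sql.DataFrame; any object with a
    compatible explain() method that prints to stdout works as well.
    """
    buffer = io.StringIO()
    # Assumption: explain() prints via Python's stdout, so redirecting
    # stdout is enough to capture the whole plan.
    with contextlib.redirect_stdout(buffer):
        df.explain(extended)
    return buffer.getvalue()
```

With the DataFrame from the earlier snippet, something like pathlib.Path("plan.txt").write_text(plan_as_text(result)) would write the plan to a file that can be attached to the issue.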

The query plan is endlessly long, so you are right. My stage task size is also very large. I am trying to implement checkpoints right now; hopefully that turns out to be a useful solution. We do not have any Apache Spark expertise at our institution so far, so please excuse my off-topic questions!

jasminziegler avatar Jun 12 '23 14:06 jasminziegler
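Editorial sketch, not part of the original thread: one way the checkpointing idea mentioned above might look. The helper applies a sequence of transformations and truncates the lineage every few steps, so the plan does not keep growing. The name apply_with_checkpoints and the every parameter are hypothetical. The only Spark call it relies on is DataFrame.localCheckpoint(), which trades fault tolerance for simplicity; DataFrame.checkpoint() is the reliable variant, but it needs a checkpoint directory set with spark.sparkContext.setCheckpointDir(...) first.

```python
from typing import Callable, Iterable, TypeVar

DF = TypeVar("DF")  # stands in for pyspark.sql.DataFrame


def apply_with_checkpoints(
    df: DF, steps: Iterable[Callable[[DF], DF]], every: int = 5
) -> DF:
    """Apply each step in order, truncating the lineage every `every` steps."""
    if every < 1:
        raise ValueError("every must be at least 1")
    for i, step in enumerate(steps, start=1):
        df = step(df)
        if i % every == 0:
            # localCheckpoint() materialises the data on the executors and
            # returns a DataFrame whose plan no longer includes earlier steps.
            df = df.localCheckpoint()
    return df
```

Each step is a function from DataFrame to DataFrame, for example lambda d: d.filter(...). How often to checkpoint is a tuning question: every checkpoint costs a materialisation, so a small value of every shortens the plan at the price of extra work.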

Hi @jasminziegler,

Not a problem at all.

Perhaps we should have a call some time - I would love to hear more about what you are doing, and it might help you save some time solving these problems. Send me an email at [email protected] if you are interested.

johngrimes avatar Jun 13 '23 00:06 johngrimes