fhir-data-pipes
Fixed paging to use the FHIR standard paging URL rather than HAPI-FHIR's
Description of what I changed
Paging of resources from a FHIR server was previously implemented only for HAPI FHIR's custom paging mechanism. I switched it to follow the standard "next" link defined by the FHIR specification.
Appears related to https://github.com/google/fhir-data-pipes/issues/533
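For reviewers who want a concrete picture of the approach, here is a minimal sketch of next-link paging with the HAPI FHIR generic client. The base URL, resource type, and page size are placeholders for illustration only; this is not the pipeline's actual code, just the pattern it now follows:

```java
import ca.uhn.fhir.context.FhirContext;
import ca.uhn.fhir.rest.client.api.IGenericClient;
import org.hl7.fhir.r4.model.Bundle;

public class NextLinkPagingSketch {
  public static void main(String[] args) {
    // Placeholder base URL; any conformant R4 server should work.
    FhirContext fhirContext = FhirContext.forR4();
    IGenericClient client =
        fhirContext.newRestfulGenericClient("http://example.org/fhir");

    Bundle bundle =
        client.search()
            .forResource("Patient")
            .count(100)
            .returnBundle(Bundle.class)
            .execute();

    while (bundle != null) {
      // ... process bundle.getEntry() ...

      // Follow the server-provided "next" link instead of constructing a
      // HAPI-specific _getpages URL; this works across FHIR implementations.
      if (bundle.getLink(Bundle.LINK_NEXT) != null) {
        bundle = client.loadPage().next(bundle).execute();
      } else {
        bundle = null;
      }
    }
  }
}
```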
E2E test
TESTED:
Ran the pipeline against HAPI-FHIR as well as an additional FHIR-compliant server to ensure nothing was broken.
Checklist: I completed these to help reviewers :)
- [X] I have read and will follow the review process.
- [X] I am familiar with Google Style Guides for the language I have coded in.
  No? Please take some time and review Java and Python style guides.
- [X] My IDE is configured to follow the Google code styles.
  No? Unsure? -> configure your IDE.
- [X] I have added tests to cover my changes. (If you refactored existing code that was well tested you do not have to add tests.)
- [X] I ran `mvn clean package` right before creating this pull request and added all formatting changes to my commit.
- [X] All new and existing tests passed.
- [X] My pull request is based on the latest changes of the master branch.
  No? Unsure? -> execute command `git pull --rebase upstream master`
This code works for us locally, but we're not sure how to even begin investigating this build failure. Any pointers or hints would be appreciated.
EDIT: Found a bug, fixing now.
We have fixed our issue, but the build has failed for a different reason. I'm not sure why only the request for Observations would time out. Could a re-run of the build allow it to pass?
I just replied to this question in my review comments.
BTW, may I ask what FHIR server you are running the pipelines against? In general, can you describe your use case a little bit (if you don't mind)? The problem, as you noted, is #533, i.e., that our API-based fetch approach has some shortcomings.
We are attempting to integrate fhir-data-pipes with a project called Medplum. It isn't based on HAPI FHIR, so the _getpages paging didn't work.
As for our use case, we are trying to extract FHIR data and load it into Apache Spark so that we can use a tool like Superset to visualize the data and treat it as a sort of "data warehouse".
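For context, here is a rough sketch (not part of this PR) of how we consume the pipeline's Parquet output in Spark. The output path and the one-directory-per-resource-type layout are assumptions about our local setup, not guarantees from the pipeline:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ReadPipelineOutputSketch {
  public static void main(String[] args) {
    SparkSession spark =
        SparkSession.builder()
            .appName("fhir-parquet-demo")
            .master("local[*]")
            .getOrCreate();

    // Hypothetical output path; we point this at the directory the batch
    // pipeline wrote for the Observation resource type.
    Dataset<Row> observations = spark.read().parquet("/tmp/fhir-output/Observation");

    // Register a view so tools downstream (e.g. Superset via Thrift/SQL) can query it.
    observations.createOrReplaceTempView("observation");
    spark.sql("SELECT COUNT(*) AS observation_count FROM observation").show();

    spark.stop();
  }
}
```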
Any hints on this latest failure would be appreciated.
I think it is waiting on approval. @bashir2 to confirm.
Thanks @michealharrington - we'd be keen to learn more about the use case for medplum and share some other work we have for query libraries. Once you have this issue resolved and the pipelines running let us know. Thanks, Fred (PM for Open Health Stack)
Any hints on this latest failure would be appreciated.
The problem seems to be that one patient data is missing from the generated Parquet files. I am not sure why this is happening and triggered an e2e retry to make sure this is not a transient issue. If the problem persists, I will need to take a deeper look later when I review your new changes as well (thanks for the updates, BTW).
So the above issue was indeed transient, but there is a real issue caused by this PR. I investigated it a little more and also reviewed your new changes. As I just commented in my review, adding _offset breaks OpenMRS's FHIR implementation. The real failure can be seen here for example (from this run); the relevant piece is this:
Step #20 - "Run Batch Pipeline FHIR-search mode with OpenMRS source": 06:02:23.588 [main] INFO c.google.fhir.analytics.OpenmrsUtil c.u.f.rest.client.interceptor.LoggingInterceptor.interceptRequest:83 - Client request: GET http://openmrs:8080/openmrs/ws/fhir2/R4/Observation?_sort=_id&_count=10&_offset=0&_total=accurate&_summary=data HTTP/1.1
Step #20 - "Run Batch Pipeline FHIR-search mode with OpenMRS source": Exception in thread "main" ca.uhn.fhir.rest.client.exceptions.FhirClientConnectionException: HAPI-1361: Failed to parse response from server when performing GET to URL http://openmrs:8080/openmrs/ws/fhir2/R4/Observation?_sort=_id&_count=10&_offset=0&_total=accurate&_summary=data - java.net.SocketTimeoutException: Read timed out
Step #20 - "Run Batch Pipeline FHIR-search mode with OpenMRS source": at ca.uhn.fhir.rest.client.impl.BaseClient.invokeClient(BaseClient.java:419)
Step #20 - "Run Batch Pipeline FHIR-search mode with OpenMRS source": at ca.uhn.fhir.rest.client.impl.GenericClient$BaseClientExecutable.invoke(GenericClient.java:541)
Step #20 - "Run Batch Pipeline FHIR-search mode with OpenMRS source": at ca.uhn.fhir.rest.client.impl.GenericClient$SearchInternal.execute(GenericClient.java:1996)
Step #20 - "Run Batch Pipeline FHIR-search mode with OpenMRS source": at com.google.fhir.analytics.FhirSearchUtil.createSegments(FhirSearchUtil.java:235)
Step #20 - "Run Batch Pipeline FHIR-search mode with OpenMRS source": at com.google.fhir.analytics.FhirEtl.buildFhirSearchPipeline(FhirEtl.java:132)
Step #20 - "Run Batch Pipeline FHIR-search mode with OpenMRS source": at com.google.fhir.analytics.FhirEtl.buildPipeline(FhirEtl.java:383)
Step #20 - "Run Batch Pipeline FHIR-search mode with OpenMRS source": at com.google.fhir.analytics.FhirEtl.main(FhirEtl.java:401)
Step #20 - "Run Batch Pipeline FHIR-search mode with OpenMRS source": Caused by: java.net.SocketTimeoutException: Read timed out
Step #20 - "Run Batch Pipeline FHIR-search mode with OpenMRS source": at java.base/sun.nio.ch.NioSocketImpl.timedRead(NioSocketImpl.java:288)
Step #20 - "Run Batch Pipeline FHIR-search mode with OpenMRS source": at java.base/sun.nio.ch.NioSocketImpl.implRead(NioSocketImpl.java:314)