fhir-data-pipes
Fixed paging to use the FHIR standard paging URL rather than HAPI-FHIR's
Description of what I changed
Paging of resources from a FHIR server was previously implemented only for HAPI FHIR's custom paging mechanism. I switched it to follow the standard "next" link defined by the FHIR specification.
Appears related to https://github.com/google/fhir-data-pipes/issues/533
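For reviewers who want a concrete picture of the approach, here is a minimal sketch of next-link paging with the HAPI FHIR generic client. The base URL, resource type, and page size are placeholders for illustration only; this is not the pipeline's actual code, just the pattern it now follows:

```java
import ca.uhn.fhir.context.FhirContext;
import ca.uhn.fhir.rest.client.api.IGenericClient;
import org.hl7.fhir.r4.model.Bundle;

public class NextLinkPagingSketch {
  public static void main(String[] args) {
    // Placeholder base URL; any conformant R4 server should work.
    FhirContext fhirContext = FhirContext.forR4();
    IGenericClient client =
        fhirContext.newRestfulGenericClient("http://example.org/fhir");

    Bundle bundle =
        client.search()
            .forResource("Patient")
            .count(100)
            .returnBundle(Bundle.class)
            .execute();

    while (bundle != null) {
      // ... process bundle.getEntry() ...

      // Follow the server-provided "next" link instead of constructing a
      // HAPI-specific _getpages URL; this works across FHIR implementations.
      if (bundle.getLink(Bundle.LINK_NEXT) != null) {
        bundle = client.loadPage().next(bundle).execute();
      } else {
        bundle = null;
      }
    }
  }
}
```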
E2E test
TESTED:
Ran the pipeline against HAPI-FHIR as well as an additional FHIR-compliant server to ensure nothing was broken.
Checklist: I completed these to help reviewers :)
- [X] I have read and will follow the review process.
- [X] I am familiar with Google Style Guides for the language I have coded in.
  No? Please take some time and review Java and Python style guides.
- [X] My IDE is configured to follow the Google code styles.
  No? Unsure? -> configure your IDE.
- [X] I have added tests to cover my changes. (If you refactored existing code that was well tested you do not have to add tests.)
- [X] I ran `mvn clean package` right before creating this pull request and added all formatting changes to my commit.
- [X] All new and existing tests passed.
- [X] My pull request is based on the latest changes of the master branch.
  No? Unsure? -> execute command `git pull --rebase upstream master`
This code works for us locally, but we're not sure how to even begin investigating this build failure. Any pointers or hints would be appreciated.
EDIT: Found a bug, fixing now.
We have fixed our issue, but the build has failed for a different reason. I'm not sure why only the request for Observations would time out. Could a re-run of the build allow it to pass?
I just replied to this question in my review comments.
BTW, may I ask what FHIR server you are running the pipelines against? In general, can you describe your use case a little bit (if you don't mind)? The problem, as you noted, is #533, i.e., that our API-based fetch approach has some shortcomings.
We are attempting to integrate fhir-data-pipes with a project called Medplum. It isn't based on HAPI FHIR, so the _getpages paging didn't work.
As for our use case, we are trying to extract FHIR data and load it into Apache Spark so that we can use a tool like Superset to visualize the data and treat it as a sort of "data warehouse".
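For context, here is a rough sketch (not part of this PR) of how we consume the pipeline's Parquet output in Spark. The output path and the one-directory-per-resource-type layout are assumptions about our local setup, not guarantees from the pipeline:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ReadPipelineOutputSketch {
  public static void main(String[] args) {
    SparkSession spark =
        SparkSession.builder()
            .appName("fhir-parquet-demo")
            .master("local[*]")
            .getOrCreate();

    // Hypothetical output path; we point this at the directory the batch
    // pipeline wrote for the Observation resource type.
    Dataset<Row> observations = spark.read().parquet("/tmp/fhir-output/Observation");

    // Register a view so tools downstream (e.g. Superset via Thrift/SQL) can query it.
    observations.createOrReplaceTempView("observation");
    spark.sql("SELECT COUNT(*) AS observation_count FROM observation").show();

    spark.stop();
  }
}
```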
Any hints on this latest failure would be appreciated.
I think it is waiting on approval. @bashir2 to confirm.
Thanks @michealharrington - we'd be keen to learn more about the use case for medplum and share some other work we have for query libraries. Once you have this issue resolved and the pipelines running let us know. Thanks, Fred (PM for Open Health Stack)
Any hints on this latest failure would be appreciated.
The problem seems to be that one patient data is missing from the generated Parquet files. I am not sure why this is happening and triggered an e2e retry to make sure this is not a transient issue. If the problem persists, I will need to take a deeper look later when I review your new changes as well (thanks for the updates, BTW).
So the above issue was indeed transient, but there is a real issue caused by this PR. I investigated it a little more and also reviewed your new changes. As I just commented in my review, adding _offset breaks OpenMRS's FHIR implementation. The real failure can be seen here for example (from this run); the relevant piece is this:
Step #20 - "Run Batch Pipeline FHIR-search mode with OpenMRS source": 06:02:23.588 [main] INFO c.google.fhir.analytics.OpenmrsUtil c.u.f.rest.client.interceptor.LoggingInterceptor.interceptRequest:83 - Client request: GET http://openmrs:8080/openmrs/ws/fhir2/R4/Observation?_sort=_id&_count=10&_offset=0&_total=accurate&_summary=data HTTP/1.1
Step #20 - "Run Batch Pipeline FHIR-search mode with OpenMRS source": Exception in thread "main" ca.uhn.fhir.rest.client.exceptions.FhirClientConnectionException: HAPI-1361: Failed to parse response from server when performing GET to URL http://openmrs:8080/openmrs/ws/fhir2/R4/Observation?_sort=_id&_count=10&_offset=0&_total=accurate&_summary=data - java.net.SocketTimeoutException: Read timed out
Step #20 - "Run Batch Pipeline FHIR-search mode with OpenMRS source": at ca.uhn.fhir.rest.client.impl.BaseClient.invokeClient(BaseClient.java:419)
Step #20 - "Run Batch Pipeline FHIR-search mode with OpenMRS source": at ca.uhn.fhir.rest.client.impl.GenericClient$BaseClientExecutable.invoke(GenericClient.java:541)
Step #20 - "Run Batch Pipeline FHIR-search mode with OpenMRS source": at ca.uhn.fhir.rest.client.impl.GenericClient$SearchInternal.execute(GenericClient.java:1996)
Step #20 - "Run Batch Pipeline FHIR-search mode with OpenMRS source": at com.google.fhir.analytics.FhirSearchUtil.createSegments(FhirSearchUtil.java:235)
Step #20 - "Run Batch Pipeline FHIR-search mode with OpenMRS source": at com.google.fhir.analytics.FhirEtl.buildFhirSearchPipeline(FhirEtl.java:132)
Step #20 - "Run Batch Pipeline FHIR-search mode with OpenMRS source": at com.google.fhir.analytics.FhirEtl.buildPipeline(FhirEtl.java:383)
Step #20 - "Run Batch Pipeline FHIR-search mode with OpenMRS source": at com.google.fhir.analytics.FhirEtl.main(FhirEtl.java:401)
Step #20 - "Run Batch Pipeline FHIR-search mode with OpenMRS source": Caused by: java.net.SocketTimeoutException: Read timed out
Step #20 - "Run Batch Pipeline FHIR-search mode with OpenMRS source": at java.base/sun.nio.ch.NioSocketImpl.timedRead(NioSocketImpl.java:288)
Step #20 - "Run Batch Pipeline FHIR-search mode with OpenMRS source": at java.base/sun.nio.ch.NioSocketImpl.implRead(NioSocketImpl.java:314)