Support FHIR Extensions in Spark Datasets
Description of Issue
I am currently attempting to read in FHIR Bundles from a directory that contains JSON files and then extract certain resource types to Spark Datasets. While Datasets are being successfully created, Extensions that were part of resources in my FHIR bundle are being dropped altogether.
If I am looking at the correct places in the code, it seems that the lack of Extension support was a conscious decision:
https://github.com/cerner/bunsen/blob/e6a58c6d93a40e951428a6ec0b134d0aac527d09/bunsen-core/src/main/scala/com/cerner/bunsen/EncoderBuilder.scala#L191
https://github.com/cerner/bunsen/blob/e3c1d5e9b641679011ffa557474514bc9bee5bae/bunsen-core/src/main/scala/com/cerner/bunsen/SchemaConverter.scala#L36
I would like to be able to create Datasets for FHIR resources that still contain the Extensions from the original resources.
System Configuration
Project Version
Using Bunsen 0.4.9
Steps to Reproduce the Issue
Run this Scala code (or Java equivalent):
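import com.cerner.bunsen.Bundles
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object BunsenExample {
  def main(args: Array[String]): Unit = {
    failBundles()
  }

  def failBundles(): Unit = {
    val conf = new SparkConf()
      .setMaster("local[*]")
      .set("spark.sql.crossJoin.enabled", "true")
    val spark = SparkSession.builder().config(conf).getOrCreate()

    // Load the JSON bundle files in the directory as BundleContainers and cache them.
    val data = Bundles.forStu3().loadFromDirectory(spark, "/path/to/bundles/with/resource/extensions", 2).cache()

    // Extract the Patient entries into a Dataset; the resulting schema has no extension fields.
    val patients = Bundles.forStu3().extractEntry(spark, data, "Patient")
    patients.show()
    patients.printSchema()
  }
}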
The patients Dataset does not contain the extensions that were originally part of the Patient FHIR resources in the bundle, and its schema does not appear to have any field where extensions could be stored. I verified that the Extensions are parsed successfully: they are accessible in the BundleContainers returned by data.collect().
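To check programmatically what printSchema() shows, a small helper can walk a Dataset's schema and list every nested field with a given name. This is a minimal sketch that relies only on Spark's public org.apache.spark.sql.types classes; the object SchemaInspection and its method findFieldPaths are hypothetical and not part of Bunsen. Searching for a field literally named "extension" is an assumption based on the FHIR element name; Bunsen may encode extensions under different column names.

```scala
import org.apache.spark.sql.types.{ArrayType, DataType, MapType, StructType}

// Hypothetical helper, not part of Bunsen: finds fields by name in a Spark schema.
object SchemaInspection {

  /** Returns the dotted path of every field named `name`, searching nested structs,
    * array elements, and map values. Array and map levels add no path segment.
    * Assumption: extensions would surface as a field literally named "extension". */
  def findFieldPaths(schema: StructType, name: String): Seq[String] = {
    def walk(dataType: DataType, prefix: String): Seq[String] = dataType match {
      case struct: StructType =>
        struct.fields.toSeq.flatMap { field =>
          val path = if (prefix.isEmpty) field.name else s"$prefix.${field.name}"
          val matches = if (field.name == name) Seq(path) else Seq.empty[String]
          // Record this field if it matches, then keep descending into its type.
          matches ++ walk(field.dataType, path)
        }
      case ArrayType(elementType, _) => walk(elementType, prefix)
      case MapType(_, valueType, _)  => walk(valueType, prefix)
      case _                         => Seq.empty[String]
    }
    walk(schema, "")
  }
}
```

For example, calling SchemaInspection.findFieldPaths(patients.schema, "extension") after the extractEntry call in the repro returns an empty Seq when no field with that name exists at any nesting level.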
Expected Outcomes
Add support for Extensions to be included in Datasets when they are created by extracting resources from a collection of FHIR Bundles.
Extensions and Contained resources are now supported in Bunsen 0.5.x, which takes a different approach to creating Spark rows from FHIR resources. The Bundles API in this new major version is still much the same, so try loading your data with the latest version to see whether it provides the support you require.
While Contained resource support was, I believe, added in Bunsen 0.4.9, Extensions were known to be harder to implement under the earlier approach, so I don't think users should expect Extension support to be back-ported.
I tried running an example similar to the one I posted (except using Observations instead of Patients), and I am still not seeing extensions when the resources are extracted from the bundle. Using a debugger, I can see the extensions exist on the resources in the Bundle. I am using com.cerner.bunsen:bunsen-spark-shaded:0.5.4. Is this the correct dependency?
Also, is there R4 support in Bunsen 0.5.x? I could not find documentation for the 0.5.x releases comparable to what is available for 0.4.6: https://engineering.cerner.com/bunsen/0.4.6/
Could we get an update on the question Matt posed here, please?
From looking at the codebase, it appears that commit a24851ba614a13db447650dc5081853db1ee2e93 on the 0.5.0-dev branch deleted the Python tests for R4 and removed classes such as FhirEncoders, which are still used by the bunsen-r4 sub-project. It looks like R4 support has been abandoned in Bunsen. Is that right?