Bunsen: Support FHIR Extensions in Spark Datasets


Description of Issue

I am currently attempting to read in FHIR Bundles from a directory that contains JSON files and then extract certain resource types to Spark Datasets. While Datasets are being successfully created, Extensions that were part of resources in my FHIR bundle are being dropped altogether.

If I am looking at the correct places in the code, the lack of Extension support appears to be a conscious decision:
https://github.com/cerner/bunsen/blob/e6a58c6d93a40e951428a6ec0b134d0aac527d09/bunsen-core/src/main/scala/com/cerner/bunsen/EncoderBuilder.scala#L191
https://github.com/cerner/bunsen/blob/e3c1d5e9b641679011ffa557474514bc9bee5bae/bunsen-core/src/main/scala/com/cerner/bunsen/SchemaConverter.scala#L36

I would like to be able to create Datasets for FHIR resources that still contain the Extensions from the original resources.

System Configuration

Project Version

Using Bunsen 0.4.9

Steps to Reproduce the Issue

Run this Scala code (or Java equivalent):

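import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import com.cerner.bunsen.Bundles

object BunsenExample {
  def main(args: Array[String]): Unit = {
    failBundles()
  }

  def failBundles(): Unit = {
    val conf = new SparkConf()
      .setMaster("local[*]")
      .set("spark.sql.crossJoin.enabled", "true")
    val spark = SparkSession.builder().config(conf).getOrCreate()

    // Load every bundle in the directory (minimum of 2 partitions) and cache the parsed bundles.
    val data = Bundles.forStu3().loadFromDirectory(spark, "/path/to/bundles/with/resource/extensions", 2).cache()

    // Extract the Patient entries into a Dataset; the extensions are missing from both the rows and the schema.
    val patients = Bundles.forStu3().extractEntry(spark, data, "Patient")
    patients.show()
    patients.printSchema()

    spark.stop()
  }
}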

The patients Dataset does not contain the extensions that were originally part of the Patient FHIR resources in the bundle, and its schema has no field where extensions could be stored. I verified that the Extensions are parsed successfully: they are accessible through the BundleContainers returned by data.collect() if you dig into the result.
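One way to confirm programmatically that the extracted schema has no slot for extensions is to search it, including nested structs and arrays, for a field named "extension". The sketch below is an illustration under assumptions, not part of Bunsen: SchemaInspection and findFieldPaths are hypothetical names, the column name "extension" is assumed to mirror the FHIR JSON element name, and the sample schema is a hand-built stand-in for the real Patient schema. It uses only Spark SQL's org.apache.spark.sql.types classes.

```scala
import org.apache.spark.sql.types._

object SchemaInspection {

  /** Returns the dotted path of every field named `target` anywhere in the schema,
    * descending into nested structs, arrays of structs, and map values. */
  def findFieldPaths(schema: StructType, target: String): Seq[String] = {
    def walk(dataType: DataType, prefix: String): Seq[String] = dataType match {
      case struct: StructType =>
        struct.fields.toSeq.flatMap { field =>
          val path = if (prefix.isEmpty) field.name else s"$prefix.${field.name}"
          val here = if (field.name == target) Seq(path) else Seq.empty
          here ++ walk(field.dataType, path)
        }
      case ArrayType(elementType, _) => walk(elementType, prefix)
      case MapType(_, valueType, _)  => walk(valueType, prefix)
      case _                         => Seq.empty
    }
    walk(schema, "")
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical stand-in for a Patient schema; with real data, pass patients.schema instead.
    val schema = StructType(Seq(
      StructField("id", StringType),
      StructField("gender", StringType),
      StructField("name", ArrayType(StructType(Seq(
        StructField("family", StringType),
        StructField("given", ArrayType(StringType))
      ))))
    ))
    val paths = findFieldPaths(schema, "extension")
    // prints: no extension fields
    println(if (paths.isEmpty) "no extension fields" else paths.mkString(", "))
  }
}
```

In the reproduction program above, calling SchemaInspection.findFieldPaths(patients.schema, "extension") should return an empty sequence if the extensions are dropped, which turns the visual check of printSchema() into something that can be asserted in a test.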

Expected Outcomes

Add support for Extensions to be included in Datasets when they are created by extracting resources from a collection of FHIR Bundles.

Jan 08 '20 19:01 mtsargent

Extensions and Contained resources are now supported in Bunsen 0.5.x, which applies a different paradigm to creating Spark rows from FHIR resources. The Bundles API in this new major version is still much the same, so try loading your data in the latest version to see if you get the support you require.

While I believe Contained resource support was added in Bunsen 0.4.9, Extensions were known to be harder to implement under the earlier approach, so users should not expect Extension support to be back-ported.

Mar 07 '20 18:03 bdrillard

I tried running an example similar to the one I posted (except using Observations instead of Patients), and I am still not seeing extensions when the resources are extracted from the bundle. Using a debugger, I can see the extensions exist on the resources in the Bundle. I am using com.cerner.bunsen:bunsen-spark-shaded:0.5.4. Is this the correct dependency?

Also, is there R4 support in Bunsen 0.5.x? I was unable to find documentation for the 0.5.x releases comparable to what is available here for 0.4.6: https://engineering.cerner.com/bunsen/0.4.6/

Apr 20 '20 23:04 mtsargent

Can we please have an update on the questions Matt posed here?

Apr 28 '20 15:04 Teej42

From looking at the codebase, it appears that commit a24851ba614a13db447650dc5081853db1ee2e93 on the 0.5.0-dev branch deleted the Python tests for R4 and removed classes such as FhirEncoders, which are still used by the bunsen-r4 sub-project. It looks like R4 support has been abandoned in Bunsen. Is that right?

Sep 01 '20 11:09 dhallam