mychem.info

Consider switching NDC storage type to 'MergerStorage'

Open ravila4 opened this issue 1 year ago • 0 comments

Right now, the NDC plugin uses the RootKeyMergerStorage class to join documents with duplicate _id (productndc) values.

However, this method seems to duplicate a lot of information. For example, the query http://mychem.info/v1/query?q=69168-318&fields=ndc&dotfield=true (the dotfield parameter lets us see the duplicated data side by side) shows that most of the fields contain the same value twice.

  "hits": [
    {
      "_id": "69168-318",
      "_score": 1,
      "ndc._license": [
        "https://www.fda.gov/AboutFDA/AboutThisWebsite/WebsitePolicies/default.htm#linking",
        "https://www.fda.gov/AboutFDA/AboutThisWebsite/WebsitePolicies/default.htm#linking"
      ],
      "ndc.active_ingred_unit": [
        "mg/1",
        "mg/1"
      ],
      "ndc.active_numerator_strength": [
        "81",
        "81"
      ],
      "ndc.applicationnumber": [
        "part343",
        "part343"
      ],
      "ndc.dosageformname": [
        "TABLET",
        "TABLET"
      ],
      "ndc.labelername": [
        "Allegiant Health",
        "Allegiant Health"
      ],
      "ndc.listing_record_certified_through": [
        "20231231",
        "20221231"
      ],
      "ndc.marketingcategoryname": [
        "OTC MONOGRAPH FINAL",
        "OTC MONOGRAPH FINAL"
      ],
      "ndc.ndc_exclude_flag": [
        "N",
        "N"
      ],
      "ndc.nonproprietaryname": [
        "Aspirin 81 mg",
        "Aspirin 81 MG"
      ],
      "ndc.package.ndc_exclude_flag": [
        "N",
        "N",
        "N",
        "N",
        "N",
        "N",
        "N",
        "N",
        "N",
        "N"
      ],
      "ndc.package.package.ndcpackagecode": [
        "69168-318-01",
        "69168-318-03",
        "69168-318-06",
        "69168-318-17",
        "69168-318-50",
        "69168-318-01",
        "69168-318-03",
        "69168-318-06",
        "69168-318-17",
        "69168-318-50"
      ],
      "ndc.package.package.packagedescription": [
        "1 BOTTLE in 1 CARTON (69168-318-01)  > 100 TABLET in 1 BOTTLE",
        "300 TABLET in 1 BOTTLE (69168-318-03) ",
        "1 BOTTLE in 1 CARTON (69168-318-06)  > 120 TABLET in 1 BOTTLE",
        "300 TABLET in 1 BOTTLE (69168-318-17) ",
        "1 BOTTLE in 1 CARTON (69168-318-50)  > 50 TABLET in 1 BOTTLE",
        "1 BOTTLE in 1 CARTON (69168-318-01)  > 100 TABLET in 1 BOTTLE",
        "300 TABLET in 1 BOTTLE (69168-318-03) ",
        "1 BOTTLE in 1 CARTON (69168-318-06)  > 120 TABLET in 1 BOTTLE",
        "300 TABLET in 1 BOTTLE (69168-318-17) ",
        "1 BOTTLE in 1 CARTON (69168-318-50)  > 50 TABLET in 1 BOTTLE"
      ],
      "ndc.package.sample_package": [
        "N",
        "N",
        "N",
        "N",
        "N",
        "N",
        "N",
        "N",
        "N",
        "N"
      ],
      "ndc.package.startmarketingdate": [
        "20141218",
        "20141218",
        "20141218",
        "20141218",
        "20141218",
        "20141218",
        "20141218",
        "20141218",
        "20141218",
        "20141218"
      ],
      "ndc.pharm_classes": [
        "Anti-Inflammatory Agents, Non-Steroidal [CS], Cyclooxygenase Inhibitors [MoA], Decreased Platelet Aggregation [PE], Decreased Prostaglandin Production [PE], Nonsteroidal Anti-inflammatory Drug [EPC], Platelet Aggregation Inhibitor [EPC]",
        "Anti-Inflammatory Agents, Non-Steroidal [CS], Cyclooxygenase Inhibitors [MoA], Decreased Platelet Aggregation [PE], Decreased Prostaglandin Production [PE], Nonsteroidal Anti-inflammatory Drug [EPC], Platelet Aggregation Inhibitor [EPC]"
      ],
      "ndc.product_id": [
        "69168-318_48ea6598-c8a1-4eff-a939-095215e10716",
        "69168-318_30755b32-e67a-4400-9f49-4e75b45d0672"
      ],
      "ndc.productndc": [
        "69168-318",
        "69168-318"
      ],
      "ndc.producttypename": [
        "HUMAN OTC DRUG",
        "HUMAN OTC DRUG"
      ],
      "ndc.proprietaryname": [
        "Aspirin",
        "ASPIRIN"
      ],
      "ndc.proprietarynamesuffix": [
        "Enteric Coated",
        "Enteric Coated"
      ],
      "ndc.routename": [
        "ORAL",
        "ORAL"
      ],
      "ndc.startmarketingdate": [
        "20141218",
        "20141218"
      ],
      "ndc.substancename": [
        "ASPIRIN",
        "ASPIRIN"
      ]
    }
  ]

In the document above, the only fields that contain significantly different values are ndc.listing_record_certified_through and ndc.product_id. Other fields, like ndc.proprietaryname and ndc.nonproprietaryname, differ only in their capitalization. The same was true of the other documents that I checked manually.
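To make the manual check repeatable, a small script along these lines could list the fields whose values actually differ. This is only a sketch: `differing_fields` is a hypothetical helper, not part of the plugin or the BioThings SDK, and it assumes a dotfield-style hit where every list holds plain strings or numbers. The `hit` literal is an abbreviated copy of the response above.

```python
def differing_fields(hit, ignore_case=False):
    """Return the sorted names of list-valued fields whose entries are not all equal."""
    differing = []
    for field, values in hit.items():
        if not isinstance(values, list):
            continue  # scalar fields such as _id and _score hold a single value
        # Collapse the list to its distinct values, optionally ignoring case.
        distinct = {
            v.casefold() if ignore_case and isinstance(v, str) else v
            for v in values
        }
        if len(distinct) > 1:
            differing.append(field)
    return sorted(differing)


# Abbreviated copy of the hit shown above (assumption: values are scalars).
hit = {
    "_id": "69168-318",
    "_score": 1,
    "ndc.dosageformname": ["TABLET", "TABLET"],
    "ndc.listing_record_certified_through": ["20231231", "20221231"],
    "ndc.nonproprietaryname": ["Aspirin 81 mg", "Aspirin 81 MG"],
    "ndc.product_id": [
        "69168-318_48ea6598-c8a1-4eff-a939-095215e10716",
        "69168-318_30755b32-e67a-4400-9f49-4e75b45d0672",
    ],
    "ndc.proprietaryname": ["Aspirin", "ASPIRIN"],
}

print(differing_fields(hit))                    # exact comparison
print(differing_fields(hit, ignore_case=True))  # capitalization ignored
```

Note that fields under ndc.package legitimately hold several distinct values per record (one per package), so this check would flag them as well; they would need to be compared per package rather than as a flat list.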

If this is widespread, I think we should merge the documents using the MergerStorage class, which should result in less duplication.
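As an illustration of the outcome being asked for, the sketch below merges two records with the same _id and keeps a list only where the values really differ. It is plain Python written for this example: `merge_dedup` is a hypothetical function and is not the MergerStorage implementation, whose exact merging rules should be checked in the BioThings SDK before switching.

```python
def merge_dedup(first, second):
    """Merge two flat records, keeping one copy of each repeated value."""
    merged = dict(first)
    for key, value in second.items():
        if key not in merged or merged[key] == value:
            # New field, or the same value in both records: keep a single copy.
            merged[key] = value
            continue
        # Values differ: collect them in a list without repeating any entry.
        existing = merged[key] if isinstance(merged[key], list) else [merged[key]]
        incoming = value if isinstance(value, list) else [value]
        merged[key] = existing + [v for v in incoming if v not in existing]
    return merged


# Two hypothetical source records for the same productndc, abbreviated.
record_a = {
    "productndc": "69168-318",
    "dosageformname": "TABLET",
    "listing_record_certified_through": "20231231",
}
record_b = {
    "productndc": "69168-318",
    "dosageformname": "TABLET",
    "listing_record_certified_through": "20221231",
}

print(merge_dedup(record_a, record_b))
```

With this approach, identical fields such as dosageformname stay scalar, and only listing_record_certified_through becomes a two-element list. Values that differ only in capitalization would still be kept twice unless they are normalized first.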

ravila4, Sep 04 '22 23:09