mychem.info
Consider switching NDC storage type to 'MergerStorage'
Right now, the NDC plugin uses the RootKeyMergerStorage class to join documents with duplicate _id (productndc) values.
However, with this method, a lot of information appears to be duplicated. For example, the query http://mychem.info/v1/query?q=69168-318&fields=ndc&dotfield=true (the dotfield parameter lets us see the duplicated data side by side) shows that most of the fields contain the same value twice.
"hits": [
{
"_id": "69168-318",
"_score": 1,
"ndc._license": [
"https://www.fda.gov/AboutFDA/AboutThisWebsite/WebsitePolicies/default.htm#linking",
"https://www.fda.gov/AboutFDA/AboutThisWebsite/WebsitePolicies/default.htm#linking"
],
"ndc.active_ingred_unit": [
"mg/1",
"mg/1"
],
"ndc.active_numerator_strength": [
"81",
"81"
],
"ndc.applicationnumber": [
"part343",
"part343"
],
"ndc.dosageformname": [
"TABLET",
"TABLET"
],
"ndc.labelername": [
"Allegiant Health",
"Allegiant Health"
],
"ndc.listing_record_certified_through": [
"20231231",
"20221231"
],
"ndc.marketingcategoryname": [
"OTC MONOGRAPH FINAL",
"OTC MONOGRAPH FINAL"
],
"ndc.ndc_exclude_flag": [
"N",
"N"
],
"ndc.nonproprietaryname": [
"Aspirin 81 mg",
"Aspirin 81 MG"
],
"ndc.package.ndc_exclude_flag": [
"N",
"N",
"N",
"N",
"N",
"N",
"N",
"N",
"N",
"N"
],
"ndc.package.package.ndcpackagecode": [
"69168-318-01",
"69168-318-03",
"69168-318-06",
"69168-318-17",
"69168-318-50",
"69168-318-01",
"69168-318-03",
"69168-318-06",
"69168-318-17",
"69168-318-50"
],
"ndc.package.package.packagedescription": [
"1 BOTTLE in 1 CARTON (69168-318-01) > 100 TABLET in 1 BOTTLE",
"300 TABLET in 1 BOTTLE (69168-318-03) ",
"1 BOTTLE in 1 CARTON (69168-318-06) > 120 TABLET in 1 BOTTLE",
"300 TABLET in 1 BOTTLE (69168-318-17) ",
"1 BOTTLE in 1 CARTON (69168-318-50) > 50 TABLET in 1 BOTTLE",
"1 BOTTLE in 1 CARTON (69168-318-01) > 100 TABLET in 1 BOTTLE",
"300 TABLET in 1 BOTTLE (69168-318-03) ",
"1 BOTTLE in 1 CARTON (69168-318-06) > 120 TABLET in 1 BOTTLE",
"300 TABLET in 1 BOTTLE (69168-318-17) ",
"1 BOTTLE in 1 CARTON (69168-318-50) > 50 TABLET in 1 BOTTLE"
],
"ndc.package.sample_package": [
"N",
"N",
"N",
"N",
"N",
"N",
"N",
"N",
"N",
"N"
],
"ndc.package.startmarketingdate": [
"20141218",
"20141218",
"20141218",
"20141218",
"20141218",
"20141218",
"20141218",
"20141218",
"20141218",
"20141218"
],
"ndc.pharm_classes": [
"Anti-Inflammatory Agents, Non-Steroidal [CS], Cyclooxygenase Inhibitors [MoA], Decreased Platelet Aggregation [PE], Decreased Prostaglandin Production [PE], Nonsteroidal Anti-inflammatory Drug [EPC], Platelet Aggregation Inhibitor [EPC]",
"Anti-Inflammatory Agents, Non-Steroidal [CS], Cyclooxygenase Inhibitors [MoA], Decreased Platelet Aggregation [PE], Decreased Prostaglandin Production [PE], Nonsteroidal Anti-inflammatory Drug [EPC], Platelet Aggregation Inhibitor [EPC]"
],
"ndc.product_id": [
"69168-318_48ea6598-c8a1-4eff-a939-095215e10716",
"69168-318_30755b32-e67a-4400-9f49-4e75b45d0672"
],
"ndc.productndc": [
"69168-318",
"69168-318"
],
"ndc.producttypename": [
"HUMAN OTC DRUG",
"HUMAN OTC DRUG"
],
"ndc.proprietaryname": [
"Aspirin",
"ASPIRIN"
],
"ndc.proprietarynamesuffix": [
"Enteric Coated",
"Enteric Coated"
],
"ndc.routename": [
"ORAL",
"ORAL"
],
"ndc.startmarketingdate": [
"20141218",
"20141218"
],
"ndc.substancename": [
"ASPIRIN",
"ASPIRIN"
]
}
]
In the case of the document above, the only fields that contain significantly different values are ndc.listing_record_certified_through and ndc.product_id. Other fields, like ndc.proprietaryname and ndc.nonproprietaryname, differ only in their capitalization. The same was true of the other documents that I checked manually.
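To check this beyond a handful of manually inspected documents, something like the following rough sketch could be run over each hit returned with dotfield=true. It is not part of the NDC plugin or of biothings; the function name fields_that_differ is made up for illustration. It assumes that each list holds the values of the merged source records back to back, in the same order and with equal lengths (two records in the example above), and it ignores capitalization by default.

```python
def fields_that_differ(hit, n_records=2, ignore_case=True):
    """Return the dotted field names whose values differ between merged records.

    Assumption (not confirmed by the plugin code): each list in `hit` is the
    concatenation of `n_records` equally sized chunks, one per source record.
    """
    differing = []
    for field, values in hit.items():
        # Skip scalars (e.g. _id, _score) and lists that cannot be split evenly.
        if not isinstance(values, list) or len(values) % n_records != 0:
            continue
        size = len(values) // n_records
        chunks = [values[i * size:(i + 1) * size] for i in range(n_records)]
        if ignore_case:
            chunks = [
                [v.casefold() if isinstance(v, str) else v for v in chunk]
                for chunk in chunks
            ]
        # The field differs if any record's chunk is unlike the first one.
        if any(chunk != chunks[0] for chunk in chunks[1:]):
            differing.append(field)
    return sorted(differing)


# A reduced copy of the hit shown above.
hit = {
    "_id": "69168-318",
    "_score": 1,
    "ndc.listing_record_certified_through": ["20231231", "20221231"],
    "ndc.product_id": [
        "69168-318_48ea6598-c8a1-4eff-a939-095215e10716",
        "69168-318_30755b32-e67a-4400-9f49-4e75b45d0672",
    ],
    "ndc.proprietaryname": ["Aspirin", "ASPIRIN"],
    "ndc.nonproprietaryname": ["Aspirin 81 mg", "Aspirin 81 MG"],
    "ndc.routename": ["ORAL", "ORAL"],
    "ndc.package.package.ndcpackagecode": [
        "69168-318-01", "69168-318-03", "69168-318-06", "69168-318-17", "69168-318-50",
        "69168-318-01", "69168-318-03", "69168-318-06", "69168-318-17", "69168-318-50",
    ],
}

print(fields_that_differ(hit))
# ['ndc.listing_record_certified_through', 'ndc.product_id']
```

Passing ignore_case=False would also report ndc.proprietaryname and ndc.nonproprietaryname, which is a quick way to see how many differences are purely capitalization.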
If this is widespread, I think we should merge the documents using the MergerStorage class instead, which should result in less duplication.