sagerx
NDC9 issues related to DailyMed label images
Problem Statement
Some SPLs carry representative labeling at the NDC9 level, which "counts" for all pack sizes.
Maybe account for NDC9 as a last-ditch effort?
https://dailymed.nlm.nih.gov/dailymed/drugInfo.cfm?setid=621062c2-9d8d-4441-8f37-695512cafb35
8-digit NDC9 (no zero padding) in the image alt text
I think we need to take the first two segments of the known SPL NDC (e.g. 72789-422) and remove the hyphen (e.g. 72789422) so we can search for that string in the text. This makes me wonder whether we should search the body text of the XML for known NDCs in all their different formats, instead of using known NDCs to filter the NDCs we find.
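A minimal sketch of that idea, assuming the known SPL NDCs are available as hyphenated strings and the text to search (alt text or XML body) is a plain string. The function names `ndc9_search_keys` and `find_known_ndcs` are hypothetical and not part of sagerx, and the `-30` package segment in the usage line is made up for illustration.

```python
from __future__ import annotations

import re


def ndc9_search_keys(spl_ndc: str) -> set[str]:
    """Build search strings from the first two segments of a hyphenated NDC."""
    segments = spl_ndc.split("-")
    if len(segments) < 2:
        raise ValueError(f"expected a hyphenated NDC, got {spl_ndc!r}")
    labeler, product = segments[0], segments[1]
    # Hyphenated form (72789-422) and the unpadded, hyphen-free form (72789422).
    return {f"{labeler}-{product}", f"{labeler}{product}"}


def find_known_ndcs(text: str, known_ndcs: list[str]) -> dict[str, set[str]]:
    """Map each known NDC to the search strings that occur in the text."""
    found: dict[str, set[str]] = {}
    for ndc in known_ndcs:
        for key in ndc9_search_keys(ndc):
            # Digit boundaries keep the key from matching inside a longer number.
            if re.search(rf"(?<!\d){re.escape(key)}(?!\d)", text):
                found.setdefault(ndc, set()).add(key)
    return found


# The "-30" package segment is a hypothetical placeholder.
matches = find_known_ndcs("72789422 label", ["72789-422-30"])
```

Searching for known NDCs, rather than extracting every digit run and filtering afterwards, means only strings that are already trusted can match, which may be the safer default for a last-ditch step.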
Example 1
https://dailymed.nlm.nih.gov/dailymed/drugInfo.cfm?setid=7db10b76-4520-496e-b450-66a662cc95f6
Example 2
Image alt text: "72789422 label"
Can confirm this is the first two segments of the NDC (labeler and product). Essentially the NDC9, but missing the zero padding.
925918d5-9f89-4f2f-9bb9-d9c53d90bb0a
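For comparison, here is a sketch of how the padded NDC9 relates to the unpadded string seen in the alt text. It assumes the usual normalization of left-padding the labeler to 5 digits and the product to 4 digits; `to_ndc9` is a hypothetical helper name. Note that a bare 8-digit string could in principle be a 4-4 or a 5-3 split, which is one reason to start from the known hyphenated NDC rather than from the digits found in the text.

```python
def to_ndc9(labeler_product: str) -> str:
    """Zero-pad the first two NDC segments to the 5-4 layout and join them."""
    labeler, product = labeler_product.split("-")[:2]
    return labeler.zfill(5) + product.zfill(4)


def unpadded(labeler_product: str) -> str:
    """Join the first two NDC segments without padding (as in the alt text)."""
    labeler, product = labeler_product.split("-")[:2]
    return labeler + product


padded = to_ndc9("72789-422")      # 9 digits
as_in_alt = unpadded("72789-422")  # 8 digits
```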
Criteria for Success
Come up with a solution to safely account for NDC9 label images.