
502 error when trying to download FDA UNII data

Open jrlegrand opened this issue 1 year ago • 4 comments

Problem Statement

We already have a view, int_mthspl_products_to_inactive_ingredients, that uses DailyMed SPL data (via the RxNorm data files) to map NDC9 to inactive ingredients (excipients). It is built by the rxnorm DAG as part of the dbt transform task.

One thing that could be improved is normalizing the UNII display name. FDA publishes the full list of UNIIs along with their preferred display names.

Criteria for Success

  • [ ] Create a new Airflow DAG to pull FDA UNII data: https://coderx.io/sagerx/source-data/source-data/fda-unique-ingredient-identifiers-uniis (the preferred download is UNII Data, not UNII List: https://precision.fda.gov/uniisearch/archive/latest/UNII_Data.zip)
  • [ ] Any other general cleanup, e.g. this intermediate table pulls directly from sagerx_lake, which is a dbt no-no (it should go through a staging model)
  • [ ] Create a mart that pulls in int_mthspl_products_to_inactive_ingredients and joins in the display name from the new FDA UNII DAG on UNII code
  • [ ] Push this mart to S3 monthly, around the 10th of the month
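A minimal sketch of the extract-and-join logic in plain Python, for illustration only; in SagerX the join itself would live in a dbt model. It assumes UNII_Data.zip contains a single tab-delimited .txt file whose header includes `UNII` and `Display Name` columns (check this against the real archive). The function names and the `unii` / `unii_display_name` field names are hypothetical stand-ins for the columns of int_mthspl_products_to_inactive_ingredients and the new mart.

```python
import csv
import io
import zipfile


def load_unii_display_names(zip_bytes: bytes) -> dict[str, str]:
    """Map UNII code -> FDA preferred display name.

    Assumption: the archive holds exactly one tab-delimited .txt file
    with 'UNII' and 'Display Name' header columns.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        txt_members = [n for n in zf.namelist() if n.lower().endswith(".txt")]
        if len(txt_members) != 1:
            raise ValueError(f"expected one .txt file, found {txt_members}")
        with zf.open(txt_members[0]) as raw:
            # Assumed encoding; adjust if the FDA file is not UTF-8.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            reader = csv.DictReader(text, delimiter="\t", quoting=csv.QUOTE_NONE)
            return {row["UNII"]: row["Display Name"] for row in reader if row.get("UNII")}


def add_display_names(rows: list[dict], display_names: dict[str, str]) -> list[dict]:
    """Left join on UNII code: every product row is kept, and the display
    name is None when the UNII is missing from the FDA file."""
    return [{**row, "unii_display_name": display_names.get(row["unii"])} for row in rows]
```

A left join (rather than an inner join) is used so products whose UNII is absent from the FDA file still appear in the mart instead of silently dropping out.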

Additional Information

https://coderx.io/sagerx/source-data/source-data/fda-unique-ingredient-identifiers-uniis

jrlegrand • Apr 22 '24 19:04

Here's a new one...

Trying to download the file at https://precision.fda.gov/uniisearch/archive/latest/UNII_Data.zip from an Airflow DAG gives the error below:

requests.exceptions.HTTPError: 502 Server Error: Bad Gateway for url: https://kg6boqcpfj.execute-api.us-east-1.amazonaws.com/production/v1/get-latest-archive/unii_data.zip

jrlegrand • Apr 23 '24 02:04

Tried with the direct AWS URL and got a similar error:

[2024-04-26, 14:45:51 UTC] {standard_task_runner.py:105} ERROR - Failed to execute job 77 for task extract (502 Server Error: Bad Gateway for url: https://kg6boqcpfj.execute-api.us-east-1.amazonaws.com/production/v1/get-latest-archive/unii_data.zip; 2540)

jrlegrand • Apr 26 '24 14:04

Is this a CORS error?


jrlegrand • Apr 26 '24 14:04

Emailed FDA to ask for help.

jrlegrand • Apr 26 '24 14:04


Haven't tried this yet, but it seems we may need to download a .gsrs file from https://gsrs.ncats.nih.gov/#!/release, rename it, and unzip it (or something similar). However, that site doesn't seem to update the UNII codes on the roughly monthly cadence FDA does, so it's still not a great option.

jrlegrand • Jun 06 '24 18:06

@lprzychodzien OMG, FDA got back to me and solved my issue. The 502 was because I had lowercased the file name in the URL. FML...

Hi Joey, This is from another colleague at PrecisionFDA. “Also I would recommend Joey to use a properly capitalized filename in the URL, such as: https://kg6boqcpfj.execute-api.us-east-1.amazonaws.com/production/v1/get-latest-archive/UNII_Data.zip I had no issues downloading UNII data with Python script using this address.” Hope it helps.

Best, GSRS team

jrlegrand • Jun 28 '24 01:06
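The GSRS reply comes down to one detail: the file name in the URL must keep the capitalization `UNII_Data.zip`. Below is a minimal sketch of a download step that bakes this in. It uses the standard library's urllib instead of the requests library the DAG in this thread uses, so it stays self-contained. The constant and function names are hypothetical, not existing SagerX code, and the sketch assumes the precision.fda.gov "latest" link keeps resolving (via redirect) to the current archive.

```python
import shutil
import urllib.request
from pathlib import Path

# Hypothetical constants; the URL is the one quoted in this issue.
UNII_ARCHIVE_BASE = "https://precision.fda.gov/uniisearch/archive/latest/"
UNII_ARCHIVE_FILE = "UNII_Data.zip"  # case matters: unii_data.zip returned a 502


def unii_archive_url(filename: str = UNII_ARCHIVE_FILE) -> str:
    """Build the download URL without normalizing the file name's case."""
    if filename != UNII_ARCHIVE_FILE and filename.lower() == UNII_ARCHIVE_FILE.lower():
        raise ValueError(f"use {UNII_ARCHIVE_FILE!r}; the server path is case-sensitive")
    return UNII_ARCHIVE_BASE + filename


def download_unii_data(dest: Path, timeout: float = 60.0) -> Path:
    """Stream the archive to dest. urlopen follows redirects and raises
    urllib.error.HTTPError on 4xx/5xx responses, such as the 502 above."""
    with urllib.request.urlopen(unii_archive_url(), timeout=timeout) as resp, dest.open("wb") as out:
        shutil.copyfileobj(resp, out)
    return dest
```

Inside an extract task this would be called as something like `download_unii_data(Path("/tmp/UNII_Data.zip"))` (the path is only an example). The guard in `unii_archive_url` fails fast on a lowercased name rather than waiting for the server's 502.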

Haha, always something simple. Does the DAG work now?

lprzychodzien • Jun 28 '24 23:06