usaspending-api

[PIPE-315] Implement Recipient Lookup ETL with Spark

Open sethstoudenmier opened this issue 2 years ago • 1 comment

Description: Implemented Recipient Lookup with Spark and made updates to test cases.

Technical details:

  • Implemented the Recipient Lookup ETL, which required increasing the memory allocated in the make command used to run the ETL locally
  • Implemented test cases for the Recipient Lookup load_query_to_delta command; the test first verifies that the initial load matches, then validates that MERGE INTO picks up new changes correctly
  • Updated test cases to include the hive_unittest_metastore_db fixture; without this fixture, tests would fail locally unless they were run in a specific order
  • Transaction and Award Search were temporarily updated to pull from test.recipient_lookup_testing, since that version is a direct copy of the Recipient Lookup table; if PIPE-373 is not deployed in the same sprint, the Nightly Pipeline will need to be updated to create test.recipient_lookup_testing temporarily
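
For reviewers unfamiliar with the two-step test flow described above (initial load, then MERGE INTO), here is a rough plain-Python model of the idea. It is not the Spark or Delta code from this PR: `merge_into`, the `recipient_hash` key, and the sample rows are hypothetical, and it assumes the merge only updates matched rows and inserts unmatched ones (no deletes).

```python
from typing import Dict, Iterable, Tuple

Row = Dict[str, str]


def merge_into(target: Dict[str, Row], source: Iterable[Row], key: str) -> Tuple[int, int]:
    """Upsert source rows into target, keyed on `key`.

    Hypothetical stand-in for a MERGE INTO with only
    WHEN MATCHED THEN UPDATE / WHEN NOT MATCHED THEN INSERT clauses.
    Returns (updated, inserted); matched rows that are identical are
    left alone and not counted as updates (an assumption of this sketch).
    """
    updated = inserted = 0
    for row in source:
        k = row[key]
        if k in target:
            if target[k] != row:
                target[k] = dict(row)
                updated += 1
        else:
            target[k] = dict(row)
            inserted += 1
    return updated, inserted


# Step 1: the initial load into an empty target should match the source exactly.
initial = [
    {"recipient_hash": "a1", "legal_business_name": "ACME CORP"},
    {"recipient_hash": "b2", "legal_business_name": "GLOBEX LLC"},
]
table: Dict[str, Row] = {}
first_counts = merge_into(table, initial, key="recipient_hash")
assert sorted(table.values(), key=lambda r: r["recipient_hash"]) == initial

# Step 2: a later run should pick up one changed row and one new row,
# leaving the untouched row as it was.
changes = [
    {"recipient_hash": "b2", "legal_business_name": "GLOBEX INC"},
    {"recipient_hash": "c3", "legal_business_name": "INITECH"},
]
second_counts = merge_into(table, changes, key="recipient_hash")
```

The actual tests run against Delta tables through load_query_to_delta; the sketch only shows the shape of the assertions (exact match after the first load, then targeted changes after the merge).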

Requirements for PR merge:

  1. [x] Unit & integration tests updated
  2. [x] API documentation updated
  3. [ ] Necessary PR reviewers:
    • [ ] Backend
  4. [x] Matview impact assessment completed
  5. [x] Frontend impact assessment completed
  6. [ ] Data validation completed
  7. [ ] Appropriate Operations ticket(s) created
  8. [ ] Jira Ticket PIPE-315:
    • [x] Link to this Pull-Request
    • [x] Performance evaluation of affected (API | Script | Download)
    • [ ] Before / After data comparison

Area for explaining above N/A when needed:

sethstoudenmier, Aug 09 '22 03:08

Adding Do Not Merge briefly while I try to figure out a data difference discovered when running in QAT

sethstoudenmier, Aug 09 '22 17:08