usaspending-api
[PIPE-315] Implement Recipient Lookup ETL with Spark
Description: Implemented Recipient Lookup with Spark and made updates to test cases.
Technical details:
- Implemented the Recipient Lookup ETL, which required increasing the allocated memory in the MAKE command used to run the ETL locally
- Implemented test cases for the Recipient Lookup `load_query_to_delta` command; the first step in the test is to make sure the initial load matches, and the second validates that the MERGE INTO picks up new changes correctly
- Updated test cases to include the `hive_unittest_metastore_db` fixture; without this fixture I was running into issues where tests would fail locally if not run in a specific order
- Transaction and Award Search were updated temporarily to pull from `test.recipient_lookup_testing`, since that version is a direct copy of the Recipient Lookup table; if PIPE-373 is not deployed in the same sprint, the Nightly Pipeline will need to be updated to create `test.recipient_lookup_testing` temporarily
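For reviewers, the two-phase test shape described above (initial load matches, then a re-run picks up new changes) can be sketched roughly as follows. This is only an illustration, not the test code in this PR: it uses the standard-library `sqlite3` upsert as a stand-in for Delta Lake's MERGE INTO, and the `source`/`target` tables, the simplified two-column schema, and the helper names are all hypothetical.

```python
import sqlite3

# Assumption: sqlite3's INSERT ... ON CONFLICT stands in for Delta's MERGE INTO.
# Table names, columns, and helpers here are hypothetical, for illustration only.


def merge_source_into_target(conn):
    """Upsert every source row into target, keyed on recipient_hash."""
    conn.execute(
        """
        INSERT INTO target (recipient_hash, legal_business_name)
        SELECT recipient_hash, legal_business_name FROM source WHERE true
        ON CONFLICT(recipient_hash) DO UPDATE
        SET legal_business_name = excluded.legal_business_name
        """
    )
    conn.commit()


def rows(conn, table):
    """Return all rows of a table in a stable order for comparison."""
    query = (
        "SELECT recipient_hash, legal_business_name "
        f"FROM {table} ORDER BY recipient_hash"
    )
    return conn.execute(query).fetchall()


def run_two_phase_check():
    conn = sqlite3.connect(":memory:")
    ddl = "(recipient_hash TEXT PRIMARY KEY, legal_business_name TEXT)"
    conn.execute(f"CREATE TABLE source {ddl}")
    conn.execute(f"CREATE TABLE target {ddl}")
    conn.executemany(
        "INSERT INTO source VALUES (?, ?)", [("a1", "ACME"), ("b2", "BOLT")]
    )

    # Phase 1: the initial load should leave target identical to source.
    merge_source_into_target(conn)
    initial_matches = rows(conn, "target") == rows(conn, "source")

    # Phase 2: change one source row, add another, then merge again.
    conn.execute(
        "UPDATE source SET legal_business_name = 'ACME LLC' "
        "WHERE recipient_hash = 'a1'"
    )
    conn.execute("INSERT INTO source VALUES ('c3', 'CORE')")
    merge_source_into_target(conn)
    final_rows = rows(conn, "target")

    conn.close()
    return initial_matches, final_rows
```

Calling `run_two_phase_check()` returns whether the initial load matched along with the target rows after the second merge, so both the updated row and the newly inserted row can be asserted on.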
Requirements for PR merge:
- [x] Unit & integration tests updated
- [x] API documentation updated
- [ ] Necessary PR reviewers:
    - [ ] Backend
- [x] Matview impact assessment completed
- [x] Frontend impact assessment completed
- [ ] Data validation completed
- [ ] Appropriate Operations ticket(s) created
- [ ] Jira Ticket PIPE-315:
    - [x] Link to this Pull-Request
    - [x] Performance evaluation of affected (API | Script | Download)
    - [ ] Before / After data comparison
Area for explaining above N/A when needed:
Adding Do Not Merge briefly while I try to figure out a data difference discovered when running in QAT