usaspending-api icon indicating copy to clipboard operation
usaspending-api copied to clipboard

`broker_subaward_id` in Custom Award Data

Open vdavez opened this issue 1 year ago • 5 comments

Hello—When I try to download subawards from https://www.usaspending.gov/download_center/custom_award_data, I can't seem to download the broker_subaward_id column. I've tried to hack it with the subaward_fsrs_report_id, but it looks like these are non-unique in the database. Any thoughts? Thanks!

vdavez avatar May 10 '23 03:05 vdavez

The subaward downloads definitely don't have broker_subaward_id as it's not in the download_column_historical_lookups.py. Did you mean from a database extract?

dpb-bah avatar May 10 '23 14:05 dpb-bah

Yeah... basically, I don't know how to avoid duplicating subawards in the database. As far as I can tell, the only unique key is the broker_subaward_id but it's not in the subawards downloads. And the database extract only comes out once a month and requires a much larger download than I need for subawards. Does that make sense?

vdavez avatar May 10 '23 14:05 vdavez

Oh you're rebuilding your own subaward table based on the subaward downloads. Ultimately the database download would be the go-to but if you can't and need to find uniqueness, subawards are harder to pin down and it's been a known issue. The subaward_fsrs_report_id is based on the internal ID FSRS assigns to awards so you can't rely on that. Best suggestion then would be using the subaward_number with the prime_award_unique_key.

dpb-bah avatar May 10 '23 16:05 dpb-bah

You've nailed it. That's exactly the challenge I'm running into!

The problem is that even when "unique together", prime_award_unique_key and subaward_number are not unique. And sometimes these are clearly upstream FSRS issues, but I'm never clear about whether it's me or if it's FSRS! 😂

For example, this one clearly has incorrect data in FSRS. But if I went ahead and assumed uniqueness on the prime_award_unique_key, subaward_number, subawardee_uei, and subaward_action_date, it might exclude this one, where maybe it's actually right?

I recognize that a lot of this is an upstream FSRS problem, but I'm trying my best to not make it worse!

Re this: "The subaward_fsrs_report_id is based on the internal ID FSRS assigns to awards so you can't rely on that." Does that mean that FSRS data doesn't have a reliable transaction ID for subawards?

vdavez avatar May 10 '23 17:05 vdavez

Hi, I have been wrestling with the same issue; sub-awards have no uniqueness in the Bulk Award Download (/api/v2/bulk_download/awards/), which causes many issues in collecting/updating local records and aggregation.

None of the previously mentioned combinations can reliably produce a unique key. It is common to have multiples of the same item or PO in the same month on the same sub-award date for the same sub in any given prime award. Which again leaves us with no uniqueness.

There must be something in the source database that makes it work for the public website. Can the sub_award unique key surface in the report?

basztech avatar Sep 13 '23 17:09 basztech