bill-status
Upcoming BILLSTATUS format change
Due to a change in our upstream data source, the format of the BILLSTATUS XML files available via the GovInfo API and Bulkdata repository will change significantly. This is advance notice of those changes.
The summary of changes below is based on a review of differences between a sample set of existing BILLSTATUS files and their equivalents that use the updated upstream data source. The sample files are available here for additional review, including both the current and future formats. Additional samples can be provided if needed.
The tentative go-live date is ~~the evening of 9/20/2022~~ TBD for new files. We will plan on reprocessing the entirety of the current Congress in the following weeks. Prior Congresses will also be addressed in the future.
BILLSTATUS changes
Based on a review of BILLSTATUS-117hr1 - original | future
Removed elements
- /billStatus/bill/createDate
- /billStatus/bill/amendments/amendment[1]/actions/actionTypeCounts
- /billStatus/bill/summaries/billSummaries/item[1]/updateDate
- /billStatus/bill/summaries/billSummaries/item[1]/name
- /billStatus/bill/titles/item[1]/parentTitleType
Changes to existing or new elements
General
- /billStatus/bill/recordedVotes have moved to individual actions: /billStatus/bill/actions/item[1]/recordedVotes
- /billStatus/bill/calendarNumbers have been moved to individual actions: /billStatus/bill/actions/item[1]/calendarNumber
- /billStatus/bill/billNumber -> /billStatus/bill/number
- /billStatus/bill/billType -> /billStatus/bill/type
- there is no longer a /billStatus/bill/committees/billCommittees element. Goes from /billStatus/bill/committees directly to /billStatus/bill/committees/item[1]
- /billStatus/bill/relatedBills/item[1]/latestTitle -> /billStatus/bill/relatedBills/item[1]/title
- /billStatus/bill/version
actions
- new /billStatus/bill/actions/item[1]/calendarNumber element
amendments
- /billStatus/bill/amendments/amendment[4]/createDate removed
- /billStatus/bill/amendments/amendment[2]/cosponsors/currentCount removed
- no empty /billStatus/bill/amendments/amendment[2]/titles element
- /billStatus/bill/amendments/amendment[1]/actions/actionByCounts
- /billStatus/bill/amendments/amendment[2]/amendments -> /billStatus/bill/amendments/amendment[2]/amendmentsToAmendment
- new /billStatus/bill/amendments/amendment[2]/amendedTreaty element
- /billStatus/bill/amendments/amendment[4]/actions/actions/item[1]/committee -> /billStatus/bill/amendments/amendment[4]/actions/actions/item[1]/committees
- new /billStatus/bill/amendments/amendment[4]/actions/actions/item[1]/recordedVotes element/section
- /billStatus/bill/subjects/billSubjects has been removed. Now /billStatus/bill/subjects/ goes directly to /billStatus/bill/subjects/legislativeSubjects
- /billStatus/bill/summaries/billSummaries -> /billStatus/bill/summaries/summary[1] (individual summary elements are directly under the summaries element)
- /billStatus/bill/summaries/billSummaries/item[1]/lastSummaryUpdateDate -> /billStatus/bill/summaries/summary[1]/updateDate (there was an existing /billStatus/bill/summaries/billSummaries/item[1]/updateDate, but the new element value matches the value in lastSummaryUpdateDate)
- new /billStatus/bill/titles/item[2]/billTextVersionName - example value: Introduced in House
- new /billStatus/bill/titles/item[2]/billTextVersionCode - example value: IH
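For consumers that need to read both formats during the transition, the renames and moves listed above could be handled with a small compatibility layer. The following is only a sketch under stated assumptions: the two sample documents are hypothetical and heavily trimmed, the `placeholder` child inside `recordedVotes` is invented for illustration, and the helper names (`first_text`, `bill_identity`, `recorded_votes_containers`) are not part of any GovInfo tooling.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed documents. Real BILLSTATUS files contain far
# more elements, and the <placeholder/> child is invented for illustration.
OLD_SAMPLE = """<billStatus><bill>
  <billNumber>1</billNumber><billType>HR</billType>
  <recordedVotes><placeholder/></recordedVotes>
</bill></billStatus>"""

NEW_SAMPLE = """<billStatus><bill>
  <number>1</number><type>HR</type>
  <actions><item><recordedVotes><placeholder/></recordedVotes></item></actions>
</bill></billStatus>"""


def first_text(parent, *names):
    """Return the stripped text of the first child found among names, else None."""
    for name in names:
        value = parent.findtext(name)
        if value is not None:
            return value.strip()
    return None


def bill_identity(xml_text):
    """Return (type, number), trying the new element names before the old ones."""
    bill = ET.fromstring(xml_text).find("bill")
    return (first_text(bill, "type", "billType"),
            first_text(bill, "number", "billNumber"))


def recorded_votes_containers(xml_text):
    """Collect recordedVotes elements from the bill level (old) and action level (new)."""
    bill = ET.fromstring(xml_text).find("bill")
    return bill.findall("recordedVotes") + bill.findall("actions/item/recordedVotes")


print(bill_identity(OLD_SAMPLE), bill_identity(NEW_SAMPLE))
```

Trying the new names first keeps the old names as a fallback only, so the fallback can be dropped once every Congress has been reprocessed.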
Thanks for the heads-up, Jon.
Would it be possible to add something like
<billStatus schema="1.1">
at the top so that until all of the files across all Congresses are updated to match the new format (which it seems like could be never), we can tell which files are updated and which aren't?
Looking into this.
@JoshData - we are going to be pulling this change from our September release. I will update this issue when we have a new date and a better solution to help demarcate the new vs. old.
Could we get a larger data sample in the ? Six files isn't quite enough.
@JoshData @jonquandt It seems that api.congress.gov will have updates taking effect on 9/26/2022. Will this affect the bulk data repository immediately?
@achandy -- to my knowledge, the changes should not affect the bulkdata immediately. We have control over when we adopt the new upstream changes.
We can certainly provide additional samples. If you have some suggested BILLSTATUS files, I can include them in our testing and provide them as updated samples. I am anticipating that we will likely be delivering this change in November or December, but will provide more updates when we have a better target date.
Thank you! If you could add these to the data to be tested, all from the 117th Congress: HR5376, HR4350, HR2471, SCONRES14
@rhurst6 - new sample files added. @JoshData - these examples now include a top-level <version>3.0.0</version> element to indicate this is coming from the newer version of the upstream data.
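A consumer could use that element to tell new-format files from old ones. The sketch below is an assumption-laden illustration, not an official check: the comment above calls the element "top-level" while the change list shows it at /billStatus/bill/version, so both locations are tried, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def upstream_version(xml_text):
    """Return the version string if present, else None.

    Both candidate locations are checked because the exact position of the
    element is an assumption here.
    """
    root = ET.fromstring(xml_text)
    for path in ("version", "bill/version"):
        value = root.findtext(path)
        if value:
            return value.strip()
    return None


def is_new_format(xml_text):
    """Treat any file carrying a version element as new-format."""
    return upstream_version(xml_text) is not None


print(upstream_version("<billStatus><version>3.0.0</version><bill/></billStatus>"))
```

Files without the element are treated as old-format, which matches the state of files that have not yet been reprocessed.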
Target delivery date is now starting 12/20/2022, affecting new BILLSTATUS runs first. We will then reprocess the 117th Congress, followed by prior Congresses, over a period of several days.
Hi @jonquandt, is 12/20/2022 (today) still your target date? We are working to update our processors for this change, but are not quite sure whether/how it will work for the large appropriations bills coming through now.
Yes, the deployment is scheduled for later this evening.
Good morning, Jon. Can you pinpoint "later this evening"?
We will be deploying changes to the system starting at 7pm. I will send an update once our deployment has completed.
Thanks Jon!
Deployment completed successfully and new BILLSTATUS files in the API and bulkdata are using the new format. Over the next several days, we will reprocess the entire 117th Congress, and then proceed to earlier Congresses.
https://api.govinfo.gov/collections/BILLSTATUS/2022-12-21T00:40:00Z?offsetMark=*&pageSize=250&api_key=DEMO_KEY
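For anyone scripting this check, a URL of the shape shown above can be assembled as follows. This is a minimal sketch that only mirrors the pattern of the link above; `collection_url` is a hypothetical helper, the optional end timestamp and extra filters are assumptions about how the query extends, and no request is actually sent.

```python
from urllib.parse import urlencode

BASE = "https://api.govinfo.gov/collections/BILLSTATUS"


def collection_url(start, end=None, page_size=250, api_key="DEMO_KEY", **filters):
    """Build a collections URL for packages modified since `start`.

    `end` and `filters` (for example congress=117) are optional and assumed
    to follow the same pattern as the link in this thread.
    """
    path = f"{BASE}/{start}" + (f"/{end}" if end else "")
    query = {"offsetMark": "*", "pageSize": page_size, "api_key": api_key, **filters}
    # safe="*" keeps the offsetMark value as a literal asterisk
    return f"{path}?{urlencode(query, safe='*')}"


print(collection_url("2022-12-21T00:40:00Z"))
```

DEMO_KEY is the key used in the link above; a personal API key would replace it in regular use.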
The 117th Congress BILLSTATUS data has been reprocessed.
How about the following XMLs?
- https://www.govinfo.gov/bulkdata/BILLSTATUS/117/hr/BILLSTATUS-117hr3684.xml
- https://www.govinfo.gov/bulkdata/BILLSTATUS/117/hr/BILLSTATUS-117hr7900.xml
- https://www.govinfo.gov/bulkdata/BILLSTATUS/117/hr/BILLSTATUS-117hr4350.xml
I still see billNumber and billType there instead of number and type. Am I missing something?
Nope --- looks like we missed a few: https://api.govinfo.gov/collections/BILLSTATUS/2022-01-01T00:00:00Z/2022-12-21T00:00:00Z?offsetMark=*&pageSize=100&api_key=DEMO_KEY&congress=117
16 haven't been reprocessed. We'll look into what happened and get them updated soon.
@syroeshko - the three that you mentioned have been reprocessed. There are still ten that haven't been reprocessed, and we are investigating those further.
Thanks a lot!
Will you be able to reflect these changes in the user guide soon?
In addition to the changes listed in this issue, it appears that reports is no longer an element of /billStatus/bill/committees/item/activities and is instead in /billStatus/bill/committeeReports, but you may want to confirm this.
In case it's useful, here's the commit in our scraper that updates it to load the new format showing the necessary changes (at least the changes that I've encountered so far):
https://github.com/unitedstates/congress/commit/f5b510a5516bfcf1dd0ffe1b3cb70a4705ace36c