Upcoming BILLSTATUS format change

Open: jonquandt opened this issue on Aug 10 '22

Due to a change in our upstream data source, this is advance notice of significant changes to the format of the BILLSTATUS XML files available via the GovInfo API and Bulkdata repository.

The summary of changes below is based on a review of differences between a sample set of existing BILLSTATUS files and their equivalents that use the updated upstream data source. The sample files are available here for additional review, including both the current and future formats. Additional samples can be provided if needed.

The tentative go-live date for new files is ~~the evening of 9/20/2022~~ TBD. We plan to reprocess the entire current Congress in the following weeks. Prior Congresses will also be addressed in the future.

BILLSTATUS changes

Based on a review of BILLSTATUS-117hr1 - original | future

Removed elements

  • /billStatus/bill/createDate
  • /billStatus/bill/amendments/amendment[1]/actions/actionTypeCounts
  • /billStatus/bill/summaries/billSummaries/item[1]/updateDate
  • /billStatus/bill/summaries/billSummaries/item[1]/name
  • /billStatus/bill/titles/item[1]/parentTitleType

Changes to existing or new elements

General

  • /billStatus/bill/recordedVotes have moved to individual actions: /billStatus/bill/actions/item[1]/recordedVotes
  • /billStatus/bill/calendarNumbers have been moved to individual actions: /billStatus/bill/actions/item[1]/calendarNumber
  • /billStatus/bill/billNumber -> /billStatus/bill/number
  • /billStatus/bill/billType -> /billStatus/bill/type
  • the /billStatus/bill/committees/billCommittees wrapper element has been removed; items now sit directly under /billStatus/bill/committees (e.g. /billStatus/bill/committees/item[1])
  • /billStatus/bill/relatedBills/item[1]/latestTitle -> /billStatus/bill/relatedBills/item[1]/title
  • /billStatus/bill/version
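For consumers that must read both old-format and new-format files during the transition, one option is to try the new element path first and fall back to the old one. Below is a minimal sketch using Python's standard-library `xml.etree.ElementTree`; the helper names (`first_text`, `bill_number_and_type`, `committee_items`) are hypothetical, and the two inline documents are hand-written toys, not real BILLSTATUS files.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple


def first_text(elem: ET.Element, *paths: str) -> Optional[str]:
    """Return the stripped text of the first path that matches, or None."""
    for path in paths:
        found = elem.find(path)
        if found is not None and found.text is not None:
            return found.text.strip()
    return None


def bill_number_and_type(root: ET.Element) -> Tuple[Optional[str], Optional[str]]:
    """Read the bill number and type, preferring the new element names."""
    # New format: bill/number and bill/type; old format: billNumber and billType.
    number = first_text(root, "bill/number", "bill/billNumber")
    bill_type = first_text(root, "bill/type", "bill/billType")
    return number, bill_type


def committee_items(root: ET.Element) -> List[ET.Element]:
    """Return committee <item> elements with or without the billCommittees wrapper."""
    items = root.findall("bill/committees/item")
    if not items:
        # Old format wrapped the items in committees/billCommittees.
        items = root.findall("bill/committees/billCommittees/item")
    return items


# Two tiny hand-written documents (illustrative only, not real BILLSTATUS files).
old = ET.fromstring(
    "<billStatus><bill><billNumber>1</billNumber><billType>HR</billType>"
    "<committees><billCommittees><item/></billCommittees></committees>"
    "</bill></billStatus>"
)
new = ET.fromstring(
    "<billStatus><bill><number>1</number><type>HR</type>"
    "<committees><item/></committees></bill></billStatus>"
)
print(bill_number_and_type(old), len(committee_items(old)))  # ('1', 'HR') 1
print(bill_number_and_type(new), len(committee_items(new)))  # ('1', 'HR') 1
```

Trying the new path first means the same code keeps working once every Congress has been reprocessed, at which point the fallbacks can simply be dropped.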

actions

  • new /billStatus/bill/actions/item[1]/calendarNumber element

amendments

  • /billStatus/bill/amendments/amendment[4]/createDate removed
  • /billStatus/bill/amendments/amendment[2]/cosponsors/currentCount removed
  • an empty /billStatus/bill/amendments/amendment[2]/titles element is no longer included
  • /billStatus/bill/amendments/amendment[1]/actions/actionByCounts
  • /billStatus/bill/amendments/amendment[2]/amendments -> /billStatus/bill/amendments/amendment[2]/amendmentsToAmendment
  • new /billStatus/bill/amendments/amendment[2]/amendedTreaty element
  • /billStatus/bill/amendments/amendment[4]/actions/actions/item[1]/committee -> /billStatus/bill/amendments/amendment[4]/actions/actions/item[1]/committees
  • new /billStatus/bill/amendments/amendment[4]/actions/actions/item[1]/recordedVotes element/section

subjects

  • /billStatus/bill/subjects/billSubjects has been removed; /billStatus/bill/subjects now goes directly to /billStatus/bill/subjects/legislativeSubjects

summaries

  • /billStatus/bill/summaries/billSummaries -> /billStatus/bill/summaries/summary[1] (individual summary elements sit directly under the summaries element)
  • /billStatus/bill/summaries/billSummaries/item[1]/lastSummaryUpdateDate -> /billStatus/bill/summaries/summary[1]/updateDate (there was an existing /billStatus/bill/summaries/billSummaries/item[1]/updateDate, but the new element's value matches the value in lastSummaryUpdateDate)

titles

  • new /billStatus/bill/titles/item[2]/billTextVersionName - example value: Introduced in House
  • new /billStatus/bill/titles/item[2]/billTextVersionCode - example value: IH
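The same fallback idea applies to the subjects and summaries restructuring. This is a sketch only: it assumes legislative subjects are `item/name` elements under `legislativeSubjects` in both layouts (a detail not stated in this issue), and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List


def summary_update_dates(root: ET.Element) -> List[str]:
    """Collect each summary's update date from either layout."""
    new_items = root.findall("bill/summaries/summary")
    if new_items:
        # New layout: summary elements sit directly under summaries.
        return [s.findtext("updateDate", default="").strip() for s in new_items]
    # Old layout: billSummaries/item, with the date in lastSummaryUpdateDate.
    old_items = root.findall("bill/summaries/billSummaries/item")
    return [s.findtext("lastSummaryUpdateDate", default="").strip() for s in old_items]


def legislative_subject_names(root: ET.Element) -> List[str]:
    """Return legislative subject names, with or without the billSubjects wrapper."""
    for path in (
        "bill/subjects/legislativeSubjects/item/name",  # new layout (assumed)
        "bill/subjects/billSubjects/legislativeSubjects/item/name",  # old layout (assumed)
    ):
        names = [e.text.strip() for e in root.findall(path) if e.text]
        if names:
            return names
    return []
```

Note that the old layout is read from lastSummaryUpdateDate rather than the old updateDate, since the issue reports that the new updateDate matches the former.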

jonquandt, Aug 10 '22

Thanks for the heads-up, Jon.

Would it be possible to add something like

<billStatus schema="1.1">

at the top so that until all of the files across all Congresses are updated to match the new format (which it seems like could be never), we can tell which files are updated and which aren't?

JoshData, Aug 10 '22

Looking into this.

jonquandt, Aug 19 '22

@JoshData - we are going to be pulling this change from our September release. I will update this issue when we have a new date and a better solution to help demarcate the new vs. old.

jonquandt, Sep 06 '22

Could we get a larger data sample? Six files isn't quite enough.

rhurst6, Sep 13 '22

@JoshData @jonquandt It seems that api.congress.gov will have updates taking effect on 9/26/2022. Will this affect the bulk data repository immediately?

achandy, Sep 13 '22

@achandy -- to my knowledge, the changes should not affect the bulkdata immediately. We have control over when we adopt the new upstream changes.

We can certainly provide additional samples. If you have some suggested BILLSTATUS files, I can include them in our testing and provide them as updated samples. I anticipate that we will likely deliver this change in November or December, but will provide more updates when we have a better target date.

jonquandt, Sep 13 '22

Thank you! Could you add these, all from the 117th Congress, to the data to be tested: HR5376, HR4350, HR2471, SCONRES14

rhurst6, Sep 13 '22

@rhurst6 - new sample files added. @JoshData - these examples now include a top-level <version>3.0.0</version> element to indicate this is coming from the newer version of the upstream data.

The target delivery date is now 12/20/2022. The change will affect new BILLSTATUS runs first; we will then reprocess the 117th Congress, followed by prior Congresses over a several-day period.

jonquandt, Nov 25 '22
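With the top-level `<version>` element, a consumer can tell the two formats apart before parsing further. A minimal sketch, assuming the marker is a direct child of `<billStatus>` whose major version is 3 or higher, and falling back to the `billNumber` -> `number` rename from the issue description for files without it; `billstatus_format` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import Union


def billstatus_format(xml_data: Union[str, bytes]) -> str:
    """Return 'new', 'old', or 'unknown' for a BILLSTATUS document."""
    root = ET.fromstring(xml_data)
    # Assumption: the marker is a direct child <version> of <billStatus>.
    version = root.findtext("version")
    if version:
        major = version.strip().split(".")[0]
        if major.isdigit() and int(major) >= 3:
            return "new"
    # Fallback heuristic based on the renamed element: billNumber -> number.
    if root.find("bill/billNumber") is not None:
        return "old"
    if root.find("bill/number") is not None:
        return "new"
    return "unknown"
```

The fallback also flags files that still use the old element names, such as ones that were missed during reprocessing.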

Hi @jonquandt, is 12/20/2022 (today) still your target date? We are working to update our processors for this change, but are not quite sure whether/how it will work for the large appropriations bills coming through now.

aih, Dec 20 '22

Yes, the deployment is scheduled for later this evening.

jonquandt, Dec 20 '22

Good morning, Jon. Can you pinpoint "later this evening"?

rhurst6, Dec 20 '22

We will be deploying changes to the system starting at 7pm. I will send an update once our deployment has completed.

jonquandt, Dec 20 '22

Thanks Jon!

rhurst6, Dec 20 '22

Deployment completed successfully and new BILLSTATUS files in the API and bulkdata are using the new format. Over the next several days, we will reprocess the entire 117th Congress, and then proceed to earlier Congresses.

https://api.govinfo.gov/collections/BILLSTATUS/2022-12-21T00:40:00Z?offsetMark=*&pageSize=250&api_key=DEMO_KEY

jonquandt, Dec 21 '22
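The collections query above can be paged by following each response's next-page link. Below is a sketch using only the Python standard library. It assumes the JSON response carries a `packages` list and a `nextPage` URL that is empty or null on the last page (check this against the GovInfo API documentation), and the function names are hypothetical. `fetch_json` is injectable so the paging logic can be exercised without network access.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator, Optional

BASE = "https://api.govinfo.gov/collections/BILLSTATUS"


def collections_url(start: str, api_key: str, page_size: int = 250,
                    end: Optional[str] = None, **filters: str) -> str:
    """Build a first-page URL in the same shape as the queries quoted in this thread."""
    path = f"{BASE}/{start}" + (f"/{end}" if end else "")
    params = {"offsetMark": "*", "pageSize": str(page_size), "api_key": api_key, **filters}
    # Keep the literal "*" offsetMark rather than percent-encoding it.
    return path + "?" + urllib.parse.urlencode(params, safe="*")


def fetch_json_http(url: str) -> dict:
    """Fetch one page over HTTP and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return json.load(resp)


def iter_packages(first_url: str, api_key: str,
                  fetch_json: Callable[[str], dict] = fetch_json_http) -> Iterator[dict]:
    """Yield package records page by page, following nextPage links.

    Assumes each page has a "packages" list and a "nextPage" URL.
    """
    url: Optional[str] = first_url
    while url:
        page = fetch_json(url)
        yield from page.get("packages", [])
        url = page.get("nextPage")
        if url and "api_key=" not in url:
            # Assumption: nextPage may omit the key, so append it again.
            url += ("&" if "?" in url else "?") + urllib.parse.urlencode({"api_key": api_key})
```

For example, `iter_packages(collections_url("2022-12-21T00:40:00Z", "DEMO_KEY"), "DEMO_KEY")` would walk the same query quoted above (this makes real network calls). The `end` and `congress` keywords reproduce the date-range query with a `congress=117` filter that appears later in the thread.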

The 117th Congress BILLSTATUS data has been reprocessed.

jonquandt, Dec 27 '22

What about the following XMLs?

  • https://www.govinfo.gov/bulkdata/BILLSTATUS/117/hr/BILLSTATUS-117hr3684.xml
  • https://www.govinfo.gov/bulkdata/BILLSTATUS/117/hr/BILLSTATUS-117hr7900.xml
  • https://www.govinfo.gov/bulkdata/BILLSTATUS/117/hr/BILLSTATUS-117hr4350.xml

I still see billNumber and billType there instead of number and type. Am I missing something?

ghost, Dec 30 '22

Nope, looks like we missed a few: https://api.govinfo.gov/collections/BILLSTATUS/2022-01-01T00:00:00Z/2022-12-21T00:00:00Z?offsetMark=*&pageSize=100&api_key=DEMO_KEY&congress=117

16 haven't been reprocessed. We'll look into what happened and get them updated soon.

jonquandt, Dec 30 '22

@syroeshko - the three that you mentioned have been reprocessed. There are still ten that haven't been reprocessed, and we are investigating those further.

jonquandt, Dec 30 '22

Thanks a lot!

ghost, Dec 31 '22

Will you be able to reflect these changes in the user guide soon?

In addition to the changes listed in this issue, it appears that the reports element is no longer a child of /billStatus/bill/committees/item/activities and that reports now appear under /billStatus/bill/committeeReports instead, but you may want to confirm this.

MokeEire, Jan 26 '23

In case it's useful, here's the commit in our scraper that updates it to load the new format showing the necessary changes (at least the changes that I've encountered so far):

https://github.com/unitedstates/congress/commit/f5b510a5516bfcf1dd0ffe1b3cb70a4705ace36c

JoshData, Feb 05 '23