congress-legislators

Historical Committee assignments

Open alexanderfurnas opened this issue 11 years ago • 21 comments

Great work here, such an excellent source. I was curious about the possibility of keeping historical committee assignments for legislators from their previous terms. As I understand it, only current committee assignments are housed here. Anyone have thoughts on this?

alexanderfurnas avatar Mar 14 '13 16:03 alexanderfurnas

It's on my list - I have assignments from the 105th congress onward in the NYT data, but only 111th-present are in the API and vetted. But these should be coming.

dwillis avatar Mar 14 '13 16:03 dwillis

That's great - Derek, if you do that, and wouldn't mind updating this thread, I'd be happy to do the legwork of importing them into our data here.

konklone avatar Mar 14 '13 16:03 konklone

Fantastic. Thanks for the response Derek.

alexanderfurnas avatar Mar 14 '13 18:03 alexanderfurnas

The Senate Calendar includes a listing of committee assignments, and is available from fdsys as far back as 1996 (although it's PDF only prior to the 105th Congress).

http://www.gpo.gov/fdsys/browse/collection.action?collectionCode=CCAL&browsePath=107%2FSCAL%2F2002-11%2F11-20%5C%2F4%3BFINAL&isCollapsed=false&leafLevelBrowse=false&isDocumentResults=true&ycord=0

schmod avatar Apr 18 '13 15:04 schmod

Yep, a good resource, although those are the "final" rosters and don't reflect changes made during the course of each congress, which ideally we'd like to have.

dwillis avatar Apr 18 '13 15:04 dwillis

Aha, got it.

schmod avatar Apr 18 '13 17:04 schmod

Hm. You could step through hearing reports on FDSys, which all have the supposedly-then-current committee membership attached to them.

Parsing actually might be fairly easy (as far as these things go), as the GPO put the committee membership in the XML metadata for each document.

schmod avatar Apr 18 '13 17:04 schmod

Just wanted to check on this issue, with Ed Markey moving from the House to the Senate today. It would be nice for the data to reflect that in his committee memberships. For my needs, I don't care about past data so much as current changes, and maybe the 112th Congress. It'd be nice to get at least that much.

jasonab avatar Jul 17 '13 00:07 jasonab

We update current committee assignments using the committee_membership.py script. I've just run it, see f15f12d.

JoshData avatar Jul 17 '13 14:07 JoshData

There's a wealth of historical committee membership data here: http://web.mit.edu/17.251/www/data_page.html#2%29

Pros:

  • It does record mid-session changes to rosters
  • Easily readable data for all historical congresses (csvs w/ icpsr id keys)

Cons:

  • Doesn't include subcommittees, only parent committees
  • Large time lag, certainly not a replacement source for updating current data (just a one-off historical update)

Given that the current-committees data has higher granularity (subcommittees), is it worth scraping and preserving this data for historical committee membership?
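
For anyone weighing that trade-off, here is a minimal sketch of how mid-session roster changes could be queried once the CSVs are loaded. The column names (`icpsr_id`, `committee`, `start`, `end`) and the rows themselves are made up for illustration; they are an assumption, not the layout of the actual files.

```python
import csv
import io
from datetime import date

# Made-up rows in a hypothetical layout; the real files' columns may differ.
SAMPLE_CSV = """icpsr_id,committee,start,end
10001,Agriculture,2001-01-03,2002-06-30
10002,Agriculture,2001-01-03,2003-01-03
10003,Agriculture,2002-07-01,2003-01-03
"""


def roster_on(rows, committee, day):
    """Return the sorted ICPSR ids serving on `committee` on `day` (bounds inclusive)."""
    members = []
    for row in rows:
        start = date.fromisoformat(row["start"])
        end = date.fromisoformat(row["end"])
        if row["committee"] == committee and start <= day <= end:
            members.append(row["icpsr_id"])
    return sorted(members)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
# Two different dates in the same congress can give two different rosters.
print(roster_on(rows, "Agriculture", date(2002, 1, 15)))
print(roster_on(rows, "Agriculture", date(2002, 9, 1)))
```

Storing a start and end date per assignment, rather than one roster per congress, is what would let historical data keep the mid-session changes.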

bchartoff avatar Aug 22 '13 17:08 bchartoff

Have to be a little careful with that data. Some is listed as for academic use only.

JoshData avatar Aug 23 '13 23:08 JoshData

If anybody wants to brute-force this, Robert Byrd compiled one of the more comprehensive listings of old committees (and their chairpeople, but not members) that I've seen. The Senate historian seems to be keeping the list up to date.

Full membership information is available in the congressional directory, which has been published continuously since 1820. Scanned copies should be available from archive.org. Good luck getting that data into a structured format though...

Charles Stewart's data from the 1st-79th congresses does not have the academic-only disclaimer, but he does request a citation. (If you want his data served to you on a dead tree, you can apparently also buy the thing as a 4,000 page printed volume). I'm pretty sure that CQ also has a fairly comprehensive database of this information, locked away somewhere.

schmod avatar Aug 27 '13 14:08 schmod

Oh, and the Wikipedians have compiled a good listing of resources for researching historical committee information....

schmod avatar Aug 27 '13 14:08 schmod

If a link in our README would suffice as a citation, I don't have a problem with that.

On Tue, Aug 27, 2013 at 10:49 AM, schmod wrote:

If anybody wants to brute-force this, Robert Byrd compiled (http://books.google.com/books?id=PeHByMYxVm8C&printsec=frontcover&dq=isbn:0160632560&hl=en&sa=X&ei=xq4cUu_EEqi9sASwz4GQDA&ved=0CC8Q6AEwAA#v=onepage&q&f=false) one of the more comprehensive listings of old committees (and their chairpeople, but not members) that I've seen. The Senate historian seems to be keeping the list up to date (http://www.senate.gov/artandhistory/history/resources/pdf/CommitteeChairs.pdf).

Full membership information is available in the congressional directory, which has been published continuously since 1820. Scanned copies should be available from archive.org. Good luck getting that data into a structured format though...

Charles Stewart's data from the 1st-79th congresses does not have the academic-only disclaimer, but he does request a citation. (If you want his data served to you on a dead tree, you can apparently also buy the thing as a 4,000 page printed volume: http://books.google.com/books?id=J4JPMQEACAAJ&dq=isbn:1568021712&hl=en&sa=X&ei=UrAcUpCdI_Si4AP7u4B4&ved=0CDgQ6AEwAg). I'm pretty sure that CQ also has a fairly comprehensive database of this information, locked away somewhere.

konklone avatar Aug 27 '13 14:08 konklone

I'm w/ @konklone on README citation. I've also had zero luck getting CQ data in the past, they hold onto it pretty tight.

bchartoff avatar Aug 27 '13 15:08 bchartoff

Maybe somebody should send Charles Stewart an email as a courtesy?

schmod avatar Aug 27 '13 15:08 schmod

Agreed. And we should invite him to join Github and help us out!

konklone avatar Aug 27 '13 15:08 konklone

I can take a stab at revisiting this. Senate calendar from FDSys still seem like a good place to start? @schmod, where is the XML metadata that has the content of the committee memberships that you referenced a few months ago? Can't locate it just poking around.

Also getting lots of dead links for commands like this:

fdsys --year=2009 --store=text,xml --collections=CCAL

e.g.

Downloading: data/fdsys/CCAL/2009/CCAL-111scal-2009-10-30/document.xml
file not found: http://www.gpo.gov/fdsys/pkg/CCAL-111scal-2009-10-30/xml/CCAL-111scal-2009-10-30.xml

Most of GPO site seems to be active. Any ideas?

wilson428 avatar Oct 15 '13 02:10 wilson428

I believe GPO's FDSys is only open for certain high priority collections: https://twitter.com/USGPO/status/384993220536455168

konklone avatar Oct 15 '13 02:10 konklone

Just popping in to say I'm finding this thread helpful in our latest Cong. research, thanks all.

If anyone has a lead on historical member data with subcommittee affiliations, it would be of interest to us, but parent committees are a good start.

Also, I see this has been a recent request again, in issue #522 - maybe this is an area of wider interest for re-use.

davidmooreppf avatar May 15 '18 21:05 davidmooreppf

The Congressional Directory was mentioned earlier, but I was looking it over so I thought I'd post more information:

  • There is plain-text from GPO going back to 1997: https://www.govinfo.gov/app/collection/cdir
  • Some Congresses have more than one update.
  • Recent years seem to only have Senate membership (but we have recent years from other sources).
  • Some years seem to have subcommittees.

I started writing some code before deciding parsing the plain text would be too hard to get done any time soon, but here's some code to pull down the text files:

import json
import re
import urllib.request

def parse_committee_assignments(package_id, text_file):
    # Placeholder: the parsing step was not included in this comment.
    # For now, just report which text file holds the committee assignments.
    print(package_id, text_file)

def walk_directory(url):
    print(url + "...")
    # Fetch this node's children from the govinfo browse tree as JSON.
    with urllib.request.urlopen(url + "?fetchChildrenOnly=1") as response:
        directory = json.loads(response.read().decode("utf8"))
    for node in directory["childNodes"]:
        value = node["nodeValue"]
        if value["level"] == 3 and value.get("displayValue", "") != "Committee Assignments":
            # Skip nodes that don't have committee assignments within them.
            pass
        elif "value" in value:
            # Recursively go into this node.
            walk_directory(url + "/" + value["value"])
        elif re.match("ASSIGNMENTS OF (SENATORS|REPRESENTATIVES) TO COMMITTEES", value.get("title", "")):
            # This holds committee assignments!
            parse_committee_assignments(value["packageid"], value["textfile"])

walk_directory("https://www.govinfo.gov/wssearch/rb/cdir")
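
To check the filtering logic without hitting the network, the same walk can be run over an in-memory tree. This is only a sketch: `SAMPLE_TREE`, its node values, and `find_assignment_files` are hypothetical stand-ins that mirror the fields the script above reads (`level`, `displayValue`, `value`, `title`, `packageid`, `textfile`), not real govinfo responses.

```python
import re

ASSIGNMENTS_TITLE = re.compile(
    r"ASSIGNMENTS OF (SENATORS|REPRESENTATIVES) TO COMMITTEES"
)

# Hypothetical tree keyed by path; each list stands in for one "childNodes" response.
SAMPLE_TREE = {
    "cdir": [
        {"nodeValue": {"level": 2, "value": "CDIR-SAMPLE"}},
    ],
    "cdir/CDIR-SAMPLE": [
        {"nodeValue": {"level": 3, "displayValue": "Statistical Information", "value": "STATS"}},
        {"nodeValue": {"level": 3, "displayValue": "Committee Assignments", "value": "ASSIGN"}},
    ],
    "cdir/CDIR-SAMPLE/ASSIGN": [
        {"nodeValue": {"level": 4, "title": "ASSIGNMENTS OF SENATORS TO COMMITTEES",
                       "packageid": "CDIR-SAMPLE", "textfile": "sample-senate.txt"}},
        {"nodeValue": {"level": 4, "title": "SOMETHING ELSE",
                       "packageid": "CDIR-SAMPLE", "textfile": "sample-other.txt"}},
    ],
}


def find_assignment_files(tree, path="cdir"):
    """Collect (packageid, textfile) pairs using the same three-way branch as walk_directory."""
    found = []
    for node in tree.get(path, []):
        value = node["nodeValue"]
        if value["level"] == 3 and value.get("displayValue", "") != "Committee Assignments":
            # Level-3 sections other than "Committee Assignments" are skipped.
            continue
        elif "value" in value:
            # Node has children: descend into it.
            found.extend(find_assignment_files(tree, path + "/" + value["value"]))
        elif ASSIGNMENTS_TITLE.match(value.get("title", "")):
            # Leaf whose title marks it as a committee assignments document.
            found.append((value["packageid"], value["textfile"]))
    return found


print(find_assignment_files(SAMPLE_TREE))
```

Returning the matches as a list instead of parsing inside the walk keeps the traversal separate from the still-unwritten plain-text parser.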

JoshData avatar Jul 21 '18 23:07 JoshData