courtlistener
Write up tutorials/examples for the API
I'm struggling a bit with this, but @johnhawkinson is constantly telling me we need it and I mostly agree. I think I'm too close to the fire and thus can't really see what people need. I do think that a fresh examples page or tutorial for the API would be good though.
I started some of this in branch 1377-api-examples-tutorials-docs.
I'm not actively working on this. If somebody else wants to, I'd very much welcome it.
I feel like I gave you some good examples recently, like:
1: [Redacted] wanted to convert the PACER docket report's list of parties/attorneys into a spreadsheet. How would she do this with the CourtListener API?
One part of the answer was to use:
https://www.courtlistener.com/api/rest-info/
https://www.courtlistener.com/api/rest/v3/attorneys/?docket__id=4274130
https://www.courtlistener.com/api/rest/v3/parties/?docket__id=4274130
And the API example page doesn't explain this, nor the whole __ business. Or is it _?
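A minimal sketch of that spreadsheet workflow, assuming the parties and attorneys endpoints return the usual paginated JSON with a "results" list and that your account token is allowed to read them. The helper names fetch_first_page and write_csv are hypothetical. Only the first page of results is fetched, and fields whose values are lists or dicts (for example, nested party types) are skipped rather than flattened.

```python
import csv

# Hypothetical helpers for illustration; not part of any CourtListener client library.
API_ROOT = "https://www.courtlistener.com/api/rest/v3"


def fetch_first_page(session, endpoint, docket_id, timeout=5):
    """GET the first page of /parties/ or /attorneys/ results for one docket."""
    response = session.get(
        f"{API_ROOT}/{endpoint}/",
        params={"docket__id": docket_id},
        timeout=timeout,
    )
    response.raise_for_status()
    return response.json()["results"]


def write_csv(results, out):
    """Write each result's scalar fields as a CSV row; list/dict values are skipped."""
    fieldnames = []
    for row in results:
        for key, value in row.items():
            if not isinstance(value, (list, dict)) and key not in fieldnames:
                fieldnames.append(key)
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    for row in results:
        # Drop nested values; missing keys are written as empty cells.
        writer.writerow({k: v for k, v in row.items() if not isinstance(v, (list, dict))})


# Example usage (real network call; the token value is a placeholder):
# import requests
# with requests.Session() as session:
#     session.headers["Authorization"] = "Token YOUR-TOKEN-HERE"
#     parties = fetch_first_page(session, "parties", 4274130)
#     with open("parties.csv", "w", newline="") as f:
#         write_csv(parties, f)
```

The session is passed in rather than created inside the helpers, so one authenticated session can serve both the parties and attorneys requests.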
2: Pagination inhibits people from using the API interactively/casually
The pagination is a necessary evil, but perhaps there should be a one-liner showing how to obtain all the paginated info in Perl or Python or something. (I realize Python does not lend itself well to one-liners, so Perl is put forth as an aspirational joke.)
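In Python, a short generator gets close to that one-liner. This is a sketch that assumes each v3 list response carries its items under "results" and the URL of the following page under "next" (null on the last page); iter_results is a hypothetical helper name.

```python
# Hypothetical helper for illustration; not part of any CourtListener client library.
API_ROOT = "https://www.courtlistener.com/api/rest/v3"


def iter_results(session, url, params=None, timeout=5):
    """Yield every item from a paginated endpoint by following its "next" links."""
    while url:
        response = session.get(url, params=params, timeout=timeout)
        response.raise_for_status()
        data = response.json()
        yield from data["results"]
        # The "next" URL already includes the query string, so params are sent only once.
        url, params = data.get("next"), None


# The "one-liner" (real network call; uncomment to run):
# import requests
# all_courts = list(iter_results(requests.Session(), f"{API_ROOT}/courts/"))
```

Because it is a generator, large result sets can also be processed page by page without holding everything in memory.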
The double underscore thing is explained in great detail, FWIW:
https://www.courtlistener.com/api/rest-info/#filtering
(search for RelatedFilters.)
Perhaps it needs to be more prominent, or deserves a deeper example, if you missed it.
I think part of the issue is that the API docs are just too daunting at this point. I actually like API docs that are on a single page, but perhaps breaking them up would help somehow. Or maybe they just need tidier organization.
Somebody just asked via email how to get the Criminal Count information. Again, the answer is a little like your example above:
https://www.courtlistener.com/api/rest/v3/parties/?docket=17318376
Hi everyone, I would like to begin actively working on this issue. Since I'm a junior programmer, it may take a while before I can provide useful contributions, but I look forward to working consistently to get up to speed. I hope my previous experience in technical writing can be of use to this project. Please let me know if there is anything I should know. In the meantime, I will read the existing documentation and look for the 1377-api-examples-tutorials-docs branch.
That's great, Adam! I'd recommend starting a Google Doc or similar that you can share. The other part of this is coming up with good examples of things people want to do with the API, so maybe just start with some basic use case examples, before you write too much, and we can build on those?
Hi Mike, thank you for your guidance! I've started an initial Google Doc to introduce ideas: https://docs.google.com/document/d/1i5ZJy9wyD-BI1CS11Ew2Y8B5pPEinTNGsGI5_e1XAt0/edit?usp=sharing
The document includes two sections: requests from GitHub (rephrased from this webpage), and requests from Discourse (taking one of the most highly viewed posts at https://flp.discourse.group/t/how-do-i-get-the-docket-entries-for-a-docket/20).
In addition, I have a couple of questions:
- I created an account, but I don't seem to have permission to access the Attorney List (and it looks like I can't access the Docket Entry List either, though this could be an error on my end). Could my account be given access to these? (I can send you my personal information via Slack.)
- What level of experience should we assume the user to have? The current documentation at https://www.courtlistener.com/api/rest-info/ assumes some experience, though I can write at a simpler level for beginner programmers. (In this case, perhaps my inexperience can truly be an asset, haha.)
Let me know if these would be good starting points, or if you would like any adjustments at all.
Thanks Adam. Taking a small step back, I wonder if it's worth just doing some super basic stuff from the documentation. Just skimming through it, I see a lot of cruft that's not really best practice. For example, a tutorial could just:
- There are a lot of endpoints, but today we're looking at courts (this is probably a good one because it's small, fully open and easy to play with).
- The first thing to do is to GET the results. To do that, just use Python to issue a GET request on the endpoint...
- Note that you get back JSON when you do that. JSON is the default if you don't provide the Accept header. You can also do a GET request by loading the URL in your browser. In that case, you're still using the API, but your browser will send Accept headers for HTML, so you're getting the JSON data wrapped in HTML. But note that your browser is using the API just like Python was!
- You'll note that I didn't log in. This is to make getting started easier, but let's get this fixed now. To log in, use Token authentication.
- Now, if you go to your profile while logged into CourtListener, you can see how many requests you've made today. You'll see that it only shows one. This is because your first request was from before you logged in and was anonymous.
- Great, so that's how to authenticate and get results. Let's play with filters and ordering.
- This API is self-documenting. So the easiest way to see what filters and ordering are available is to make an OPTIONS request. You can do that with Python by... or by clicking the button on the API webpage.
- When you look at the results to that request, you'll see that courts can be ordered by XYZ. Let's just experiment with changing the order.
- You can also see the filters that are on the courts endpoint by looking at the OPTIONS response. Let's filter like XYZ.
- Then do a filter using the double underscore
I think this would provide a lot of utility to folks and cover a bunch of the examples you provided. (Thank you for gathering them, they provided clarity.)
It also seems like it'd give the top info from the API docs in a more readable format than having to read through all the stuff.
Once this was launched, we could get into more tutorials about the specific data types, but this one should probably try to shy away from any data-specific questions.
Does that sound good?
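The JSON-versus-HTML point in the outline is HTTP content negotiation, and it can be demonstrated from Python by sending the Accept header explicitly. A minimal sketch, assuming that asking for text/html returns the same browsable HTML page a browser would get; fetch_courts is a hypothetical helper name. Note that requests itself sends Accept: */* by default, which gets the JSON default.

```python
# Hypothetical helper for illustration; not part of any CourtListener client library.
API_ROOT = "https://www.courtlistener.com/api/rest/v3"


def fetch_courts(session, accept=None, timeout=5):
    """GET /courts/ and return (Content-Type, body).

    accept=None leaves requests' default Accept: */* in place; pass "text/html"
    to ask for what a browser would ask for.
    """
    headers = {"Accept": accept} if accept else {}
    response = session.get(f"{API_ROOT}/courts/", headers=headers, timeout=timeout)
    response.raise_for_status()
    return response.headers.get("Content-Type", ""), response.text


# Example usage (real network calls):
# import requests
# with requests.Session() as session:
#     print(fetch_courts(session)[0])                      # expected: a JSON content type
#     print(fetch_courts(session, accept="text/html")[0])  # expected: an HTML content type
```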
A note on the double underscore...
The double underscore lets you filter across object types. So if you want to filter dockets by what court they're in, you'd query dockets (that's the thing you want back, right?) and then you'd filter by courts and the name of the court. It'd be something like:
/dockets/?court__id=scotus
See how that filters the dockets by a value that's in another field?
(Using court__id for this is a bad plan, though, because it joins in the Court table unnecessarily. If you want that same outcome, you should really do this:
/dockets/?court_id=scotus
It works this way because the docket table has a court_id column in each row that it uses to join with the id column of the courts table. So you could ask for all dockets that join to the courts table where the courts table has an id of scotus, but it's much easier to just ask for all dockets with a court_id of scotus.)
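Both filters can be sent from Python through the params argument of requests. A minimal sketch of that comparison, assuming the /dockets/ endpoint is readable with your token and that its paginated response reports a "count" field; docket_count is a hypothetical helper name. Both calls should describe the same dockets, but the court_id form avoids the extra join.

```python
# Hypothetical helper for illustration; not part of any CourtListener client library.
API_ROOT = "https://www.courtlistener.com/api/rest/v3"


def docket_count(session, filters, timeout=5):
    """Return the "count" reported by /dockets/ for the given query filters (None if absent)."""
    response = session.get(f"{API_ROOT}/dockets/", params=filters, timeout=timeout)
    response.raise_for_status()
    return response.json().get("count")


# Example usage (real network calls; the token value is a placeholder):
# import requests
# with requests.Session() as session:
#     session.headers["Authorization"] = "Token YOUR-TOKEN-HERE"
#     direct = docket_count(session, {"court_id": "scotus"})   # column on the docket table
#     joined = docket_count(session, {"court__id": "scotus"})  # joins in the courts table
```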
Thank you for your review and in-depth thoughts, Mike; I appreciate the information. I've read each paragraph closely, including the tutorial outline on how to use the CourtListener API to access information on courts and filter the data.
This approach sounds great, and the double underscore explanation makes sense. From my initial understanding, it looks like /dockets/?court_id=scotus directly queries all dockets with a court_id of scotus, whereas /dockets/?court__id=scotus takes a more indirect approach by joining in the full Court table first, and then filtering the dockets to those with the court_id of scotus. I'll also spend this week studying databases so I can better understand how the software works.
As an initial task with the example tutorial you've provided: would it be useful to work on filling in the blanks where we use Python to make queries to the database? I would be very interested in working on this, though I may not be the best contributor for this if there are time constraints, as it may take some time for me to learn (roughly a couple weeks of study to have a solid grasp on the fundamentals, as an initial estimate).
Great!
From my initial understanding, it looks like /dockets/?court_id=scotus directly queries all dockets with a court_id of scotus, whereas /dockets/?court__id=scotus takes a more indirect approach by joining in the full Court table first, and then filtering the dockets to those with the court_id of scotus.
Yes, that's exactly right.
would it be useful to work on filling in the blanks where we use Python to make queries to the database?
Yes, I think getting the Python snippets in there is an important part of the tutorial. If you look at the requests library, it should be pretty straightforward. Check out the HTTP verbs too.
Hi @mlissner , I have found working examples for the first tutorial on how to use the API with Python. Apologies this took a while (just finished a busy period with coursework), though I'm glad that I've found working examples in the end.
You can let me know if you'd like any changes, or if you would like me to work on finding examples for other tutorials. I've added an * symbol for Steps 7 and 10, as these could warrant a closer look. In addition, I included commands such as response.text to return the text attribute of the Response object, though this may be unnecessary to include explicitly.
1. There are a lot of endpoints, but today we're looking at courts (this is probably a good one because it's small, fully open and easy to play with).
2. The first thing to do is to GET the results. To do that, just use Python to issue a GET request on the endpoint, for example by entering the commands below in a Python interpreter:
import requests
response = requests.get("https://www.courtlistener.com/api/rest/v3/courts/")
response.raise_for_status()  # If the request succeeded, no message will appear after entering this command.
response.text
A full list of endpoints can be found at https://www.courtlistener.com/api/rest/v3/ .
3. Note that you get back JSON when you do that. JSON is the default if you don't provide the Accept header. You can also do a GET request by loading the URL in your browser. In that case, you're still using the API, but your browser will send Accept headers for HTML, so you're getting the JSON data wrapped in HTML. But note that your browser is using the API just like Python was!
4. You'll note that I didn't log in. This is to make getting started easier, but let's get this fixed now. To log in, use Token authentication.
5. Now, if you go to your profile while logged into CourtListener, you can see how many requests you've made today. You'll see that it only shows one. This is because your first request was from before you logged in and was anonymous.
6. Great, so that's how to authenticate and get results. Let's play with filters and ordering.
*7. This API is self-documenting. So the easiest way to see what filters and ordering are available is to make an OPTIONS request. You can do that with Python with the commands below or by clicking the button on the API webpage.
response = requests.options("https://www.courtlistener.com/api/rest/v3/courts/")
8. When you look at the results of that request, you'll see that courts can be ordered by date_modified. Let's just experiment with changing the order.
For ordering by date modified (ascending), you can input:
response = requests.get("https://www.courtlistener.com/api/rest/v3/courts/?order_by=date_modified")
For ordering by date modified (descending), you can input:
response = requests.get("https://www.courtlistener.com/api/rest/v3/courts/?order_by=-date_modified")
9. You can also see the filters that are on the courts endpoint by looking at the OPTIONS response. Let's filter for only entries with the FS jurisdiction. Enter the command:
response = requests.get("https://www.courtlistener.com/api/rest/v3/courts/", params={"jurisdiction": "FS"})
*10. You can also filter using the double underscore method, such as by using the following statement:
response = requests.get("https://www.courtlistener.com/api/rest/v3/courts/?dockets__id=65485794")
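The draft above says to use Token authentication without showing how. A minimal sketch, assuming the token is sent in an Authorization header of the form "Token <your token>". The environment-variable name COURTLISTENER_TOKEN and all helper names are hypothetical; reading the token from the environment is just one way to keep it out of source code.

```python
import os

# Hypothetical helpers for illustration; not part of any CourtListener client library.
API_ROOT = "https://www.courtlistener.com/api/rest/v3"


def auth_headers(token):
    """Build the Authorization header used by Token authentication."""
    return {"Authorization": f"Token {token}"}


def token_from_env(name="COURTLISTENER_TOKEN"):
    """Read the API token from a (hypothetically named) environment variable."""
    token = os.environ.get(name)
    if not token:
        raise RuntimeError(f"Set {name} to your CourtListener API token")
    return token


def get_courts_authenticated(session, token, timeout=5):
    """GET /courts/ as a logged-in user, so the request is no longer anonymous."""
    response = session.get(f"{API_ROOT}/courts/", headers=auth_headers(token), timeout=timeout)
    response.raise_for_status()
    return response.json()


# Example usage (real network call):
# import requests
# with requests.Session() as session:
#     courts = get_courts_authenticated(session, token_from_env())
```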
Comments on Step 7 and 10:
For Step 7: As a user experience report, I hit friction and was stuck at this step for a while because of the current wording ("So the easiest way to see what filters and ordering are available is to make an OPTIONS request. You can do that with Python by... or by clicking the button on the API webpage").
The conceptual issue was figuring out how to apply an ordering once I had made the OPTIONS request with Python. Although I could see "ordering":["id","date_modified","position","start_date","end_date"] in the OPTIONS response, I was stuck for a while on how to turn that into a query. Eventually, I resolved this by clicking the Filters button on the API webpage and finding an example of how to apply the ordering (and how to make it descending).
The example could clarify this; in addition, a potentially useful change could be to specify the label for the button for the GUI option for extra clarity (such as: "You can do that with Python with the example below, or by clicking the "Filters" button on the API webpage").
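The step that caused the friction, going from the OPTIONS output to an ordered query, can be shown directly in code. A minimal sketch, assuming the OPTIONS JSON has a top-level "ordering" list like the one quoted above and that the order_by parameter used in the draft accepts a leading "-" for descending order; the helper names are hypothetical.

```python
# Hypothetical helpers for illustration; not part of any CourtListener client library.
API_ROOT = "https://www.courtlistener.com/api/rest/v3"


def ordering_fields(session, path, timeout=5):
    """Return the fields an endpoint can be ordered by, read from its OPTIONS response."""
    response = session.options(f"{API_ROOT}/{path}/", timeout=timeout)
    response.raise_for_status()
    return response.json().get("ordering", [])


def get_ordered(session, path, field, descending=False, timeout=5):
    """GET an endpoint sorted by one field; a leading "-" requests descending order."""
    allowed = ordering_fields(session, path, timeout=timeout)
    if field not in allowed:
        raise ValueError(f"{field!r} is not an ordering field; choose from {allowed}")
    params = {"order_by": f"-{field}" if descending else field}
    response = session.get(f"{API_ROOT}/{path}/", params=params, timeout=timeout)
    response.raise_for_status()
    return response.json()


# Example usage (real network calls):
# import requests
# with requests.Session() as session:
#     print(ordering_fields(session, "courts"))
#     newest_first = get_ordered(session, "courts", "date_modified", descending=True)
```

Checking the requested field against the OPTIONS list first turns a silent mistake into a clear error message.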
For Step 10: Though response = requests.get("https://www.courtlistener.com/api/rest/v3/courts/?dockets__id=65485794") did work as a usable example, filtering courts by a docket ID may not be something users need to know. I tried to find a more useful example of double underscore filtering, but couldn't find one that worked in my initial search. This example in particular could be worth revising if a more illustrative alternative exists.
Adam, this looks pretty good, but I should note that the sketch I provided was meant to be very notional. If it doesn't feel good to you, don't feel obligated to use it. I just wanted to provide a quick outline.
With that in mind, I have a couple specific thoughts:
- One of requests' biggest bugs is that it doesn't provide a timeout by default. We can't fix that bug (I've tried), but we should make sure that all our examples use the timeout parameter. So whenever you say:
requests.{get|options|post|etc}(some-url)
You should say:
requests.{get|options|post|etc}(some-url, timeout=5)
- In example 9, you use the params argument. I think you should either use it everywhere or not at all. So either redo that one as ?jurisdiction=FS or redo the others to use params. You should also explain what FS means and how to learn what it means (an OPTIONS request).
- I wouldn't point people to the Filters button in the UI. I think it's easier just to say (notionally!): "If you do an options request, you can see ordering. To use an ordering field, do..."
- Using response.text is fine, but you can use response.json() instead and you'll get structured data (handy!).
I think that's it for now. I look forward to the more fleshed out version. Thank you again for sticking with this!
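Putting that feedback together, step 9 might look like the sketch below: every call passes timeout=5, query arguments always go through params, and results come back via response.json(). Where the OPTIONS response lists the allowed values for a filter is an assumption here (under "filters", then the field name, then "choices", each choice having "value" and "display_name"); check the real response before relying on filter_choices, which, like the other helper names, is hypothetical.

```python
# Hypothetical helpers for illustration; not part of any CourtListener client library.
API_ROOT = "https://www.courtlistener.com/api/rest/v3"
TIMEOUT = 5  # seconds; requests waits indefinitely unless a timeout is passed


def get_json(session, path, params=None):
    """GET an endpoint, passing query arguments via params, and return parsed JSON."""
    response = session.get(f"{API_ROOT}/{path}/", params=params, timeout=TIMEOUT)
    response.raise_for_status()
    return response.json()


def options_json(session, path):
    """Make an OPTIONS request to see an endpoint's filters and ordering."""
    response = session.options(f"{API_ROOT}/{path}/", timeout=TIMEOUT)
    response.raise_for_status()
    return response.json()


def filter_choices(options_data, field):
    """Map a filter's allowed values to labels, assuming filters -> field -> choices."""
    choices = options_data.get("filters", {}).get(field, {}).get("choices", [])
    return {choice["value"]: choice["display_name"] for choice in choices}


# Example usage (real network calls):
# import requests
# with requests.Session() as session:
#     labels = filter_choices(options_json(session, "courts"), "jurisdiction")
#     print(labels.get("FS"))  # what does FS mean?
#     fs_courts = get_json(session, "courts", params={"jurisdiction": "FS"})
```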
It would be rad to publish an interactive Jupyter notebook that does some stuff with the API.