Get citations using scrapers
I'm going to workshop my thoughts on prioritization here - and welcome feedback and thoughts.
@grossir can you please add your suggestion for using back scrapers to collect citations or other material posted later.
Sure @flooie
This would work for sources that
- Have a "citation" column on their HTML pages
- Leave it as a placeholder for some time, until the court populates it
An example is md: compare these 2 images from 2023 and 2024 (the current year), where citations are not populated yet
The approach is to run the backscraper with a custom caller. Here is some pseudocode
from juriscraper.opinions.united_states.state import md as scraper_module
from juriscraper.lib.importer import site_yielder
from cl.search.models import Opinion, OpinionCluster
from cl.scrapers.management.commands.cl_scrape_opinions import make_citation
import logging

logger = logging.getLogger(__name__)


class CitationCollector:
    def scrape_citations(self, start_date, end_date):
        for site in site_yielder(
            scraper_module.Site(
                backscrape_start=start_date,
                backscrape_end=end_date,
            ).back_scrape_iterable,
            scraper_module,
        ):
            # get case dicts by parsing HTML
            site.parse()
            # court_id holds the scraper's dotted module path; keep the
            # last component without any suffix, e.g. "md"
            court_id = site.court_id.split(".")[-1].split("_")[0]

            for record in site:
                citation = record['citations']
                if not citation:
                    continue

                # get cluster using download_url or hash of the document;
                # .get() raises if there is no match or more than one
                cluster = Opinion.objects.get(download_url=record['download_urls']).cluster

                # check if citation exists
                if self.citation_exists(citation, cluster):
                    logger.info("Citation already exists '%s' for cluster %s", record['citations'], cluster.id)
                    continue

                citation = make_citation(citation, cluster, court_id)
                citation.save()

    def citation_exists(self, citation, cluster):
        """To implement"""
        return False
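One way the `citation_exists` placeholder could be filled in is sketched below. This is only an illustration under assumptions: `ParsedCitation`, `parse_citation` and the regex are hypothetical names, and it compares against an in-memory collection instead of the database. In the real command the lookup would presumably be a query on the cluster's stored citations.

```python
import re
from typing import Iterable, NamedTuple, Optional


class ParsedCitation(NamedTuple):
    """Hypothetical container for a "volume reporter page" citation."""
    volume: str
    reporter: str
    page: str


# Matches simple citation strings such as "488 Md. 537" or "175 Ohio St.3d 155".
CITATION_RE = re.compile(r"^\s*(\d+)\s+(.+?)\s+(\d+)\s*$")


def parse_citation(raw: str) -> Optional[ParsedCitation]:
    """Split a raw citation string into its parts, or return None."""
    match = CITATION_RE.match(raw)
    if match is None:
        return None
    volume, reporter, page = match.groups()
    # Collapse repeated whitespace inside the reporter name
    return ParsedCitation(volume, " ".join(reporter.split()), page)


def citation_exists(raw: str, existing: Iterable[ParsedCitation]) -> bool:
    """True if the scraped citation is already among the known ones."""
    parsed = parse_citation(raw)
    return parsed is not None and parsed in set(existing)
```

Comparing parsed parts instead of raw strings avoids treating spacing differences in the scraped HTML as a new citation.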
Simple enough. Is it a good idea to analyze this across all states to figure out:
- Which have it
- How delayed each is
- How far back each goes
- How difficult each is to scrape
- ?
Thank you guys.
Also, should we spin this off into its own ticket and task? My hope was to use this issue to discuss high level architecture of a new Juriscraper system, not features we want to add.
I have a spreadsheet that looked at each state - and where these citations could be pulled from. In many cases the citations appear later on the scrapers and in others there is a second cite that could be scraped. The two probably are lexis or west cites that could be scraped (maybe).
https://docs.google.com/spreadsheets/d/1zYP_4ivL2XQF8mlrgdTmzXB57sTn6UYv8GrrRkq7X5Q/edit?usp=sharing
| STATE CITES | COUNT |
|---|---|
| YES | 27 |
| PROBABLE | 2 |
| UNCLEAR | 6 |
| NO | 16 |
10 with neutral citations
That's not too bad! Let's keep filling this in with info about how far back each goes, and things like that.
Yes, but I think many of these links are unrelated to the current scrapers, so it's more of a jumping off point for this.
A draft list that answers the questions, organized by "How difficult each is to scrape", in the sense of whether we already have the scraper implemented.
Questions covered: Which have it? How delayed is each? How difficult is each to scrape?
I haven't checked how far back each goes.
Sources that publish citations in the same URL we scrape
In other words, we just need to run (or implement) the backscraper with a custom caller
| Source | Time lag until citation is published | Example |
|---|---|---|
| md | 1 year | See above, most recent citation is from August 2023 |
| scotus_slip | 1 month | Most recent citation is 602 U.S. 406 for 22-976 Garland v. Cargill published on June 14, 2024 |
| colo | 3 months | Earliest non neutral citation I could find is 545 P.3d 942 for a decision from 15 April 2024, which we do not have in the CL |
| minn | 3 months | Earliest citation I could find is 5 N.W.3d 680 for an opinion from May 1st 2024 (today is August 12th) |
| ohio | 6 months | Earliest citation I could find is 175 Ohio St.3d 155 for an opinion from April 4th 2024 (today is Sep 10th) |
| texapp | ?? | The citations for texapp are available in the new tex source we are scraping |
| haw | 3 months | "156 Haw. 144" From June 30, 2025 (it is September 8th, 2025 at time of writing). Currently we only have up to volume 140 for that reporter |
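The lag figures in this table are rough estimates taken from the gap between the date of the newest opinion that already carries a citation and the date the page was checked. A small sketch of that arithmetic, using only the dates quoted in the rows for minn, ohio and haw (the helper name `lag_in_days` is made up for illustration):

```python
from datetime import date

# (source, date of the newest opinion that already has a citation,
#  date the source page was checked), as quoted in the table rows
observations = [
    ("minn", date(2024, 5, 1), date(2024, 8, 12)),
    ("ohio", date(2024, 4, 4), date(2024, 9, 10)),
    ("haw", date(2025, 6, 30), date(2025, 9, 8)),
]


def lag_in_days(opinion_date: date, checked_on: date) -> int:
    """Days between the opinion date and the day the page was checked."""
    return (checked_on - opinion_date).days


for source, opinion_date, checked_on in observations:
    days = lag_in_days(opinion_date, checked_on)
    print(f"{source}: {days} days (~{days / 30:.1f} months)")
```

This gives 103, 159 and 70 days respectively, so the month values in the table are best read as rounded-up estimates.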
Sources that have a neutral citation inside the opinion's document, but we didn't extract it
To collect past neutral citations, we would need to run the recently updated scraper with extract_from_text against older Opinion.plain_text already in the DB
| Source | citation extractor implemented in | Status | Date since collected | Citations added |
|---|---|---|---|---|
| vt | PR | Done | 2017-01-01 | 658 |
| wis | Sep 3rd, 2024. PR | Done | 2020-01-01 | 386 |
| wisctapp | ... | Done | 2020-01-01 | 0 |
| pasuperct | PR | Pending | ? | ? |
| or and orctapp | TBD | Done | ? | 2417 for orctapp, 171 for or |
Sources that publish an updated document version with the citation
| Source | citation extractor implemented in | Status | Lag until document update | Last citation from reporter in CL |
|---|---|---|---|---|
| ga | Pending | Pending | End of year? In March 2025, documents from late December 2024 have a "FINAL COPY" version | June 28th, 2019, 306 Ga. 351 |
| nm | Done | Versioning problem | End of year? As of April 2 2025, the latest citation in the source is 2025-NMSC-009 - 12/06/2024 | 2021 NMSC 008 |
| neb | Pending | Pending | 5 years or more. As of Aug 2025, most recent citation is from 01/21/2020. Documents will be tagged as "Certified" instead of "Advance" when they contain a regional citation. See with missing citation 938 N.W.2d 378 |
Sources that need a backscraper for a different URL than we scrape
In other words, the backscraper may need to go into the united_states_backscrapers, if not a different category folder
| Source | Time lag until citation is published | Example | Modification required |
|---|---|---|---|
| okla | 2 months | Most recent citation 549 P.3d 1260 for case published in 05/21/2024, KNOX v. OKLAHOMA GAS AND ELECTRIC CO. We don't have the citation in CL | We just changed the target URL, but we have code in the Git history to scrape and parse the site where citations are published. |
| conn | 1.5 months | Most recent citation 349 Conn. 417 for case published in 06/25/2024 | We would have to scrape a different page, and extract the data from PDFs, but they are nicely separated, 1 link per each opinion back to volume 326 from 2017. Before, back to volume 320, is a single PDF for all opinions |
Thanks @grossir. Should we rename this issue to be about capturing citations, and make a new one to talk about Juriscraper 3.0 architecture?
Happy to report that the citation backscraper is working, just ran it in prod on md and will soon run it with scotus_slip.
Added 305 citations by running
manage.py cl_back_scrape_citations --courts juriscraper.opinions.united_states.state.md --backscrape-start=2019 --backscrape-end=2023 --verbosity 3
Also added 89 opinions, some of which may be opinions we already had, for which the hash has changed due to corrections
We ran this for scotus_slip, only term 22, and duplicated all records from that term. If the duplications are not too big of a problem, we could run it for all of scotus_slip and get all the citations that we are missing
Anyway, it would be very nice to address the duplication problem https://github.com/freelawproject/courtlistener/issues/3803
The command:
manage.py cl_back_scrape_citations --courts juriscraper.opinions.united_states.federal_appellate.scotus_slip --backscrape-start=2023/01/01 --backscrape-end=2023/06/01 --verbosity 3
Yikes, those duplicates aren't great, no. Let's clean that up somehow, and figure out how to avoid dups before we have 20M opinions. :)
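To make the duplication mechanism concrete: if documents are matched only by content hash, a corrected file at the same URL looks like a brand new opinion. The sketch below is not how CourtListener does it; it assumes SHA-1 content hashes and uses two hypothetical in-memory lookups (`ids_by_hash`, `ids_by_url`) to show a hash-first match with a URL fallback.

```python
import hashlib
from typing import Dict, Optional


def sha1_hex(content: bytes) -> str:
    """Hex digest used as the document fingerprint (assumed to be SHA-1)."""
    return hashlib.sha1(content).hexdigest()


def find_existing_opinion(
    content: bytes,
    download_url: str,
    ids_by_hash: Dict[str, int],
    ids_by_url: Dict[str, int],
) -> Optional[int]:
    """Return the id of a stored opinion that matches, or None.

    The content hash is tried first. If the court corrected the file, the
    hash no longer matches, so the download URL is used as a fallback.
    """
    by_hash = ids_by_hash.get(sha1_hex(content))
    if by_hash is not None:
        return by_hash
    return ids_by_url.get(download_url)


# Usage: the corrected file still resolves to the stored opinion
original = b"opinion text"
ids_by_hash = {sha1_hex(original): 1}
ids_by_url = {"https://example.com/opinions/1.pdf": 1}
match = find_existing_opinion(
    b"opinion text, corrected",
    "https://example.com/opinions/1.pdf",
    ids_by_hash,
    ids_by_url,
)
print(match)
```

A URL fallback only helps where the court keeps the URL stable across corrections, so it would need checking per source.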
For sources where the citations are inside the document's text, but we just recently implemented extract_from_text to get them, we can run a script like the following (currently, we can do this over vt, wis and wisctapp)
from juriscraper.opinions.united_states.state.vt import Site
from cl.search.models import OpinionCluster, Citation
from django.db import transaction
import traceback

"""
Tested with the following clusters:

Already has a neutral citation in the system
python manage.py clone_from_cl --type search.OpinionCluster --id 4335586

Recent document, Doesn't have a neutral citation in the system
python manage.py clone_from_cl --type search.OpinionCluster --id 10099996

Is an order, doesn't have a neutral citation
python manage.py clone_from_cl --type search.OpinionCluster --id 10044928

Old document (2017), doesn't have a neutral citation in the system
python manage.py clone_from_cl --type search.OpinionCluster --id 4489376
"""

site = Site()

# according to the citations search page,
# latest VT neutral citations we have are from 2015
# https://www.courtlistener.com/c/vt/
# However, we can find neutral citations from 2017?
# https://www.courtlistener.com/opinion/4335586/representative-donald-turner-jr-and-senator-joseph-benning-v-governor/
query = """
SELECT *
FROM search_opinioncluster
WHERE
    docket_id IN (SELECT id FROM search_docket sd WHERE court_id = 'vt')
    AND
    id NOT IN (
        SELECT cluster_id
        FROM search_citation
        WHERE reporter = 'VT'
    )
    AND
    precedential_status = 'Published'
    AND
    date_filed > '2018-01-01'::date
"""
# This query selects all published 'vt' opinion clusters filed after
# 2018-01-01 which do not have a "VT" reporter neutral citation
# It queries over indexes

success, failure, iterated = 0, 0, 0
queryset = OpinionCluster.objects.raw(query).prefetch_related('sub_opinions')

for cluster in queryset:
    iterated += 1
    for opinion in cluster.sub_opinions.all():
        metadata = site.extract_from_text(opinion.plain_text)
        if not metadata:
            continue

        # Attach the cluster to the extracted citation fields
        # (the original draft overwrote the dict with the cluster id)
        citation_kwargs = metadata['Citation']
        citation_kwargs['cluster_id'] = cluster.id

        try:
            with transaction.atomic():
                Citation.objects.create(**citation_kwargs)
            print(f"Created citation {citation_kwargs}")
            success += 1
        except Exception:
            print(f"Failed creating citation for {citation_kwargs}")
            print(traceback.format_exc())
            failure += 1

print(f"Created {success}\nFailed {failure}\nIterated {iterated}")
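For readers unfamiliar with `extract_from_text`, the sketch below shows roughly what such an extractor does for a Vermont-style neutral citation like "2017 VT 5". It is a standalone illustration, not the juriscraper implementation: the function name `extract_neutral_citation`, the regex, the 1000-character search window, and the exact keys of the returned dict are all assumptions chosen to match how the script above consumes `metadata['Citation']`.

```python
import re
from typing import Dict

# Year, the literal reporter "VT", then a sequence number, e.g. "2017 VT 5"
NEUTRAL_CITE_RE = re.compile(r"\b(?P<volume>\d{4})\s+VT\s+(?P<page>\d+)\b")


def extract_neutral_citation(scraped_text: str) -> Dict[str, Dict[str, str]]:
    """Return citation fields found near the top of the text, or {}.

    Only the first 1000 characters are searched (an arbitrary window),
    so that citations to other cases in the body are less likely to match.
    """
    match = NEUTRAL_CITE_RE.search(scraped_text[:1000])
    if not match:
        return {}
    return {
        "Citation": {
            "volume": match["volume"],
            "reporter": "VT",
            "page": match["page"],
        }
    }


print(extract_neutral_citation("ENTRY ORDER\n2024 VT 12\nSupreme Court"))
print(extract_neutral_citation("An order without a neutral citation"))
```

Returning an empty dict when nothing matches is what lets the caller skip orders and other documents with a plain `if not metadata` check.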
I've noticed two citation gaps in Ohio, both documented in courtlistener issue #3882.
- Missing neutral citations in unpublished cases. I think this perhaps happened because Ohio added neutral citations at some point in time that may have been after we scraped. It's possible other states have done this. I haven't systematically tested the extent of this, but I think there are lots of these.
- Missing neutral citations in published cases. Again, some webcites have been added retroactively, so if we got print cases from Harvard (especially 1990s and early 2000s), we may not have the neutral citation parallel cite.
Both of these issues have increased urgency because, as I note in that issue, the Ohio Supreme Court has changed style rules to only require neutral citations when they are available, so we're going to start to see a lot of new published opinions that only refer to prior cases by neutral citation.
Just ran the command to get md lagged citations. Got
- 56 citations added to an existing cluster
- 14 citations added to a new cluster, meaning we got 2 versions to merge for a single opinion
- deleted 7 hash duplicates that were causing the command to error; I did it using the admin and making sure no undesired cascades happened
./manage.py cl_back_scrape_citations --courts juriscraper.opinions.united_states.state.md --backscrape-start=2024 --backscrape-end=2024 --verbosity 3
INFO Starting up the scraper.
INFO Using court_str: "md"
INFO Now downloading case page at: https://www.mdcourts.gov/cgi-bin/indexlist.pl?court=coa&year=2024&order=bydate&submit=Submit
INFO juriscraper.opinions.united_states.state.md: Successfully found 92 items.
DEBUG No citation, skipping row for case Willey v. Brown
DEBUG No citation, skipping row for case Hollins v. State
DEBUG No citation, skipping row for case Attorney Grievance Comm'n v. Franklin
DEBUG No citation, skipping row for case Scott v. Hon. Bowman
DEBUG No citation, skipping row for case Reinstatement of Kirwan
DEBUG No citation, skipping row for case Reinstatement of Assaraf
DEBUG No citation, skipping row for case Attorney Grievance Comm'n v. Loots
DEBUG No citation, skipping row for case Attorney Grievance Comm'n v. Gormley
DEBUG No citation, skipping row for case Feng v. Chen
DEBUG No citation, skipping row for case Attorney Grievance Comm'n v. Elan
DEBUG No citation, skipping row for case Attorney Grievance Comm'n v. Yeatman
DEBUG No citation, skipping row for case State Bd. of Elections v. Ambridge
DEBUG No citation, skipping row for case State v. Scarboro
DEBUG No citation, skipping row for case Attorney Grievance Comm'n v. Mahoney
DEBUG No citation, skipping row for case Reinstatement of Tabe
DEBUG No citation, skipping row for case Reinstatement of Gordon
DEBUG No citation, skipping row for case Attorney Grievance Comm'n v. O'Neill
DEBUG No citation, skipping row for case Attorney Grievance Comm'n v. Mayers
DEBUG No citation, skipping row for case Attorney Grievance Comm'n v. Gallagher
DEBUG No citation, skipping row for case Greenmark Properties v. Parts, Inc.
INFO Case 'Syed v. Lee', opinion 'https://www.mdcourts.gov/data/opinions/coa/2024/7a23.pdf' has no matching hash in the DB. Has a citation '488 Md. 537'. Will try to ingest all objects
INFO Adding new document found at: b'https://www.mdcourts.gov/data/opinions/coa/2024/7a23.pdf'
INFO Successfully added opinion 11080272: b'Syed v. Lee'
DEBUG No citation, skipping row for case Bethesda African Cemetery Coal. v. Housing Opp. Comm.
INFO Case 'State v. Thomas', opinion 'https://www.mdcourts.gov/data/opinions/coa/2024/15a23.pdf' has no matching hash in the DB. Has a citation '488 Md. 456'. Will try to ingest all objects
INFO Adding new document found at: b'https://www.mdcourts.gov/data/opinions/coa/2024/15a23.pdf'
WARNING , Retrying in 5 seconds...
INFO Successfully added opinion 11080273: b'State v. Thomas'
INFO Saved citation 488 Md. 534 for cluster 10098671: Frederick v. Baltimore City BOE
INFO Saved citation 488 Md. 531 for cluster 10098672: Balt. City BOE v. Mayor & City Cncl. of Balt
INFO Saved citation 488 Md. 454 for cluster 10078915: Attorney Grievance Comm'n v. O'Neill
INFO Saved citation 488 Md. 455 for cluster 10079360: Attorney Grievance Comm'n v. Koh
INFO Saved citation 488 Md. 410 for cluster 10079814: Adventist Healthcare v. Behram
INFO Saved citation 488 Md. 326 for cluster 10046356: In the Matter of McCloy
INFO Saved citation 488 Md. 354 for cluster 10046293: Cook v. State
INFO Saved citation 488 Md. 384 for cluster 10046476: Attorney Grievance Comm'n v. Goldscher
INFO Case 'Turenne v. State', opinion 'https://www.mdcourts.gov/data/opinions/coa/2024/20a23.pdf' has no matching hash in the DB. Has a citation '488 Md. 239'. Will try to ingest all objects
INFO Adding new document found at: b'https://www.mdcourts.gov/data/opinions/coa/2024/20a23.pdf'
INFO Successfully added opinion 11080278: b'Turenne v. State'
INFO Case 'Rovin v. State', opinion 'https://www.mdcourts.gov/data/opinions/coa/2024/19a23.pdf' has no matching hash in the DB. Has a citation '488 Md. 144'. Will try to ingest all objects
INFO Adding new document found at: b'https://www.mdcourts.gov/data/opinions/coa/2024/19a23.pdf'
INFO Successfully added opinion 11080279: b'Rovin v. State'
INFO Saved citation 488 Md. 45 for cluster 10041667: In the Matter of Hon. Ademiluyi
INFO Case 'Mitchell v. State', opinion 'https://www.mdcourts.gov/data/opinions/coa/2024/8a23.pdf' has no matching hash in the DB. Has a citation '488 Md. 1'. Will try to ingest all objects
INFO Adding new document found at: b'https://www.mdcourts.gov/data/opinions/coa/2024/8a23.pdf'
INFO Successfully added opinion 11080280: b'Mitchell v. State'
INFO Case 'State v. Smith', opinion 'https://www.mdcourts.gov/data/opinions/coa/2024/30a23.pdf' has no matching hash in the DB. Has a citation '487 Md. 635'. Will try to ingest all objects
INFO Adding new document found at: b'https://www.mdcourts.gov/data/opinions/coa/2024/30a23.pdf'
INFO Successfully added opinion 11080281: b'State v. Smith'
INFO Saved citation 487 Md. 701 for cluster 10039238: Mooney v. State
INFO Saved citation 487 Md. 632 for cluster 10039604: Katz, Abosch, etc., P.A. v. Parkway Neuroscience
INFO Case 'Jarvis v. State', opinion 'https://www.mdcourts.gov/data/opinions/coa/2024/22a23.pdf' has no matching hash in the DB. Has a citation '487 Md. 548'. Will try to ingest all objects
INFO Adding new document found at: b'https://www.mdcourts.gov/data/opinions/coa/2024/22a23.pdf'
INFO Successfully added opinion 11080282: b'Jarvis v. State'
INFO Saved citation 487 Md. 487 for cluster 10038343: Bennett v. Gentile
INFO Saved citation 487 Md. 501 for cluster 10027703: Attorney Grievance Comm'n v. Whitted
INFO Saved citation 487 Md. 476 for cluster 10025931: Doctor's Weight Loss Ctrs. v. Blackston
INFO Saved citation 487 Md. 474 for cluster 10020257: Attorney Grievance Comm'n v. Glenn
INFO Saved citation 487 Md. 455 for cluster 10013950: Attorney Grievance Comm'n v. Waldeck
INFO Case 'Attorney Grievance Comm'n v. Hardy', opinion 'https://www.mdcourts.gov/data/opinions/coa/2024/4a24ag.pdf' has no matching hash in the DB. Has a citation '487 Md. 456'. Will try to ingest all objects
INFO Adding new document found at: b'https://www.mdcourts.gov/data/opinions/coa/2024/4a24ag.pdf'
INFO Successfully added opinion 11080283: b"Attorney Grievance Comm'n v. Hardy"
INFO Saved citation 487 Md. 454 for cluster 10013355: Application of Lenk to Resign from Bar
INFO Case 'Freeman v. State', opinion 'https://www.mdcourts.gov/data/opinions/coa/2024/24a23.pdf' has no matching hash in the DB. Has a citation '487 Md. 420'. Will try to ingest all objects
INFO Adding new document found at: b'https://www.mdcourts.gov/data/opinions/coa/2024/24a23.pdf'
INFO Successfully added opinion 11080284: b'Freeman v. State'
INFO Saved citation 487 Md. 385 for cluster 10010704: Lithko Contracting v. XL Insurance Amer.
INFO Saved citation 487 Md. 354 for cluster 10010705: Town of Bel Air v. Bodt
INFO Saved citation 487 Md. 383 for cluster 9998430: Reinstatement of Tauber
INFO Saved citation 487 Md. 382 for cluster 9998460: Attorney Grievance Comm'n v. Gallagher
INFO Saved citation 487 Md. 384 for cluster 9998514: Attorney Grievance Comm'n v. Davis
DEBUG No citation, skipping row for case Attorney Grievance Comm'n v. Mosby
INFO Case 'Cunningham ex rel Gaines v. Baltimore Cnty.', opinion 'https://www.mdcourts.gov/data/opinions/coa/2024/9a23.pdf' has no matching hash in the DB. Has a citation '487 Md. 282'. Will try to ingest all objects
INFO Adding new document found at: b'https://www.mdcourts.gov/data/opinions/coa/2024/9a23.pdf'
INFO Successfully added opinion 11080285: b'Cunningham ex rel Gaines v. Baltimore Cnty.'
INFO Saved citation 487 Md. 260 for cluster 9567459: Attorney Grievance Comm'n v. Lamm
INFO Saved citation 487 Md. 260 for cluster 9834681: Attorney Grievance Comm'n v. Baker
INFO Saved citation 497 Md. 258 for cluster 9509513: Resper v. Dept. of Pub. Saf. & Corr. Servs.
INFO Saved citation 487 Md. 254 for cluster 9509248: Reinstatement of Ibebuchi
INFO Saved citation 487 Md. 256 for cluster 9509249: Attorney Grievance Comm'n v. Teitelbaum
INFO Saved citation 487 Md. 255 for cluster 9509352: Attorney Grievance Comm'n v. Tappan
INFO Saved citation 487 Md. 257 for cluster 9509300: Attorney Grievance Comm'n v. Nelson
INFO Saved citation 487 Md. 214 for cluster 9509127: Walker v. State
INFO Saved citation 487 Md. 216 for cluster 9508885: Mason v. State
INFO Case 'Gonzalez v. State', opinion 'https://www.mdcourts.gov/data/opinions/coa/2024/23a23.pdf' has no matching hash in the DB. Has a citation '487 Md. 136'. Will try to ingest all objects
INFO Adding new document found at: b'https://www.mdcourts.gov/data/opinions/coa/2024/23a23.pdf'
INFO Successfully added opinion 11080286: b'Gonzalez v. State'
INFO Saved citation 487 Md. 133 for cluster 9508757: In the Matter of Hon. Ademiluyi
INFO Saved citation 487 Md. 53 for cluster 9495884: In Re: M.P.
INFO Saved citation 487 Md. 52 for cluster 9495373: Reinstatement of Jeffrey to the Bar of Md.
INFO Saved citation 487 Md. 52 for cluster 9495372: Reinstatement of Moody to the Bar of Md.
INFO Case 'Riley v. Venice Beach Citizens Ass'n', opinion 'https://www.mdcourts.gov/data/opinions/coa/2024/5a23.pdf' has no matching hash in the DB. Has a citation '487 Md. 1'. Will try to ingest all objects
INFO Adding new document found at: b'https://www.mdcourts.gov/data/opinions/coa/2024/5a23.pdf'
INFO Successfully added opinion 11080287: b"Riley v. Venice Beach Citizens Ass'n"
INFO Saved citation 486 Md. 616 for cluster 9487483: Westminster Management v. Smith
INFO Saved citation 486 Md. 613 for cluster 9487573: Resper v. Dept. of Pub. Saf. & Corr. Servs.
INFO Saved citation 486 Md. 683 for cluster 9487484: Matthews v. State
INFO Saved citation 486 Md. 596 for cluster 9486688: Attorney Grievance Comm'n v. Moir
INFO Case 'Attorney Grievance Comm'n v. Kurtyka', opinion 'https://www.mdcourts.gov/data/opinions/coa/2024/44a23ag.pdf' has no matching hash in the DB. Has a citation '486 Md. 594'. Will try to ingest all objects
INFO Adding new document found at: b'https://www.mdcourts.gov/data/opinions/coa/2024/44a23ag.pdf'
INFO Successfully added opinion 11080288: b"Attorney Grievance Comm'n v. Kurtyka"
INFO Saved citation 486 Md. 593 for cluster 9485836: Attorney Grievance Comm'n v. Goldstein
INFO Saved citation 486 Md. 501 for cluster 9481791: Resignation of King Jr.
INFO Saved citation 486 Md. 496 for cluster 9479195: Harvey v. DeMarinis
INFO Saved citation 486 Md. 454 for cluster 9478896: Attorney Grievance Comm'n v. Donnelly
INFO Saved citation 486 Md. 408 for cluster 9486514: Petition of the Off. Of People's Counsel
INFO Saved citation 486 Md. 502 for cluster 9477504: In the Matter of SmartEnergy
INFO Saved citation 486 Md. 407 for cluster 9477505: Attorney Grievance Comm'n v. Anderson
INFO Saved citation 486 Md. 386 for cluster 9477329: Attorney Grievance Comm'n v. Weinberg
INFO Saved citation 486 Md. 385 for cluster 9477330: Attorney Grievance Comm'n v. Johnson
INFO Saved citation 486 Md. 384 for cluster 9477331: Attorney Grievance Comm'n v. Chang
INFO Saved citation 486 Md. 383 for cluster 9477332: Application of Sausser to Resign
INFO Saved citation 486 Md. 382 for cluster 9477333: Application of Patterson to Resign
INFO Case 'Motor Vehicle Admin. v. Usan', opinion 'https://www.mdcourts.gov/data/opinions/coa/2024/6a23.pdf' has no matching hash in the DB. Has a citation '486 Md. 352'. Will try to ingest all objects
INFO Adding new document found at: b'https://www.mdcourts.gov/data/opinions/coa/2024/6a23.pdf'
INFO Successfully added opinion 11080289: b'Motor Vehicle Admin. v. Usan'
INFO Saved citation 486 Md. 338 for cluster 9477335: Reinstatement of Sloane to the Bar of Md.
INFO Saved citation 486 Md. 338 for cluster 9477336: Reinstatement of Kilroy to the Bar of Md.
INFO Saved citation 486 Md. 340 for cluster 9477337: Attorney Grievance Comm'n v. Johnson
INFO Saved citation 486 Md. 641 for cluster 9477338: Attorney Grievance Comm'n v. Buie
INFO Saved citation 486 Md. 339 for cluster 9477339: Attorney Grievance Comm'n v. Bobotek
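The summary counts can be reproduced from the log itself. A minimal sketch, assuming the line prefixes shown in the output above (the pattern names and the `tally` helper are made up for illustration):

```python
import re
from collections import Counter
from typing import Iterable

# Line prefixes as they appear in the command output
PATTERNS = {
    "citation_saved": re.compile(r"^INFO Saved citation "),
    "opinion_added": re.compile(r"^INFO Successfully added opinion "),
    "row_skipped": re.compile(r"^DEBUG No citation, skipping row "),
}


def tally(log_lines: Iterable[str]) -> Counter:
    """Count how many log lines fall into each known category."""
    counts: Counter = Counter()
    for line in log_lines:
        for name, pattern in PATTERNS.items():
            if pattern.match(line):
                counts[name] += 1
                break
    return counts


sample = [
    "DEBUG No citation, skipping row for case Willey v. Brown",
    "INFO Successfully added opinion 11080272: b'Syed v. Lee'",
    "INFO Saved citation 488 Md. 534 for cluster 10098671: Frederick v. Baltimore City BOE",
    "INFO Saved citation 488 Md. 531 for cluster 10098672: Balt. City BOE v. Mayor & City Cncl. of Balt",
]
print(tally(sample))
```

Counting the full log by hand gives 56 "Saved citation" lines and 14 "Successfully added opinion" lines, which matches the figures reported above.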
After ga versioning was mostly solved, I ran the backscraper and got 788 new Ga. citations, for years 2022 to 2025, with at most 55 versioning failures (meaning, the opinion containing the citation was ingested and the previous version couldn't be linked)
Details
./manage.py cl_back_scrape_opinions --courts juriscraper.opinions.united_states.state.ga --backscrape-start=2022 --backscrape-end=2026 --verbosity 3
courtlistener=> select count(*) from search_citation where reporter = 'Ga.' and date_created::date = '2025-09-30'::date;
count
-------
788
(1 row)
courtlistener=> select count(*), sum((main_version_id is not null)::int), sum((main_version_id is not null)::int)*2 from search_opinion where cluster_id in (select cluster_id from search_citation where reporter = 'Ga.' and date_created::date = '2025-09-30'::date);
count | sum | ?column?
-------+-----+----------
1521 | 733 | 1466
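The "at most 55" figure follows from the second query. My reading, which is an assumption about the data model, is that each opinion with `main_version_id` set is an older version linked to exactly one main opinion in the same cluster, so every linked version accounts for two rows. The arithmetic, spelled out:

```python
# Numbers taken from the query result above
total_opinions = 1521   # opinions in clusters that got a Ga. citation that day
linked_versions = 733   # opinions with main_version_id set

# Assumption: each linked version pairs with exactly one main opinion
paired_opinions = linked_versions * 2

# Whatever is left over cannot be part of a linked pair
unpaired_opinions = total_opinions - paired_opinions
print(paired_opinions, unpaired_opinions)
```

The 55 leftover opinions are an upper bound on versioning failures, since some of them may simply be opinions that never had an earlier version.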
Ran the command for haw and hawapp. Got 1683 "Haw." citations
./manage.py cl_back_scrape_citations --courts juriscraper.opinions.united_states.state.haw --backscrape-start=2018/01/01 --backscrape-end=2025/01/01 --verbosity 3 --backscrape-wait=10
./manage.py cl_back_scrape_citations --courts juriscraper.opinions.united_states.state.hawapp --backscrape-start=2018/01/01 --backscrape-end=2025/01/01 --verbosity 3 --backscrape-wait=10
courtlistener=> select count(*) from search_citation where reporter = 'Haw.' and date_created::date = '2025-10-29'::date;
count
-------
1683
(1 row)
very nice