
Fill `pa`, `pasuperct` and `pacommwct`

Open grossir opened this issue 1 year ago • 2 comments

Part of #929

To fill the gaps listed below, we will implement a backscraper.

pa

Between May 28, 2021 and November 16, 2021 we have 0 documents. The source lists 1261 documents for that period, some of which are not opinions.

pasuperct

Between May 28, 2021 and November 17, 2021 we have 0 documents. We are missing around 1504 "Non Precedential" documents and 164 Precedential opinions (source)

pacommwct

Between May 28, 2021 and August 23, 2021 we have 0 documents. We are missing 49 precedential opinions.

grossir avatar Mar 26 '24 18:03 grossir

I have updated this source to use the API instead of the RSS feed, for both the regular scraper and the backscraper.

This source returns OpinionClusters from its API, so it would be an easy candidate to start returning clusters. For example (there are more in the example files):

    {
        "Author": null,
        "BoardDocketNumber": null,
        "Caption": "In the Interest of: N.E.M., Appeal of: N.E.M. - No. 9 EAP 2023",
        "CourtDocketNumber": null,
        "CourtType": 3,
        "DispositionDate": "2024-03-21T00:00:00",
        "Keywords": null,
        "UserIdentifier": "E.D. Prothonotary",
        "UploadDate": "0001-01-01T00:00:00",
        "PostedToday": false,
        "Postings": [
            {
                "Id": 88531,
                "AuthorId": "Donohue, Christine",
                "OpinionId": 80242,
                "FileName": "J-41B-2023mo - 105874033259675150.pdf",
                "ProcessedDate": "2024-03-21T00:00:00",
                "PostingTypeId": "mo",
                "PublicationTypeId": null,
                "RenderedDate": "2024-03-21T00:00:00",
                "SortOrder": 0,
                "FileVersion": 1,
                "Author": {
                    "Id": 0,
                    "AuthorName": "Justice Christine Donohue",
                    "AuthorCode": "Donohue, Christine",
                    "Selectable": true,
                    "SortOrder": 1430
                },
                "PostType": {
                    "Id": 0,
                    "PostingTypeCode": "mo",
                    "PostingTypeId": "Majority Opinion",
                    "SortOrder": null
                },
                "PublicationType": null
            },
            {
                "Id": 88533,
                "AuthorId": "Dougherty, Kevin M.",
                "OpinionId": 80242,
                "FileName": "J-41B-2023co - 105874033259675223.pdf",
                "ProcessedDate": "2024-03-21T00:00:00",
                "PostingTypeId": "co",
                "PublicationTypeId": null,
                "RenderedDate": "2024-03-21T00:00:00",
                "SortOrder": 0,
                "FileVersion": 1,
                "Author": {
                    "Id": 0,
                    "AuthorName": "Justice Kevin Dougherty",
                    "AuthorCode": "Dougherty, Kevin M.",
                    "Selectable": true,
                    "SortOrder": 1440
                },
                "PostType": {
                    "Id": 0,
                    "PostingTypeCode": "co",
                    "PostingTypeId": "Concurring Opinion",
                    "SortOrder": null
                },
                "PublicationType": null
            }
        ],
        "Id": 80242
    },
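As a rough sketch of how one such record could be grouped into a cluster, assuming only the field names visible in the sample above. The function name `record_to_cluster` and the output keys (`case_name`, `date_filed`, `sub_opinions`, and so on) are hypothetical and are not Juriscraper's actual schema. Note that in the sample the posting-level `PostingTypeId` holds the short code (`mo`), while the nested `PostType.PostingTypeId` holds the readable label.

```python
from datetime import datetime

# Trimmed copy of the sample record above (only the fields used here).
SAMPLE = {
    "Caption": "In the Interest of: N.E.M., Appeal of: N.E.M. - No. 9 EAP 2023",
    "DispositionDate": "2024-03-21T00:00:00",
    "Postings": [
        {
            "FileName": "J-41B-2023mo - 105874033259675150.pdf",
            "PostingTypeId": "mo",
            "Author": {"AuthorName": "Justice Christine Donohue"},
            "PostType": {"PostingTypeCode": "mo", "PostingTypeId": "Majority Opinion"},
        },
        {
            "FileName": "J-41B-2023co - 105874033259675223.pdf",
            "PostingTypeId": "co",
            "Author": {"AuthorName": "Justice Kevin Dougherty"},
            "PostType": {"PostingTypeCode": "co", "PostingTypeId": "Concurring Opinion"},
        },
    ],
    "Id": 80242,
}


def record_to_cluster(record: dict) -> dict:
    """Group the Postings of one API record into a single cluster dict.

    The output keys are hypothetical, chosen for illustration only.
    """
    sub_opinions = []
    for posting in record.get("Postings") or []:
        # Nested objects can be null in the API response, so fall back to {}.
        author = posting.get("Author") or {}
        post_type = posting.get("PostType") or {}
        sub_opinions.append(
            {
                "author": author.get("AuthorName"),
                "type": post_type.get("PostingTypeId"),  # readable label
                "type_code": posting.get("PostingTypeId"),  # short code
                "file_name": posting.get("FileName"),
            }
        )
    return {
        "source_id": record["Id"],
        "case_name": record["Caption"],
        "date_filed": datetime.fromisoformat(record["DispositionDate"]).date(),
        "sub_opinions": sub_opinions,
    }


cluster = record_to_cluster(SAMPLE)
for sub_opinion in cluster["sub_opinions"]:
    print(sub_opinion["type"], "by", sub_opinion["author"])
```

Keeping every posting under one parent record preserves the link between the majority and concurring opinions, which a flat list of documents would lose.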

grossir avatar Mar 26 '24 20:03 grossir

Commands to fill the gap

    docker exec -it cl-django python /opt/courtlistener/manage.py cl_back_scrape_opinions --courts juriscraper.opinions.united_states.state.pa --backscrape-start=05/27/2021 --backscrape-end=11/17/2021

    docker exec -it cl-django python /opt/courtlistener/manage.py cl_back_scrape_opinions --courts juriscraper.opinions.united_states.state.pasuperct --backscrape-start=05/27/2021 --backscrape-end=11/18/2021

    docker exec -it cl-django python /opt/courtlistener/manage.py cl_back_scrape_opinions --courts juriscraper.opinions.united_states.state.pacommwct --backscrape-start=05/27/2021 --backscrape-end=08/24/2021
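The date ranges in these commands appear to be the gaps from the issue description widened by one day on each side. That padding is inferred from the dates, not stated anywhere in the thread. A minimal sketch that rebuilds the three commands from the gap dates under that assumption (`GAPS`, `PAD` and `backscrape_command` are illustrative names):

```python
from datetime import date, timedelta

# First and last day without documents, per court, from the issue description.
GAPS = {
    "pa": (date(2021, 5, 28), date(2021, 11, 16)),
    "pasuperct": (date(2021, 5, 28), date(2021, 11, 17)),
    "pacommwct": (date(2021, 5, 28), date(2021, 8, 23)),
}

# Assumption: widen each gap by one day on both sides.
PAD = timedelta(days=1)


def backscrape_command(court: str, gap_start: date, gap_end: date) -> str:
    """Build the backscrape command string for one court and one gap."""
    start = (gap_start - PAD).strftime("%m/%d/%Y")
    end = (gap_end + PAD).strftime("%m/%d/%Y")
    return (
        "docker exec -it cl-django python /opt/courtlistener/manage.py "
        "cl_back_scrape_opinions "
        f"--courts juriscraper.opinions.united_states.state.{court} "
        f"--backscrape-start={start} --backscrape-end={end}"
    )


for court, (gap_start, gap_end) in GAPS.items():
    print(backscrape_command(court, gap_start, gap_end))
```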

grossir avatar May 08 '24 15:05 grossir