feat(OpinionSite): return "lower_court_id" field

Open grossir opened this issue 9 months ago • 4 comments

Solves #1432

This new field will go into "Docket.appeal_from_id"

Also, make tex scraper return "lower_court_id"

Jun 11 '25 19:06 grossir

I think we need to fix the Texas scraper with respect to IDs before we move this forward. Perhaps we should make this a draft?

Jun 16 '25 14:06 flooie

@grossir I find this


    @staticmethod
    def parse_lower_court_info(title: str) -> tuple[str, str, str]:
        """Parses lower court information from the title string

        :param title: title string
        :return: lower_court, lower_court_number, lower_court_id
        """

        # format when appeal comes from texapp. Example:
        # ' from Harris County; 1st Court of Appeals District (01-22-00182-CV, 699 SW3d 20, 03-23-23)'
        texapp_regex = r" from (?P<lower_court>.*)\s*\("

        # Examples:
        #  "(U.S. Fifth Circuit 23-10804)"
        #  "(U.S. 5th Circuit 19-51012)"
        # "(BODA Cause No. 67623)"
        other_courts_regex = r"\((?P<lower_court>(BODA|U\.S\. (Fif|5)th Circuit))\s(?P<lower_number>(Cause No. )?[\d-]+)\)$"

        if match := re.search(texapp_regex, title):
            lower_court = match.group("lower_court")
            lower_court_number = title[match.end() :].split(",")[0]
            return lower_court, lower_court_number, "texapp"
        elif match := re.search(other_courts_regex, title):
            lower_court = match.group("lower_court")
            lower_court_number = match.group("lower_number")

            if lower_court == "BODA":
                lower_court = "Board of Disciplinary Appeals"
                lower_court_id = ""
            else:
                # if this is not a BODA match, then it can only be a
                # Fifth Circuit match. Update this if the regex above changes
                lower_court_id = "ca5"

            return lower_court, lower_court_number, lower_court_id
        return "", "", ""

to be problematic. Can we change it to return just the lower court number, and extract the remaining data in "extract_from_text"?


    def extract_from_text(self, scraped_text: str) -> dict:
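        """Extracts the lower court name and ID from the opinion text"""
        parts = re.split(r"═{15,}", scraped_text)
        court_id = ""
        metadata = {"Docket": {}}
        # re.split always returns at least one element, so check that a
        # second block exists before indexing into it
        if len(parts) < 2:
            return metadata
        lower_court = parts[1].replace("On Petition for Review from the", "").strip()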
        if lower_court.startswith("Court of Appeals"):
            court_id = "texapp"
        elif lower_court.startswith("Board of Disciplinary Appeals"):
            court_id = "txboda"
        elif lower_court.startswith("United States Court of Appeals for the Fifth Circuit"):
            court_id = "ca5"
        if court_id != "":
            metadata['Docket']['lower_court_str'] = lower_court
            metadata['Docket']['lower_court_id'] = court_id
        return metadata

I think I like the way the court names are written here: they match and look much nicer to me.
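
For the first half of that suggestion (returning only the lower court number from the title), a minimal sketch might look like the following. The function name "parse_lower_court_number" is hypothetical and not part of juriscraper; it assumes the number always sits in the trailing parenthesized group, as in the four example titles quoted above.

```python
import re


def parse_lower_court_number(title: str) -> str:
    """Hypothetical helper: return only the lower court docket number.

    Assumes the number lives in the last parenthesized group of the title.
    """
    # Grab the content of the trailing "(...)" group
    match = re.search(r"\(([^()]*)\)\s*$", title)
    if not match:
        return ""

    # texapp format: the number is the first comma-separated item
    first_item = match.group(1).split(",")[0].strip()

    # Other courts: the number is the trailing token, optionally
    # prefixed by "Cause No. " (kept, as in the original regex)
    number = re.search(r"(?:Cause No\. )?[\d-]+(?:-[A-Z]+)?$", first_item)
    return number.group(0) if number else ""
```

The court name and ID would then come from "extract_from_text" instead of the title.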

Jun 16 '25 16:06 flooie

@flooie

I was checking the PDFs and I would keep the data from the HTML source, because it has:

  • lower complexity: for example, in the PDF the separator is not always "On Petition for Review from the". I have also found "On Certified Question from the", and there may be other variations to account for
  • more information: the "lower_court_str" also mentions the county it's coming from; not only the district
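
To illustrate the first point, a PDF-based approach would need a more tolerant prefix match than a single "replace" call. The sketch below is only an assumption about what that could look like: the pattern and the name "strip_lower_court_prefix" are hypothetical, and only the two prefixes quoted above are known to occur.

```python
import re

# Assumed shape of the prefix: "On <something> from the ". Only
# "Petition for Review" and "Certified Question" have been seen so far.
PREFIX_REGEX = re.compile(r"^On\s+[\w ]+?\s+from\s+the\s+", re.IGNORECASE)


def strip_lower_court_prefix(block: str) -> str:
    """Hypothetical helper: drop the leading "On ... from the" phrase."""
    # Collapse line breaks and repeated spaces left over from the PDF
    text = " ".join(block.split())
    return PREFIX_REGEX.sub("", text)
```

Text that does not start with such a prefix is returned unchanged apart from the whitespace cleanup, so unexpected variations would still need to be caught elsewhere.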

As for the formatting being prettier or more standard in the PDF: when we implement the frontend we will just use "appeal_from_id", which links to a Court object that has the standard court name. So I don't think a standard name matters much for "lower_court_str" / "appeal_from_str".

Jun 17 '25 15:06 grossir

@grossir I think the HTML is providing a non-standard name for the court, and I much prefer the format from the PDF.

Let me take a look at a bigger sample.

Jun 17 '25 16:06 flooie