Bunch of DeprecationWarnings in Python3 due to invalid escape sequences

Open voutilad opened this issue 7 years ago • 4 comments

Some scrapers still have regex patterns written as plain string literals containing invalid escape sequences, which could become a problem in Py3.7+. Guess I didn't catch these before.

The simple fix is to change these string literals to raw string literals.
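As a rough illustration of that fix, here is one of the flagged patterns (the date regex from miss.py) written as a raw literal. The sample input string is made up for the example.

```python
import re

# Raw literal: the backslashes reach the regex engine untouched and no
# invalid-escape warning is raised when the module is compiled.
date_re = re.compile(r'(\d{2}-\d{2}-\d{4})')

# A raw literal is just a different spelling of the same characters:
# r'\d' is the two-character string backslash + 'd'.
assert r'(\d{2})' == '(\\d{2})'

# Recognized escapes are where raw and non-raw literals really differ:
# '\b' is a backspace character, r'\b' is the regex word-boundary token.
assert '\b' == '\x08'
assert len(r'\b') == 2

# Hypothetical input, only to show the pattern still matches as before.
match = date_re.search('Filed 02-17-2017')
filed = match.group(1)
```

Since the unrecognized escapes are currently kept as-is, switching to raw literals should not change what any of these patterns match; it only silences the warning.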

Finds all the $module_example* files and tests them with the sample ...
/Users/dave/src/freelawproject/juriscraper/juriscraper/opinions/united_states/federal_appellate/ca8.py:23: DeprecationWarning: invalid escape sequence \d
  case_name_regex = re.compile('(\d{2}/\d{2}/\d{4})(.*)')
/Users/dave/src/freelawproject/juriscraper/juriscraper/opinions/united_states/federal_appellate/ca8.py:33: DeprecationWarning: invalid escape sequence \d
  case_date_regex = re.compile('(\d{2}/\d{2}/\d{4})(.*)')
/Users/dave/src/freelawproject/juriscraper/juriscraper/opinions/united_states/federal_appellate/ca8.py:41: DeprecationWarning: invalid escape sequence \d
  docket_number_regex = re.compile('(\d{2})(\d{4})(u|p)', re.IGNORECASE)
/Users/dave/src/freelawproject/juriscraper/juriscraper/opinions/united_states/federal_district/dcd.py:81: DeprecationWarning: invalid escape sequence \?
  regex = re.compile('(\?)(\d+)([a-z]+)(\d+)(-)(.*)')
/Users/dave/src/freelawproject/juriscraper/juriscraper/opinions/united_states/federal_district/dcd.py:101: DeprecationWarning: invalid escape sequence \s
  judge = re.search('(by\s)(.*)', judge_string, re.MULTILINE).group(2)
/Users/dave/src/freelawproject/juriscraper/juriscraper/opinions/united_states/federal_district/dcd.py:113: DeprecationWarning: invalid escape sequence \?
  regex = '(\?)(\d+)([a-z]+)(\d+)(\-)(.*)'
/Users/dave/src/freelawproject/juriscraper/juriscraper/opinions/united_states/federal_special/acca_p.py:21: DeprecationWarning: invalid escape sequence \d
  self.docket_case_name_splitter = re.compile('(.*[\dX]{5,8})(.*)')
/Users/dave/src/freelawproject/juriscraper/juriscraper/opinions/united_states/state/fla.py:22: DeprecationWarning: invalid escape sequence \d
  self.regex = re.compile("(S?C\d+-\d+)(.*)")
/Users/dave/src/freelawproject/juriscraper/juriscraper/opinions/united_states/state/fladistctapp_3.py:72: DeprecationWarning: invalid escape sequence \d
  text = re.search('(\d{2}-\d{2}-\d{4})', text).group(1)
/Users/dave/src/freelawproject/juriscraper/juriscraper/opinions/united_states/state/fladistctapp_5.py:31: DeprecationWarning: invalid escape sequence \d
  self.case_regex = '(5D.*-.*\d{1,3})([- ]+[A-Za-z].*)'
/Users/dave/src/freelawproject/juriscraper/juriscraper/opinions/united_states/state/miss.py:38: DeprecationWarning: invalid escape sequence \d
  date_re = re.compile('(\d{2}-\d{2}-\d{4})')
/Users/dave/src/freelawproject/juriscraper/juriscraper/opinions/united_states/state/nc.py:37: DeprecationWarning: invalid escape sequence \d
  date_cleaner = "\d+ \w+ [12][90]\d\d"
/Users/dave/src/freelawproject/juriscraper/juriscraper/opinions/united_states/state/nc.py:105: DeprecationWarning: invalid escape sequence \(
  download_url = re.search('viewopinion\("(.*)"',
/Users/dave/src/freelawproject/juriscraper/juriscraper/opinions/united_states/state/nc.py:71: DeprecationWarning: invalid escape sequence \(
  'viewopinion\("(.*)"',
/Users/dave/src/freelawproject/juriscraper/juriscraper/opinions/united_states/state/nc.py:130: DeprecationWarning: invalid escape sequence \d
  docket_number = re.search('(.*\d).*?',
/Users/dave/src/freelawproject/juriscraper/juriscraper/opinions/united_states/state/nc.py:135: DeprecationWarning: invalid escape sequence \d
  if not re.search('^\d\d.*\d\d$', neutral_cite):
/Users/dave/src/freelawproject/juriscraper/juriscraper/opinions/united_states/state/nd.py:47: DeprecationWarning: invalid escape sequence \d
  citation_pattern = '^.{0,5}(\d{4} ND (?:App )?\d{1,4})'
/Users/dave/src/freelawproject/juriscraper/juriscraper/opinions/united_states/state/or.py:29: DeprecationWarning: invalid escape sequence \d
  docket_numbers.append(' & '.join(re.findall('S\d+', s)))
/Users/dave/src/freelawproject/juriscraper/juriscraper/opinions/united_states/state/pacommwct.py:24: DeprecationWarning: invalid escape sequence \s
  self.set_regex("(.*)(?:- |et al.\s+)(\d+.*\d{4})")
/Users/dave/src/freelawproject/juriscraper/juriscraper/opinions/united_states/state/ri_p.py:82: DeprecationWarning: invalid escape sequence \(
  regex = '(.*?)(\((\w+\s+\d+\,\s+\d+)\))(.*?)'
/Users/dave/src/freelawproject/juriscraper/juriscraper/opinions/united_states/state/ri_p.py:101: DeprecationWarning: invalid escape sequence \s
  '(.*?)(,?\sNos?\.)(.*?)',
/Users/dave/src/freelawproject/juriscraper/juriscraper/opinions/united_states/state/ri_p.py:103: DeprecationWarning: invalid escape sequence \s
  '(.*?)(,?\s\d+-\d+(,|\s))(.*?)',
/Users/dave/src/freelawproject/juriscraper/juriscraper/opinions/united_states/state/ri_p.py:106: DeprecationWarning: invalid escape sequence \s
  '(.*?)(,?\s(?:\w+-)?\d+-\d+(,|\s))(.*?)',
/Users/dave/src/freelawproject/juriscraper/juriscraper/opinions/united_states/state/sd.py:46: DeprecationWarning: invalid escape sequence \d
  case_name = re.search('(.*)(\d{4} S\.?D\.? \d{1,4})', s, re.MULTILINE).group(1)
/Users/dave/src/freelawproject/juriscraper/juriscraper/opinions/united_states/state/sd.py:62: DeprecationWarning: invalid escape sequence \d
  neutral_cite = re.search('(.*)(\d{4} S\.?D\.? \d{1,4})', s, re.MULTILINE).group(2)
/Users/dave/src/freelawproject/juriscraper/juriscraper/opinions/united_states_backscrapers/federal_district/dcd_2013.py:101: DeprecationWarning: invalid escape sequence \s
  judge = re.search('(by\s)(.*)', judge_string, re.MULTILINE).group(2)
/Users/dave/src/freelawproject/juriscraper/juriscraper/opinions/united_states_backscrapers/federal_district/dcd_2013.py:113: DeprecationWarning: invalid escape sequence \?
  regex = '(\?)(\d+)([a-z]+)(\d+)(\-)(.*)'
/Users/dave/src/freelawproject/juriscraper/juriscraper/oral_args/united_states/federal_appellate/ca3.py:20: DeprecationWarning: invalid escape sequence \d
  self.regex = '(\d{2}-\d{3,4})?(.+)\.(:?(wma)|(mp3))'
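A possible way to check a single file for these warnings without running the whole test suite is to compile its source with warnings recorded. This is only a sketch: `invalid_escape_warnings` is a hypothetical helper, not something that exists in Juriscraper, and the two source snippets are made up. The warning category is `DeprecationWarning` on older Python 3 releases and `SyntaxWarning` from 3.12 on, so the helper accepts both.

```python
import warnings


def invalid_escape_warnings(source, filename="<snippet>"):
    """Compile source text and return (lineno, message) for each
    invalid-escape warning the compiler emits. Hypothetical helper."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even ones already shown once.
        warnings.simplefilter("always")
        compile(source, filename, "exec")
    return [
        (w.lineno, str(w.message))
        for w in caught
        if issubclass(w.category, (DeprecationWarning, SyntaxWarning))
        and "invalid escape sequence" in str(w.message)
    ]


# The doubled backslashes below are only so that *this* example is clean;
# the source text being compiled contains a single backslash before each d.
bad = "import re\ndate_re = re.compile('(\\d{2}-\\d{2}-\\d{4})')\n"
good = "import re\ndate_re = re.compile(r'(\\d{2}-\\d{2}-\\d{4})')\n"

bad_hits = invalid_escape_warnings(bad)
good_hits = invalid_escape_warnings(good)
```

To scan a real scraper, the same function could be fed the text of the file, e.g. the contents of ca8.py read with `open(path).read()`.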

voutilad, Feb 17 '17 14:02

There are a lot more problems than just this when running Juriscraper under Python 3. I tried running:

python3 setup.py test

Is compatibility with both Python 2 and Python 3 desired?

janderse, Sep 06 '18 22:09

Py3 is desired in the broad sense, but "nobody" is asking for it yet. Until CourtListener itself is Py3 ready, porting Juriscraper is good, but not a huge priority. IIRC, we turned off Travis testing for py3 a while back, with a lot of sadness.

All of that said, I'm totally in favor of and enthusiastic about Py3 compatibility, especially if, like here, it sounds fairly easy.

mlissner, Sep 07 '18 00:09

In https://github.com/freelawproject/juriscraper/commit/c7b6fef5b9e7177481542be7651ac35aa5571aa3, @mlissner cited requests-mock (which now seems to support Python 3 over at https://github.com/jamielennox/requests-mock) and jsondate, which doesn't seem likely to get updated on its own (https://github.com/rconradharris/jsondate/issues/7).

jcrben, Oct 03 '18 02:10

If jsondate is the only issue, I wonder if the easiest path here is either:

  • Forking and hosting our own version of jsondate (as sometimes happens when py2 stuff is abandoned), or

  • Dropping our dependency on it and figuring out a different way forward.

I think either should be fairly simple. I don't think we do a ton with jsondate.
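For the second option, a stdlib-only replacement might look roughly like the sketch below. It assumes the only thing needed is writing `date`/`datetime` values as ISO 8601 strings and reading them back; the names `dumps_dates` and `loads_dates`, the accepted formats, and the sample record are all assumptions for illustration, not jsondate's actual API or behavior.

```python
import datetime
import json

# Formats tried when decoding, paired with whether the result is date-only.
_FORMATS = (("%Y-%m-%dT%H:%M:%S", False), ("%Y-%m-%d", True))


def _encode(obj):
    # json.dumps calls this only for objects it cannot serialize itself.
    if isinstance(obj, (datetime.date, datetime.datetime)):
        return obj.isoformat()
    raise TypeError("%s is not JSON serializable" % type(obj).__name__)


def _decode(pairs):
    # object_hook: called for every decoded JSON object (dict).
    # Only direct string values are converted; strings inside lists and
    # datetimes with microseconds or timezones are left as plain strings.
    out = {}
    for key, value in pairs.items():
        if isinstance(value, str):
            for fmt, date_only in _FORMATS:
                try:
                    parsed = datetime.datetime.strptime(value, fmt)
                except ValueError:
                    continue
                value = parsed.date() if date_only else parsed
                break
        out[key] = value
    return out


def dumps_dates(obj, **kwargs):
    return json.dumps(obj, default=_encode, **kwargs)


def loads_dates(text, **kwargs):
    return json.loads(text, object_hook=_decode, **kwargs)


# Hypothetical record, just to show the round trip.
record = {
    "case_name": "Example v. Example",
    "date_filed": datetime.date(2017, 2, 17),
    "scraped_at": datetime.datetime(2017, 2, 17, 14, 2, 0),
}
text = dumps_dates(record)
restored = loads_dates(text)
```

Whether this is enough depends on how jsondate is actually used in the code base, which would need checking before dropping the dependency.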

Thanks for doing the digging on this, @jcrben.

mlissner, Oct 03 '18 16:10