juriscraper icon indicating copy to clipboard operation
juriscraper copied to clipboard

embed sixified jsondate into jurisraper/lib

Open umeboshi2 opened this issue 5 years ago • 14 comments

In another discussion, I took note that jsondate wasn't compatible with py3.

I decided to fork the repo and fix it, but since the package consists of a single small __init__.py file and a single test file, I decided it would be just as easy to embed it into the project. Porting the package to use six was pretty straightforward, and it seems that the upstream repo is effectively dead.

It seems that jsondate is only being used i pacerdocket and tests/test_pacer, so I don't think that this code will be too difficult to maintain in the future. The test module adds 9 tests, which are the same tests as upstream, though slightly modified to use six for future capability. The tests should execute at the very end of the test run.

umeboshi2 avatar Jan 04 '19 11:01 umeboshi2

Neat! I just wrote this issue: https://github.com/rconradharris/jsondate/issues/8 in hopes that they'll transfer the code to us, but I'm doubtful they get Github notifications. Let's give it a few days.

If they don't respond, I think I'd rather fork jsondate and release freelawproject/jsondate2 or something instead of vendoring it. Seems better for the community and for separation of concerns.

mlissner avatar Jan 04 '19 17:01 mlissner

Good luck getting a response! There seems to be no response to any issue or pr in the repo. The repo is four commits deep, with the latest being a few years ago. If it comes to a fork, you can use the one I made yesterday, https://github.com/umeboshi2/jsondate. I agree that forking is better than embedding. I just did this since the package is so small and simple, and upstream is very, very quiet.

umeboshi2 avatar Jan 04 '19 19:01 umeboshi2

looks like this fork has six implementation and this fork has py3 support

While I commend to effor to fork yourself @umeboshi2, I'm wondering if an institutional fork might be more robust. If, for example, you ever go silent on GH, then we are in the same position with your fork. But, if the fork is owned by an institution that we know will remain active, and which has multiple developers, then I think there's less likelihood of the project being abandoned again--and I think FLP checks both of these boxes.

arderyp avatar Jan 04 '19 20:01 arderyp

Yes, I created the fork because I only saw nitz's fork on pypi, but that fork is not py2 compatible. Ilya's fork seems to be better since it uses six and should work for each python version. However, with respect to an institutional fork, it seems that rharris is the better option, since he's listed as part of HP. Nevertheless, on the overview pages, it seems that Ilya is the most active on github of all of these candidates. Interestingly, it seems that the amount of ascii characters in just this conversation exceeds the amount of characters in the actual code being discussed.

umeboshi2 avatar Jan 04 '19 21:01 umeboshi2

"Interestingly, it seems that the amount of ascii characters in just this conversation exceeds the amount of characters in the actual code being discussed."

This is a hilarious observation

arderyp avatar Jan 04 '19 23:01 arderyp

Feel glad that we aren't using nodejs, where many important packages are essentially one-liners. :)

Example: https://github.com/then/is-promise

umeboshi@bard:~/tmp/is-promise$ wc  *
  13   32  301 component.json
   5   24  165 index.js
  18  167 1058 LICENSE
  22   47  453 package.json
  32   55  829 readme.md
  56  162 1483 test.js
 146  487 4289 total

umeboshi2 avatar Jan 05 '19 00:01 umeboshi2

I think we have a decision from email to create freelawproject/jsondate3 or 6 or something. @umeboshi2 can you tell me what repo to make? I'll give you permissions to set it up, and then we can do the pypi rigamarole.

mlissner avatar Jan 08 '19 17:01 mlissner

I sent a PR to nitz at nitz14/jsondate#1 late last night. He's already uploading to pypi using "jsondate3". If he accepts the PR and uploads a new version, the problem will be solved. The only change I made between his fork and my fork, excepting the javascript date handling, is to replade u'' with six.text_type(), since I don't know the py3 versions he may have to support.

If he doesn't respond, a new repo can be made under freelaw and filled with my fork. My fork didn't include the javascript format feature. If the new repo is to upload to pypi, we can't use "jsondate3", though "jsondate6" should be available.

I'm not sure how you want to do the pypi rigamarole, but I have been using the travis deploy feature. I started using the audreyr/cookiecutter-pypackage repo to start new python packages, which makes pypi uploads a bit easier. I actually use my own fork for yaml config, instead of json config.

umeboshi2 avatar Jan 08 '19 20:01 umeboshi2

Oops, I forgot to mention that it would be nice to wait a little bit for nitz to see if he's agreeable before adding yet another "jsondate" package to pypi.

umeboshi2 avatar Jan 08 '19 20:01 umeboshi2

Well, nitz does not look like a very active Github user, but we can hope.

mlissner avatar Jan 08 '19 20:01 mlissner

True, but I feel we can wait a little bit. The jsondate package isn't really holding things back from juriscraper being py3 capable. There are also json comparison issues when testing in py3 that are more important. Since that "futureproofing" PR has been merged, I was hoping one of y'all would run the tests in py3 and provide some suggestions on what to do about the json compare problems. The jsondate problem will be easy to solve, regardless of which direction we take, but even so, the json compare problem will still persist.

umeboshi2 avatar Jan 08 '19 21:01 umeboshi2

Also forgot to mention that opinion scrapers don't load when testing in py3, but produce a failure. I submitted the "futureproofing" PR because the code changes were needed, and the tests passed in py2, but more work will need to be done to get juriscraper working with py3. It's been a while since I ran the tests in py3, so I forgot the details. I ran them again a minute ago, so here are some snippets of the results:

test_parsing_rss_parsing (tests.test_pacer.PacerRssFeedTest) ... /freespace/home/umeboshi/workspace/freelaw/freelawmachine/juriscraper/juriscraper/lib/html_utils.py:73: DeprecationWarning: The unescape method is deprecated and will be removed in 3.5, use html.unescape() instead.
  return h.unescape(s)
FAIL
======================================================================
FAIL: test_scrape_all_example_files (tests.test_everything.ScraperExampleTest)
Finds all the $module_example* files and tests them with the sample
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/freespace/home/umeboshi/.virtualenvs/juriscraper/lib/python3.5/site-packages/vcr/cassette.py", line 109, in __call__
    function, args, kwargs
  File "/freespace/home/umeboshi/.virtualenvs/juriscraper/lib/python3.5/site-packages/vcr/cassette.py", line 124, in _execute_function
    return self._handle_function(fn=handle_function)
  File "/freespace/home/umeboshi/.virtualenvs/juriscraper/lib/python3.5/site-packages/vcr/cassette.py", line 148, in _handle_function
    return fn(cassette)
  File "/freespace/home/umeboshi/.virtualenvs/juriscraper/lib/python3.5/site-packages/vcr/cassette.py", line 117, in handle_function
    return function(*args, **kwargs)
  File "/freespace/home/umeboshi/workspace/freelaw/freelawmachine/juriscraper/tests/test_everything.py", line 201, in test_scrape_all_example_files
    json_data[i],
AssertionError: {'download_urls': 'tests/examples/opinions/u[329 chars]alse} != {'date_filed_is_approximate': False, 'case_d[331 chars]alse}
  {'blocked_statuses': False,
   'case_dates': '2014-08-13',
   'case_names': 'Bizhan Niazi Logistic Services Company',
   'date_filed_is_approximate': False,
   'docket_numbers': 'ASBCA No. 59205',
   'download_urls': 'tests/examples/opinions/united_states/2014/59205%20Bizhan%20Niazi%20Logistic%20Services%20Company%208.13.14.pdf',
-  'judges': "O'Sullivan",
+  'judges': "O\\'Sullivan",
?              ++

   'precedential_statuses': 'Published'}

======================================================================
FAIL: test_parsing_rss_parsing (tests.test_pacer.PacerRssFeedTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/freespace/home/umeboshi/workspace/freelaw/freelawmachine/juriscraper/tests/test_pacer.py", line 645, in test_parsing_rss_parsing
    self.parse_files(path_root, '*.xml', PacerRssFeed)
  File "/freespace/home/umeboshi/workspace/freelaw/freelawmachine/juriscraper/tests/test_pacer.py", line 566, in parse_files
    self.assertEqual(j, data)
AssertionError: Lists differ: [{'da[38 chars]ne, 'jurisdiction': '', 'date_converted': None[1047 chars]12'}] != [{'da[38 chars]ne, 'docket_entries': [{'date_filed': datetime[1037 chars]12'}]

First differing element 1:
{'dat[37 chars]ne, 'jurisdiction': '', 'date_converted': None[467 chars]212'}
{'dat[37 chars]ne, 'docket_entries': [{'date_filed': datetime[457 chars]212'}

  [{'assigned_to_str': '',
    'case_name': 'Dozier v. CN of Dundalk, Inc. d/b/a CNAC of D',
    'cause': '',
    'court_id': 'mdb',
    'date_converted': None,
    'date_discharged': None,
    'date_filed': None,
    'date_terminated': None,
    'demand': '',
    'docket_entries': [{'date_filed': datetime.date(2018, 4, 24),
                        'description': '',
                        'document_number': None,
                        'pacer_doc_id': '',
                        'pacer_seq_no': None,
                        'short_description': 'Close Adversary Case'}],
    'docket_number': '18-00079',
    'jurisdiction': '',
    'jury_demand': '',
    'nature_of_suit': '',
    'pacer_case_id': '711557',
    'parties': None,
    'referred_to_str': ''},
   {'assigned_to_str': '',
    'case_name': 'Unknown Case Title',
    'cause': '',
    'court_id': 'mdb',
    'date_converted': None,
    'date_discharged': None,
    'date_filed': None,
    'date_terminated': None,
    'demand': '',
    'docket_entries': [{'date_filed': datetime.date(2018, 4, 24),
                        'description': '',
-                       'document_number': '1',
?                                          ^^^

+                       'document_number': None,
?                                          ^^^^

-                       'pacer_doc_id': '092041355320',
?                                        ------------

+                       'pacer_doc_id': '',
-                       'pacer_seq_no': '2',
?                                       ^^^

+                       'pacer_seq_no': None,
?                                       ^^^^

                        'short_description': 'Miscellaneous Motion'}],
    'docket_number': '18-90003',
    'jurisdiction': '',
    'jury_demand': '',
    'nature_of_suit': '',
    'pacer_case_id': '713212',
    'parties': None,
    'referred_to_str': ''}]

----------------------------------------------------------------------
Ran 70 tests in 188.509s

FAILED (failures=2, skipped=13)
Test failed: <unittest.runner.TextTestResult run=70 errors=0 failures=2>
0. Doing /freespace/home/umeboshi/workspace/freelaw/freelawmachine/juriscraper/tests/examples/pacer/rss_feeds/mdb_1.xml   error: Test failed: <unittest.runner.TextTestResult run=70 errors=0 failures=2>

umeboshi2 avatar Jan 08 '19 21:01 umeboshi2

Small note here that I forked and released a new tool called jsondate3-aware: https://pypi.org/project/jsondate3-aware/. I'll land it as part of Juriscraper in a bit, but it should help here once it's ready.

mlissner avatar May 18 '20 23:05 mlissner

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

CLAassistant avatar Dec 18 '21 00:12 CLAassistant