juriscraper
juriscraper copied to clipboard
embed sixified jsondate into jurisraper/lib
In another discussion, I took note that jsondate wasn't compatible with py3.
I decided to fork the repo and fix it, but since the package consists of a single small __init__.py
file and a single test file, I decided it would be just as easy to embed it into the project. Porting the package to use six was pretty straightforward, and it seems that the upstream repo is effectively dead.
It seems that jsondate is only being used i pacerdocket and tests/test_pacer, so I don't think that this code will be too difficult to maintain in the future. The test module adds 9 tests, which are the same tests as upstream, though slightly modified to use six for future capability. The tests should execute at the very end of the test run.
Neat! I just wrote this issue: https://github.com/rconradharris/jsondate/issues/8 in hopes that they'll transfer the code to us, but I'm doubtful they get Github notifications. Let's give it a few days.
If they don't respond, I think I'd rather fork jsondate and release freelawproject/jsondate2 or something instead of vendoring it. Seems better for the community and for separation of concerns.
Good luck getting a response! There seems to be no response to any issue or pr in the repo. The repo is four commits deep, with the latest being a few years ago. If it comes to a fork, you can use the one I made yesterday, https://github.com/umeboshi2/jsondate. I agree that forking is better than embedding. I just did this since the package is so small and simple, and upstream is very, very quiet.
looks like this fork has six implementation and this fork has py3 support
While I commend to effor to fork yourself @umeboshi2, I'm wondering if an institutional fork might be more robust. If, for example, you ever go silent on GH, then we are in the same position with your fork. But, if the fork is owned by an institution that we know will remain active, and which has multiple developers, then I think there's less likelihood of the project being abandoned again--and I think FLP checks both of these boxes.
Yes, I created the fork because I only saw nitz's fork on pypi, but that fork is not py2 compatible. Ilya's fork seems to be better since it uses six and should work for each python version. However, with respect to an institutional fork, it seems that rharris is the better option, since he's listed as part of HP. Nevertheless, on the overview pages, it seems that Ilya is the most active on github of all of these candidates. Interestingly, it seems that the amount of ascii characters in just this conversation exceeds the amount of characters in the actual code being discussed.
"Interestingly, it seems that the amount of ascii characters in just this conversation exceeds the amount of characters in the actual code being discussed."
This is a hilarious observation
Feel glad that we aren't using nodejs, where many important packages are essentially one-liners. :)
Example: https://github.com/then/is-promise
umeboshi@bard:~/tmp/is-promise$ wc *
13 32 301 component.json
5 24 165 index.js
18 167 1058 LICENSE
22 47 453 package.json
32 55 829 readme.md
56 162 1483 test.js
146 487 4289 total
I think we have a decision from email to create freelawproject/jsondate3 or 6 or something. @umeboshi2 can you tell me what repo to make? I'll give you permissions to set it up, and then we can do the pypi rigamarole.
I sent a PR to nitz at nitz14/jsondate#1 late last night. He's already uploading to pypi using "jsondate3". If he accepts the PR and uploads a new version, the problem will be solved. The only change I made between his fork and my fork, excepting the javascript date handling, is to replade u''
with six.text_type()
, since I don't know the py3 versions he may have to support.
If he doesn't respond, a new repo can be made under freelaw and filled with my fork. My fork didn't include the javascript format feature. If the new repo is to upload to pypi, we can't use "jsondate3", though "jsondate6" should be available.
I'm not sure how you want to do the pypi rigamarole, but I have been using the travis deploy feature. I started using the audreyr/cookiecutter-pypackage repo to start new python packages, which makes pypi uploads a bit easier. I actually use my own fork for yaml config, instead of json config.
Oops, I forgot to mention that it would be nice to wait a little bit for nitz to see if he's agreeable before adding yet another "jsondate" package to pypi.
Well, nitz does not look like a very active Github user, but we can hope.
True, but I feel we can wait a little bit. The jsondate package isn't really holding things back from juriscraper being py3 capable. There are also json comparison issues when testing in py3 that are more important. Since that "futureproofing" PR has been merged, I was hoping one of y'all would run the tests in py3 and provide some suggestions on what to do about the json compare problems. The jsondate problem will be easy to solve, regardless of which direction we take, but even so, the json compare problem will still persist.
Also forgot to mention that opinion scrapers don't load when testing in py3, but produce a failure. I submitted the "futureproofing" PR because the code changes were needed, and the tests passed in py2, but more work will need to be done to get juriscraper working with py3. It's been a while since I ran the tests in py3, so I forgot the details. I ran them again a minute ago, so here are some snippets of the results:
test_parsing_rss_parsing (tests.test_pacer.PacerRssFeedTest) ... /freespace/home/umeboshi/workspace/freelaw/freelawmachine/juriscraper/juriscraper/lib/html_utils.py:73: DeprecationWarning: The unescape method is deprecated and will be removed in 3.5, use html.unescape() instead.
return h.unescape(s)
FAIL
======================================================================
FAIL: test_scrape_all_example_files (tests.test_everything.ScraperExampleTest)
Finds all the $module_example* files and tests them with the sample
----------------------------------------------------------------------
Traceback (most recent call last):
File "/freespace/home/umeboshi/.virtualenvs/juriscraper/lib/python3.5/site-packages/vcr/cassette.py", line 109, in __call__
function, args, kwargs
File "/freespace/home/umeboshi/.virtualenvs/juriscraper/lib/python3.5/site-packages/vcr/cassette.py", line 124, in _execute_function
return self._handle_function(fn=handle_function)
File "/freespace/home/umeboshi/.virtualenvs/juriscraper/lib/python3.5/site-packages/vcr/cassette.py", line 148, in _handle_function
return fn(cassette)
File "/freespace/home/umeboshi/.virtualenvs/juriscraper/lib/python3.5/site-packages/vcr/cassette.py", line 117, in handle_function
return function(*args, **kwargs)
File "/freespace/home/umeboshi/workspace/freelaw/freelawmachine/juriscraper/tests/test_everything.py", line 201, in test_scrape_all_example_files
json_data[i],
AssertionError: {'download_urls': 'tests/examples/opinions/u[329 chars]alse} != {'date_filed_is_approximate': False, 'case_d[331 chars]alse}
{'blocked_statuses': False,
'case_dates': '2014-08-13',
'case_names': 'Bizhan Niazi Logistic Services Company',
'date_filed_is_approximate': False,
'docket_numbers': 'ASBCA No. 59205',
'download_urls': 'tests/examples/opinions/united_states/2014/59205%20Bizhan%20Niazi%20Logistic%20Services%20Company%208.13.14.pdf',
- 'judges': "O'Sullivan",
+ 'judges': "O\\'Sullivan",
? ++
'precedential_statuses': 'Published'}
======================================================================
FAIL: test_parsing_rss_parsing (tests.test_pacer.PacerRssFeedTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/freespace/home/umeboshi/workspace/freelaw/freelawmachine/juriscraper/tests/test_pacer.py", line 645, in test_parsing_rss_parsing
self.parse_files(path_root, '*.xml', PacerRssFeed)
File "/freespace/home/umeboshi/workspace/freelaw/freelawmachine/juriscraper/tests/test_pacer.py", line 566, in parse_files
self.assertEqual(j, data)
AssertionError: Lists differ: [{'da[38 chars]ne, 'jurisdiction': '', 'date_converted': None[1047 chars]12'}] != [{'da[38 chars]ne, 'docket_entries': [{'date_filed': datetime[1037 chars]12'}]
First differing element 1:
{'dat[37 chars]ne, 'jurisdiction': '', 'date_converted': None[467 chars]212'}
{'dat[37 chars]ne, 'docket_entries': [{'date_filed': datetime[457 chars]212'}
[{'assigned_to_str': '',
'case_name': 'Dozier v. CN of Dundalk, Inc. d/b/a CNAC of D',
'cause': '',
'court_id': 'mdb',
'date_converted': None,
'date_discharged': None,
'date_filed': None,
'date_terminated': None,
'demand': '',
'docket_entries': [{'date_filed': datetime.date(2018, 4, 24),
'description': '',
'document_number': None,
'pacer_doc_id': '',
'pacer_seq_no': None,
'short_description': 'Close Adversary Case'}],
'docket_number': '18-00079',
'jurisdiction': '',
'jury_demand': '',
'nature_of_suit': '',
'pacer_case_id': '711557',
'parties': None,
'referred_to_str': ''},
{'assigned_to_str': '',
'case_name': 'Unknown Case Title',
'cause': '',
'court_id': 'mdb',
'date_converted': None,
'date_discharged': None,
'date_filed': None,
'date_terminated': None,
'demand': '',
'docket_entries': [{'date_filed': datetime.date(2018, 4, 24),
'description': '',
- 'document_number': '1',
? ^^^
+ 'document_number': None,
? ^^^^
- 'pacer_doc_id': '092041355320',
? ------------
+ 'pacer_doc_id': '',
- 'pacer_seq_no': '2',
? ^^^
+ 'pacer_seq_no': None,
? ^^^^
'short_description': 'Miscellaneous Motion'}],
'docket_number': '18-90003',
'jurisdiction': '',
'jury_demand': '',
'nature_of_suit': '',
'pacer_case_id': '713212',
'parties': None,
'referred_to_str': ''}]
----------------------------------------------------------------------
Ran 70 tests in 188.509s
FAILED (failures=2, skipped=13)
Test failed: <unittest.runner.TextTestResult run=70 errors=0 failures=2>
0. Doing /freespace/home/umeboshi/workspace/freelaw/freelawmachine/juriscraper/tests/examples/pacer/rss_feeds/mdb_1.xml error: Test failed: <unittest.runner.TextTestResult run=70 errors=0 failures=2>
Small note here that I forked and released a new tool called jsondate3-aware: https://pypi.org/project/jsondate3-aware/. I'll land it as part of Juriscraper in a bit, but it should help here once it's ready.
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.