extruct
Try to fix bad JSON due to unescaped double quotes
Fixes #53
If we merge this, we should create a separate issue to handle https://github.com/scrapinghub/extruct/issues/53#issuecomment-389053240, which probably requires a custom fallback JSON parser.
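To illustrate the kind of fallback this change is about, here is a minimal sketch. It is not the code from this pull request; `escape_inner_quotes` and `loads_lenient` are hypothetical names. It tries strict parsing first and, only if that fails, escapes double quotes inside a string when the next non-whitespace character is not a JSON structural character. That heuristic is an assumption and will misjudge some inputs, for example text containing `", "` inside a string value.

```python
import json

# Characters that may legitimately follow a closing string quote in JSON.
_STRUCTURAL = set(",:}]")


def escape_inner_quotes(text):
    """Heuristically escape double quotes that appear inside JSON strings.

    Hypothetical helper: a quote is treated as closing the current string
    only when the next non-whitespace character is structural (or absent).
    """
    out = []
    in_string = False
    i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        if not in_string:
            if ch == '"':
                in_string = True
            out.append(ch)
        elif ch == "\\" and i + 1 < n:
            # Keep existing escape sequences untouched.
            out.append(text[i:i + 2])
            i += 1
        elif ch == '"':
            rest = text[i + 1:].lstrip()
            if not rest or rest[0] in _STRUCTURAL:
                in_string = False
                out.append(ch)
            else:
                # Looks like a quote in the middle of a value: escape it.
                out.append('\\"')
        else:
            out.append(ch)
        i += 1
    return "".join(out)


def loads_lenient(text):
    """Parse strictly first; retry with the heuristic only on failure."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return json.loads(escape_inner_quotes(text))
```

Valid documents never reach the heuristic, so the fallback cannot change the result for input that already parses. Input that the heuristic cannot repair still raises `json.JSONDecodeError` from the second `json.loads` call.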
Codecov Report
Merging #126 (96bf6b3) into master (d78167c) will increase coverage by 0.28%. The diff coverage is 100.00%.
@@ Coverage Diff @@
## master #126 +/- ##
==========================================
+ Coverage 90.24% 90.52% +0.28%
==========================================
Files 13 13
Lines 605 623 +18
Branches 136 137 +1
==========================================
+ Hits 546 564 +18
Misses 52 52
Partials 7 7
Impacted Files | Coverage Δ |
---|---|
extruct/jsonld.py | 100.00% <100.00%> (ø) |
extruct/rdfa.py | 100.00% <100.00%> (ø) |
extruct/utils.py | 100.00% <100.00%> (ø) |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
I just found https://github.com/codecobblers/dirtyjson with a quick search. Maybe it makes sense to use it here.
Using a library would definitely be better.
I’m a bit worried about the library not having received changes since 2017, and @scottkmaxwell not being active on GitHub since 2018.
On the other hand, there are no open issues or pull requests in the repository, and in the worst case we could fork the library.
I’ll give it a try.
According to https://github.com/scrapinghub/extruct/pull/137#issuecomment-629589413 it won’t work.
@kiollpt thanks for checking, and @Gallaecio thanks for passing on the message.
I'm still around, just busy with my day job. It looks like I am not getting my GitHub notifications in email so I'll fix that now. Feel free to use or fork. If you want to just use dirtyjson, I'll try to be responsive to PRs.