Try to fix bad JSON due to unescaped double quotes

Open Gallaecio opened this issue 4 years ago • 6 comments

Fixes #53

If we merge this, we should create a separate issue to handle https://github.com/scrapinghub/extruct/issues/53#issuecomment-389053240, which probably requires a custom fallback JSON parser.
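To make the problem concrete, here is a minimal sketch of one possible repair heuristic. It is not the code from this pull request, and `loads_lenient` and `max_fixes` are hypothetical names. It assumes that when the standard `json` module stops with an "Expecting ',' delimiter" error, the quote just before the error position closed a string too early and should have been escaped.

```python
import json


def loads_lenient(text, max_fixes=20):
    """Parse JSON, escaping stray double quotes inside string values.

    Hypothetical helper for illustration only; not part of extruct.
    """
    for _ in range(max_fixes):
        try:
            return json.loads(text)
        except json.JSONDecodeError as exc:
            # Only handle the error raised when a string ended too early.
            if not exc.msg.startswith("Expecting ',' delimiter"):
                raise
            # The quote that closed the string sits just before the error
            # position, separated from it by whitespace at most.
            quote = text.rfind('"', 0, exc.pos)
            if quote == -1 or text[quote + 1:exc.pos].strip():
                raise
            # Escape that quote and try again.
            text = text[:quote] + "\\" + text[quote:]
    # Out of attempts: let the final error propagate if it is still broken.
    return json.loads(text)


print(loads_lenient('{"name": "The "best" shop"}'))
```

This only covers quotes inside values that are followed by more text. Input where a stray quote happens to be followed by a comma or a closing brace is ambiguous, and that kind of case is why a dedicated fallback parser may still be needed.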

Gallaecio avatar Jan 10 '20 13:01 Gallaecio

Codecov Report

Merging #126 (96bf6b3) into master (d78167c) will increase coverage by 0.28%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #126      +/-   ##
==========================================
+ Coverage   90.24%   90.52%   +0.28%     
==========================================
  Files          13       13              
  Lines         605      623      +18     
  Branches      136      137       +1     
==========================================
+ Hits          546      564      +18     
  Misses         52       52              
  Partials        7        7              
Impacted Files       Coverage Δ
extruct/jsonld.py    100.00% <100.00%> (ø)
extruct/rdfa.py      100.00% <100.00%> (ø)
extruct/utils.py     100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update d78167c...96bf6b3.

codecov[bot] avatar May 07 '20 17:05 codecov[bot]

I just found https://github.com/codecobblers/dirtyjson with a quick search. Maybe it makes sense to use it here.

whalebot-helmsman avatar May 12 '20 06:05 whalebot-helmsman

Using a library would definitely be better.

I’m a bit worried about the library not having received changes since 2017, and @scottkmaxwell not being active on GitHub since 2018.

On the other hand, there are no open issues or pull requests in the repository, and in the worst-case scenario we could fork the library.

I’ll give it a try.

Gallaecio avatar May 12 '20 15:05 Gallaecio

According to https://github.com/scrapinghub/extruct/pull/137#issuecomment-629589413, dirtyjson won’t work here.

Gallaecio avatar May 19 '20 16:05 Gallaecio

@kiollpt thanks for checking and @Gallaecio thanks for passing the message

whalebot-helmsman avatar Jun 01 '20 09:06 whalebot-helmsman

I'm still around, just busy with my day job. It looks like I am not getting my GitHub notifications in email so I'll fix that now. Feel free to use or fork. If you want to just use dirtyjson, I'll try to be responsive to PRs.

scottkmaxwell avatar Jun 23 '20 17:06 scottkmaxwell