snacktory

Readability clone in Java

23 snacktory issues

Bumps [junit](https://github.com/junit-team/junit4) from 4.11 to 4.13.1. Release notes (sourced from junit's releases): JUnit 4.13.1 and JUnit 4.13 — please refer to the release notes for details...

dependencies

I published a new method in HtmlFetcher called extract, which has a new parameter (content) for passing byte[] content fetched from the URL. I would like to avoid downloading the URL's content....
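A minimal sketch of what such an overload could look like. The class name, the charset fallback, and the placeholder extraction step are all illustrative assumptions, not snacktory's actual API; the point is only that pre-fetched bytes are decoded and parsed without a second download:

```java
// Hypothetical sketch: an extract-style method that accepts pre-fetched
// bytes so the caller can avoid downloading the URL's content twice.
// stripTags is a stand-in for the real article-text extraction step.
public class PrefetchedExtractor {

    // Decode the raw bytes with the server-reported charset
    // (falling back to UTF-8), then hand the HTML to the extractor.
    public static String extract(byte[] content, String charset) {
        String cs = (charset == null || charset.isEmpty()) ? "UTF-8" : charset;
        String html = new String(content, java.nio.charset.Charset.forName(cs));
        return stripTags(html); // placeholder for the real extractor
    }

    // Minimal placeholder "extraction": drop tags, collapse whitespace.
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();
    }
}
```

The key design point is that the fetch and the extraction become independent steps, so content cached or downloaded elsewhere can be reused.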

Here is the example: https://www.nytimes.com/2017/10/09/business/general-motors-driverless.html The text is not fully parsed from the beginning; it starts only from: "The efforts have been moving forward in earnest since early last year, when...

Here is the example: https://www.cnbc.com/2017/10/09/amazons-comedies-win-with-critics-while-hulu-is-a-hit-with-audiences.html https://www.cnbc.com/2017/10/10/opec-calls-on-us-shale-oil-producers-to-accept-shared-responsibility.html The text is not fully parsed; only the first part of the article is extracted.

Not able to extract content from some websites such as quora.com and possibly others. They return 403 for the HEAD request made at [this line](https://github.com/karussell/snacktory/blob/master/src/main/java/de/jetwick/snacktory/HtmlFetcher.java#L360) in the HtmlFetcher class.
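Some servers reject HEAD requests outright while serving GET normally, so one plausible workaround is to retry with GET when HEAD is refused. The helper below only encodes that retry decision; how it would be wired into HtmlFetcher is an assumption, not part of the issue:

```java
// Hedged sketch: decide whether a failed HEAD request is worth retrying
// as a GET. The status codes chosen here commonly mean "HEAD not welcome"
// (bot filtering or unsupported method) rather than "resource gone".
public class HeadFallback {

    public static boolean shouldRetryWithGet(int headResponseCode) {
        return headResponseCode == 403   // Forbidden (often bot filtering)
            || headResponseCode == 405   // Method Not Allowed
            || headResponseCode == 501;  // Not Implemented
    }
}
```

A caller would check the HEAD response code and, when this returns true, repeat the request with GET before giving up on the URL.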

Hi @karussell, thanks for building and sharing Snacktory! You said you were [looking for someone](869dc14c28c0c33dac07acfd244530c54ccb7473) to take over maintenance and future development? We’ve been working hard on our own fork,...

```java
protected String detectCharset(String key, ByteArrayOutputStream bos,
        BufferedInputStream in, String enc) throws IOException {
    byte[] arr = new byte[2048];
```

How to reproduce: do a fetchAndExtract of this URL: 'http://www.gazzetta.it/Sport-Invernali/Sci-Alpino/Coppa-Mondo-Sci/26-02-2017/sci-combinata-brignone-ho-sciato-senza-paura-uscire-180995893986.shtml'
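A plausible reading of this report (an assumption, since the issue text is truncated) is that detectCharset only inspects the first 2048 bytes, so a charset declaration appearing later in the page is missed. A sketch of sniffing a charset declaration from a larger HTML prefix:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hedged sketch: scan an HTML prefix for a charset declaration, covering
// both <meta charset="..."> and the older http-equiv Content-Type form.
// This is illustrative, not snacktory's actual detection logic.
public class CharsetSniffer {

    private static final Pattern META = Pattern.compile(
        "charset\\s*=\\s*[\"']?([\\w-]+)", Pattern.CASE_INSENSITIVE);

    // Returns the declared charset name, or null if none is found.
    public static String sniff(String htmlPrefix) {
        Matcher m = META.matcher(htmlPrefix);
        return m.find() ? m.group(1) : null;
    }
}
```

Reading a larger prefix (or the whole document) before sniffing would avoid the fixed 2048-byte window, at the cost of buffering more data.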

Very occasionally I'm getting a stack overflow in 1.3-SNAPSHOT, so clearly it is content-specific. Sadly I haven't been able to capture an offending site yet: java.lang.StackOverflowError at java.util.LinkedHashMap.afterNodeInsertion(LinkedHashMap.java:299) at...

Hello, I am getting an exception when loading URLs whose pages are larger than the fixed `500000 maxBytes` limit specified in the `Converter` class. Please add a way to either modify this...
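One way to address this would be a configurable limit that truncates the stream instead of failing. The sketch below is illustrative only; the class name and constructor are assumptions, not the actual `Converter` API:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hedged sketch: read at most maxBytes from a stream, silently truncating
// anything beyond the limit rather than throwing. A Converter-style class
// could take the limit as a constructor argument instead of a constant.
public class BoundedReader {

    private final int maxBytes;

    public BoundedReader(int maxBytes) {
        this.maxBytes = maxBytes;
    }

    // Copy up to maxBytes from the stream into a byte array.
    public byte[] read(InputStream in) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int total = 0;
        int n;
        while (total < maxBytes
                && (n = in.read(buf, 0, Math.min(buf.length, maxBytes - total))) != -1) {
            bos.write(buf, 0, n);
            total += n;
        }
        return bos.toByteArray();
    }
}
```

Truncation keeps memory bounded while still letting the extractor see the start of the page, which usually contains the article body.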

Did you manage to add the dependency with sbt? I get different exceptions when referring to different versions.