dhgelling

Results 2 issues of dhgelling

In my usage, a big speed bottleneck is the sequential downloading of images from an article when finding the top image. While the current implementation attempts to only download partial...
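
The snippet above is cut off, so the fix the author proposes is not known. As a rough illustration of the general idea of replacing sequential downloads with concurrent ones, here is a minimal sketch using Python's standard `concurrent.futures`. The names `fetch_all` and `fetch` are hypothetical and not part of any library's API; `fetch` stands in for whatever function actually downloads one image.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional


def fetch_all(
    urls: Iterable[str],
    fetch: Callable[[str], bytes],
    max_workers: int = 8,
) -> Dict[str, Optional[bytes]]:
    """Fetch every URL concurrently; a failed download maps to None.

    `fetch` is a hypothetical caller-supplied function that downloads
    one URL and returns its bytes (for example, a wrapper around an
    HTTP client).
    """

    def safe_fetch(url: str) -> Optional[bytes]:
        # Swallow per-URL failures so one bad image does not abort the batch.
        try:
            return fetch(url)
        except Exception:
            return None

    url_list = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with url_list.
        results = list(pool.map(safe_fetch, url_list))
    return dict(zip(url_list, results))
```

Threads suit this case because image downloads are I/O-bound; the wall-clock time then approaches that of the slowest download rather than the sum of all of them.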

The text content of newspapers seems to be returned as paragraphs separated by two newlines. When doing NLP on this, the tokenizer sometimes thinks a sentence spans two paragraphs,...
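
One way to keep a sentence from crossing a paragraph boundary is to split the text on blank lines first and tokenize each paragraph separately. The sketch below assumes paragraphs are separated by two newlines, as described above. The function names are hypothetical, and the regex-based `split_sentences` is only a naive stand-in for a real sentence tokenizer.

```python
import re
from typing import List


def split_paragraphs(text: str) -> List[str]:
    """Split text on blank lines and drop empty paragraphs."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_sentences(paragraph: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace.

    This is a placeholder; swap in a proper NLP tokenizer in practice.
    """
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]


def sentences_by_paragraph(text: str) -> List[List[str]]:
    """Tokenize each paragraph on its own so no sentence spans two paragraphs."""
    return [split_sentences(p) for p in split_paragraphs(text)]
```

Because each paragraph is tokenized independently, a heading or caption that lacks terminal punctuation stays its own sentence instead of merging with the paragraph that follows.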