lazynlp
Library to scrape and clean web pages to create massive datasets.
Hello. I'm new to open source contribution. I saw your issue #6 and created a robots.py file that might help. `read_disallows(url)`: takes a URL and returns the...
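The snippet above is cut off, so the exact behavior of `read_disallows` is unknown. A minimal sketch of what such a helper might do, assuming it collects the `Disallow` paths for a given user agent from robots.txt text (the function name and matching rules here are illustrative, not the contributor's actual code):

```python
def parse_disallows(robots_txt, user_agent="*"):
    """Hypothetical sketch: return the Disallow paths that apply to
    `user_agent` in the given robots.txt text. A real implementation
    would follow the full robots.txt grouping rules."""
    disallows = []
    active = False
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            # A block applies if it names our agent or the wildcard.
            active = value == user_agent or value == "*"
        elif field == "disallow" and active and value:
            disallows.append(value)
    return disallows
```

For a simple robots.txt with a wildcard block followed by a bot-specific block, this returns only the paths from the blocks that match the requested agent.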
Added headers to urllib. Detailed in the issue [here](https://github.com/chiphuyen/lazynlp/issues/11)
Have you considered adding a metric to assess the text quality of the documents, for example using the frequencies of short frequent words? (http://rolandschaefer.net/?p=78)
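The linked post argues that natural running text contains a predictable share of short, high-frequency function words, while boilerplate and keyword lists do not. A toy sketch of that idea (the word list and threshold here are illustrative assumptions, not the metric from the post):

```python
# Hypothetical sample of short, high-frequency English function words.
COMMON_SHORT_WORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "it", "that"}

def short_word_ratio(text):
    """Hypothetical sketch: fraction of tokens that are common short
    function words. Very low values suggest non-running text such as
    navigation menus or tag clouds."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t.strip(".,;:!?") in COMMON_SHORT_WORDS)
    return hits / len(tokens)
```

A document-quality filter could then drop pages whose ratio falls below some empirically chosen cutoff.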
Hi, thanks for this great tool. I noticed urllib fails with a `Forbidden Request` error when I call `download_page` on some links. You can reproduce the error by trying the...
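Many servers return HTTP 403 Forbidden for urllib's default `Python-urllib` User-Agent, which is the likely cause of the error above and what the headers fix in the previous issue addresses. A minimal sketch of sending a browser-like header, assuming nothing about lazynlp's actual `download_page` implementation (the User-Agent string is an arbitrary example):

```python
import urllib.request

def build_request(url):
    """Hypothetical sketch: wrap the URL in a Request that carries a
    browser-like User-Agent, which many servers require."""
    return urllib.request.Request(
        url,
        headers={"User-Agent": "Mozilla/5.0 (compatible; lazynlp-crawler)"},
    )

def fetch(url):
    # urlopen accepts a Request object, so the header is sent as-is.
    with urllib.request.urlopen(build_request(url)) as resp:
        return resp.read().decode("utf-8", errors="ignore")
```

Passing a `Request` object to `urlopen` is the standard-library way to attach headers without switching to a third-party HTTP client.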
One might as well extract structured data from each element of such a dataset. Linked data: https://5stardata.info/ Useful features:
- Relations to e.g. https://schema.org/Dataset (s)
- Reified edges to other...
Get the important (?!) images of the webpages in Markdown style.
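The request above presumably means emitting `![alt](src)` links for images found in a page. A small sketch with the standard library's `html.parser`, assuming no filtering of which images are "important" (that heuristic is left open, as the issue title itself suggests):

```python
from html.parser import HTMLParser

class ImageExtractor(HTMLParser):
    """Hypothetical sketch: collect <img> tags and render each one as a
    Markdown image link, using the alt text as the label."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            a = dict(attrs)
            src = a.get("src")
            if src:
                self.images.append(f"![{a.get('alt', '')}]({src})")

def images_to_markdown(html):
    parser = ImageExtractor()
    parser.feed(html)
    return parser.images
```

Deciding which images are important (e.g. by size attributes or position in the DOM) would be an extra filtering step on top of this.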
Hello, I am reaching out regarding your source code file crawl.py. After running tests with Pylint, a few errors were found in the source code....
Hello, I am reaching out regarding your Python source code files. After running tests with Pyflakes and Pylint, a few errors were present in the source...
Hello, I am reaching out regarding your Python code. After running tests with Pylint and Pyflakes, there are a few errors concerning variable usage present in the source...