soft404
A classifier for detecting soft 404 pages
Bumps [scipy](https://github.com/scipy/scipy) from 0.18.1 to 1.10.0. Release notes Sourced from scipy's releases. SciPy 1.10.0 Release Notes SciPy 1.10.0 is the culmination of 6 months of hard work. It contains many...
Bumps [scrapy](https://github.com/scrapy/scrapy) from 1.1.2 to 2.6.2. Release notes Sourced from scrapy's releases. 2.6.2 Fixes a security issue around HTTP proxy usage, and addresses a few regressions introduced in Scrapy 2.6.0....
Bumps [lxml](https://github.com/lxml/lxml) from 3.6.4 to 4.9.1. Changelog Sourced from lxml's changelog. 4.9.1 (2022-07-01) Bugs fixed A crash was resolved when using iterwalk() (or canonicalize()) after parsing certain incorrect input. Note...
Bumps [ujson](https://github.com/ultrajson/ultrajson) from 1.35 to 5.4.0. Release notes Sourced from ujson's releases. 5.4.0 Added Add support for arbitrary size integers (#548) @JustAnotherArchivist Fixed CVE-2022-31116: Replace wchar_t string decoding implementation with...
Bumps [numpy](https://github.com/numpy/numpy) from 1.12.0 to 1.22.0. Release notes Sourced from numpy's releases. v1.22.0 NumPy 1.22.0 Release Notes NumPy 1.22.0 is a big release featuring the work of 153 contributors spread...
`import soft404; soft404.probability('hola')` raises `AttributeError: 'NoneType' object has no attribute 'tag'`
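Until this is fixed, a caller could guard against the failure along the lines of the sketch below. `safe_probability` is a hypothetical helper, not part of soft404. It takes the scoring function as an argument (for example `soft404.probability`) and returns `None` when the call fails with `AttributeError`, as in the report above.

```python
from typing import Callable, Optional


def safe_probability(html: str, predict: Callable[[str], float]) -> Optional[float]:
    """Return predict(html), or None if scoring fails with AttributeError.

    Hypothetical helper: `predict` stands in for a scoring function such as
    soft404.probability, passed in so this sketch does not import soft404.
    """
    if not html or not html.strip():
        # Empty input cannot be scored; skip the call entirely.
        return None
    try:
        return predict(html)
    except AttributeError:
        # Same exception type as reported for input like 'hola'.
        return None
```

Called as `safe_probability(text, soft404.probability)`, this would return `None` for the `'hola'` input instead of raising. It only hides the symptom and does not address the underlying bug.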
Right now the model is stored with joblib, and manages to work with Python 2.7, but it would still be better to store just the weights. With the current pipeline it's not...
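Storing just the weights could look roughly like the sketch below. It assumes, purely for illustration, that the classifier reduces to a linear model with one weight per named feature plus an intercept; the issue does not say what the pipeline contains. The names `save_weights`, `load_weights`, and `predict_proba` are hypothetical. Plain JSON is used because it can be read from both Python 2.7 and Python 3 without unpickling.

```python
import json
import math


def save_weights(path, weights, intercept):
    """Write feature weights (dict of name -> float) and intercept as JSON."""
    with open(path, "w") as f:
        json.dump({"weights": weights, "intercept": intercept}, f)


def load_weights(path):
    """Read back the (weights, intercept) pair written by save_weights."""
    with open(path) as f:
        data = json.load(f)
    return data["weights"], data["intercept"]


def predict_proba(features, weights, intercept):
    """Logistic score for a dict of feature name -> value.

    Features without a stored weight contribute nothing to the score.
    """
    z = intercept + sum(
        weights.get(name, 0.0) * value for name, value in features.items()
    )
    return 1.0 / (1.0 + math.exp(-z))
```

The trade-off is that feature extraction has to be reimplemented or shipped separately, since only the numbers are persisted and not the fitted pipeline objects.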
The current training dataset is too big to put in a repo or host on S3 indefinitely. It was created with a crawler that is in the repo, but still...
## Overview This PR addresses issue #[number] by transforming the soft404 package from a collection of scripts and notebooks into a professional, modern Python package with proper structure, entry points,...
Lots of notebooks. Make this a real package: setup.py, `__init__.py`, etc.