soft404

A classifier for detecting soft 404 pages

Results 10 soft404 issues

Bumps [scipy](https://github.com/scipy/scipy) from 0.18.1 to 1.10.0. Release notes Sourced from scipy's releases. SciPy 1.10.0 Release Notes SciPy 1.10.0 is the culmination of 6 months of hard work. It contains many...

dependencies

Bumps [scrapy](https://github.com/scrapy/scrapy) from 1.1.2 to 2.6.2. Release notes Sourced from scrapy's releases. 2.6.2 Fixes a security issue around HTTP proxy usage, and addresses a few regressions introduced in Scrapy 2.6.0....

dependencies

Bumps [lxml](https://github.com/lxml/lxml) from 3.6.4 to 4.9.1. Changelog Sourced from lxml's changelog. 4.9.1 (2022-07-01) Bugs fixed A crash was resolved when using iterwalk() (or canonicalize()) after parsing certain incorrect input. Note...

dependencies

Bumps [ujson](https://github.com/ultrajson/ultrajson) from 1.35 to 5.4.0. Release notes Sourced from ujson's releases. 5.4.0 Added Add support for arbitrary size integers (#548) @JustAnotherArchivist Fixed CVE-2022-31116: Replace wchar_t string decoding implementation with...

dependencies

Bumps [numpy](https://github.com/numpy/numpy) from 1.12.0 to 1.22.0. Release notes Sourced from numpy's releases. v1.22.0 NumPy 1.22.0 Release Notes NumPy 1.22.0 is a big release featuring the work of 153 contributors spread...

dependencies

`import soft404` followed by `soft404.probability('hola')` raises `AttributeError: 'NoneType' object has no attribute 'tag'`
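The issue does not say why the call fails. One possible reading, which is only an assumption, is that the input contains no HTML markup at all. A minimal sketch of a caller-side check under that assumption follows; `has_html_tags` and `checked_probability` are hypothetical helpers, not part of soft404, and `classify` stands in for whatever function does the scoring (for example `soft404.probability`).

```python
from html.parser import HTMLParser


class _TagCounter(HTMLParser):
    """Counts opening tags seen while parsing a document."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        self.count += 1


def has_html_tags(text):
    """Return True if the text contains at least one opening HTML tag."""
    counter = _TagCounter()
    counter.feed(text)
    counter.close()
    return counter.count > 0


def checked_probability(text, classify):
    """Call classify(text) only when the text looks like HTML.

    `classify` is a placeholder for the real scoring function; raising a
    ValueError here is an illustrative choice, not the library's behavior.
    """
    if not has_html_tags(text):
        raise ValueError("expected an HTML document, got plain text")
    return classify(text)
```

With this check, a call such as `checked_probability('hola', soft404.probability)` would fail early with a message about the input rather than with an `AttributeError` from inside the pipeline.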

Right now the model is stored with joblib and manages to work with Python 2.7, but it would still be better to store just the weights. With the current pipeline it's not...
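As a rough illustration of what "store just the weights" could look like, here is a minimal sketch that assumes the classifier is a linear model over a token vocabulary. That is an assumption: the issue does not describe the actual pipeline, and the names `save_weights`, `load_weights`, and `predict_proba` are hypothetical. Weights go into a plain JSON file, which does not depend on a Python or joblib version the way a pickled object does.

```python
import json
import math


def save_weights(path, vocabulary, coef, intercept):
    """Write the vocabulary (token -> index), coefficients, and intercept as JSON."""
    payload = {"vocabulary": vocabulary, "coef": coef, "intercept": intercept}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f)


def load_weights(path):
    """Read back the dictionary written by save_weights."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def predict_proba(weights, tokens):
    """Logistic score for a list of tokens; unknown tokens are ignored."""
    score = weights["intercept"]
    for token in tokens:
        index = weights["vocabulary"].get(token)
        if index is not None:
            score += weights["coef"][index]
    return 1.0 / (1.0 + math.exp(-score))
```

A caller would run `save_weights("model.json", vocabulary, coef, intercept)` once after training and `predict_proba(load_weights("model.json"), tokens)` at inference time. Any feature extraction that happens before the linear step would still need its own portable representation.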

The current training dataset is too big to put in a repo or to host on S3 indefinitely. It was created with a crawler that is in the repo, but still...

## Overview This PR addresses issue #[number] by transforming the soft404 package from a collection of scripts and notebooks into a professional, modern Python package with proper structure, entry points,...

Lots of notebooks. Make this a real package: setup.py, `__init__.py`, etc.