Analyze scraped data

Results: 27 arche issues

A reloadable config, or a way to pass arguments, is needed so that anybody can configure `report_all()` in one place. E.g. we have `threshold` for coverage diff, which defaults to 0.2, so the config might...

Type: Feature
good first issue
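One way the config idea above could look, as a minimal sketch only: `ReportConfig`, `DEFAULT_CONFIG`, and `with_overrides` are hypothetical names chosen for illustration and are not part of arche. The only value taken from the issue text is the `threshold` default of 0.2.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ReportConfig:
    """Hypothetical settings holder; not part of arche's API."""

    # Coverage diff threshold; the 0.2 default comes from the issue text.
    threshold: float = 0.2


DEFAULT_CONFIG = ReportConfig()


def with_overrides(config: ReportConfig, **overrides) -> ReportConfig:
    """Return a copy of `config` with the given fields replaced."""
    return replace(config, **overrides)


# Override a single setting without touching the defaults.
custom = with_overrides(DEFAULT_CONFIG, threshold=0.1)
```

A frozen dataclass keeps the defaults immutable, so each caller works with its own overridden copy instead of mutating shared state.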

Similar to https://github.com/fastai/fastai/blob/master/setup.py. The goal is to have an easy-to-set-up environment, since environments are not dependencies by nature. E.g. the library should run in Jupyter, but Jupyter is not a...

good first issue
Type: Docs

There is a difference between schemas stored in files and schemas passed as a `dict`. In particular, every `\` in a file must be escaped twice, meaning we have this: `"^https?://www\\.realtor\\.ca/propertyDetails\\.aspx\\?PropertyId=[0-9]+$"` While python `jsons` can eat...

Type: Feature
good first issue
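The file-versus-`dict` difference described above can be illustrated with the standard `json` module. This is a sketch under assumptions: the `pattern` key and the sample URL with `PropertyId=123` are made up for the example, and only the regular expression itself comes from the issue text.

```python
import json
import re

# Text as it would appear inside a schema file: every backslash is doubled,
# because JSON uses the backslash as its own escape character.
file_text = (
    r'{"pattern": "^https?://www\\.realtor\\.ca/propertyDetails\\.aspx'
    r'\\?PropertyId=[0-9]+$"}'
)
schema_from_file = json.loads(file_text)

# The same schema written directly as a Python dict. A raw string keeps the
# single backslashes that the regex engine expects.
schema_as_dict = {
    "pattern": r"^https?://www\.realtor\.ca/propertyDetails\.aspx\?PropertyId=[0-9]+$"
}

# After parsing, both forms hold the identical regex.
same = schema_from_file == schema_as_dict

# Hypothetical URL used only to show that the parsed pattern works.
sample_url = "https://www.realtor.ca/propertyDetails.aspx?PropertyId=123"
matches = re.match(schema_from_file["pattern"], sample_url) is not None
```

Parsing the file text turns each `\\` into a single `\`, so the doubled form exists only on disk.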

To allow report customization, results could have a better API. The [current one](https://github.com/scrapinghub/arche/blob/master/src/arche/rules/result.py) looks like:
```
arche.report.results.get("JSON Schema Validation")
Result(
    name='JSON Schema Validation',
    messages={ :[ Message(summary='34021 items were checked, 3...
```

Type: Feature
Type: Question

https://github.com/modin-project/modin They claim a lot; let's see what we get with actual data. I feel like the only thing that really makes a difference (100x) is numpy and...

Type: Question
Type: Performance

Even for me it takes a few seconds to figure out that it just doesn't work. I see it as either a flaw in the design, e.g. it should feel like you...

good first issue

Currently, if a filtered job returns 0 items, the first test simply fails. While there are some hints that point to the number of returned errors, such as `0it`, it's not visible...

Type: Feature
good first issue