Vostretsov Nikita

Results 12 issues of Vostretsov Nikita

It would be great if Tabular could align not only on a fixed space count (after/before) but also on tab borders, something like `:Tab /=/ltrt`. Before: abc=cde abcdf=asd Now: abc...

Two fixes here: - a solution for https://github.com/pypa/pip/issues/9215 in our python3.8 environment. https://github.com/scrapy/scrapy/pull/5146/checks?check_run_id=2570524886 python3.8 runs are currently killed by the 6-hour timeout. - don't rely on the order of warnings in tests

bug
CI
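
One way to avoid depending on warning order is to compare the recorded warnings as a set rather than by index. The snippet below is only a minimal sketch using the standard `warnings` module; `emit_warnings` and `collect_warnings` are hypothetical names made up for illustration and are not taken from the PR.

```python
import warnings


def emit_warnings():
    # Hypothetical code under test: emits two warnings; the test below
    # should not care which one comes first.
    warnings.warn("first thing is deprecated", DeprecationWarning)
    warnings.warn("second thing is risky", UserWarning)


def collect_warnings(func):
    """Run func and return the set of (category, message) pairs it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure nothing is suppressed
        func()
    return {(w.category, str(w.message)) for w in caught}


seen = collect_warnings(emit_warnings)
expected = {
    (UserWarning, "second thing is risky"),
    (DeprecationWarning, "first thing is deprecated"),
}
# Set comparison ignores the order in which the warnings were raised.
assert seen == expected
```

A set drops duplicates, so a test that also needs to check how many times a warning fired would have to count entries instead.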

Following the discussion in https://github.com/scrapy-plugins/scrapy-splash/pull/173: decoding the body of a Response using only utf-8 is a sure way to fail. It is better to skip decoding altogether. There are two problems with integration...

In this PR: - fix for `default_clustering_score` to reduce score for clustering with `threshold=0` - ability to pass custom distance and position functions to clustering procedure - new `_get_tree_position` and...

For an HTML document obtained from http://media.pella.com/professional/adm/Clad-Wood/CRNFMDTR-dl.dxf (~7.5 MB, text only), tokenizing takes more than 1 hour. The file is preserved at https://gist.github.com/whalebot-helmsman/987d98c092294aeeafd8735a13a37c32

I was setting up autoextract in Scrapy Cloud on a project with the crawlera addon. Autoextract queries were routed through crawlera. The idea is to blacklist the autoextract domain by default. It may have...
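
The idea of a default domain blacklist could look roughly like the check below. This is a sketch under stated assumptions, not the addon's actual code: `DEFAULT_BYPASS_DOMAINS`, `should_bypass_proxy`, and the hostname `autoextract.example.com` are all placeholders invented for illustration.

```python
from urllib.parse import urlparse

# Hypothetical default; the real autoextract hostname is not given in the text.
DEFAULT_BYPASS_DOMAINS = frozenset({"autoextract.example.com"})


def should_bypass_proxy(url, bypass_domains=DEFAULT_BYPASS_DOMAINS):
    """Return True if the URL's host is a bypassed domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    # Match the domain itself or any subdomain, but not lookalike hosts.
    return any(host == d or host.endswith("." + d) for d in bypass_domains)


print(should_bypass_proxy("https://autoextract.example.com/v1/extract"))
print(should_bypass_proxy("https://example.org/page"))
```

Matching on the dot-prefixed suffix keeps a host such as `notautoextract.example.com` from being treated as a bypassed domain.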

```
python -c 'import dateparser as D;p=D.DateDataParser(languages=["hu"]);print(p.get_date_data("október 29., péntek 9:13"))'
{'date_obj': datetime.datetime(2029, 10, 1, 9, 13), 'period': 'day', 'locale': 'hu'}
```
vs
```
python -c 'import dateparser as D;p=D.DateDataParser(languages=["en"]);print(p.get_date_data("october 29.,...
```

Type: Bug - Language

Hi @Insutanto, you are doing nice work in this repo. I have the same wish: different message queues should be supported in scrapy. Old implementations of this idea and one you...

enhancement

`echo '1 111' | tomitaparser.exe config.proto` In the file prettyoutput.html there is a strange `1111 EOS`: for some reason the two tokens were merged into one, so the separate regular expressions do not match.

A simple text with an address: > г. Москва ул. Порываева, дом 22, корп.1, кв. 23 The segmenter splits it into 4 sentences: > г . **EOS** > Москва ул . **EOS** > Порываева...