bozden
OK, the first issue was not with CommandLineDiskImager... I was using COMODO Internet Security Premium and AutoSandbox was on. Although I clicked "ignore", it still did not work, may be a...
OK, I'll try removing all that DB-related stuff...
The following shows the problem. I replaced the DB with a simple list. I used a real domain...
```
from datetime import datetime
import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders...
```
Just add a print statement to the process_value function... As I indicated above, it is not failing, but it passes unfiltered links to that function.
```
def process_value(url):
    print(url)
    return url
```
...
I suspected it was either a backward-compatibility issue or by design. But... I must say, the current design caught me by surprise. If I'm specifying allow & deny parameters, I surely want...
BTW, I solved it for now by adding a simple library, as follows:
```
def process_link(link):
    if not is_url_ok(link, allowList, denyList) or db.is_already_scraped(link):
        return None
    return link
```
Now...
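For readers who want to try the same workaround, here is a minimal sketch of what a helper like `is_url_ok` could look like. The body is an assumption, not the code from the library mentioned above: it treats the allow and deny lists as regular expressions, lets deny take precedence, and treats an empty allow list as "allow everything". The `ALLOW` and `DENY` lists and the example URLs are hypothetical.

```python
import re
from typing import Iterable, Optional

# Hypothetical patterns, for illustration only.
ALLOW = [r"/blog/"]
DENY = [r"/tag/"]


def is_url_ok(url: str, allow_list: Iterable[str], deny_list: Iterable[str]) -> bool:
    """Return True if the URL passes the allow/deny regex lists.

    Assumed semantics: any deny match rejects the URL; an empty allow
    list accepts everything that is not denied.
    """
    allow = [re.compile(p) for p in allow_list]
    deny = [re.compile(p) for p in deny_list]
    if any(r.search(url) for r in deny):
        return False
    return not allow or any(r.search(url) for r in allow)


def process_value(url: str) -> Optional[str]:
    """Filter early: returning None tells the link extractor to skip the URL."""
    return url if is_url_ok(url, ALLOW, DENY) else None
```

With this in place, the filter runs inside `process_value` itself, so nothing downstream sees a link that the allow and deny patterns would have rejected anyway.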
I was afraid of that :(
We replicated the issue. I can confirm that it occurs on Colab, and @wasertech confirms it on a local machine.
I checked with pip (v1.3.0), git clone v1.3.0, and git clone main; all failed. I could not get past another bug with v1.2.0 (#2110). But it worked OK with git...
Hmmm. Common Voice v9.0 is out and I want to start training some languages. It seems I will be working from a patched fork :(