icrawler

A multi-threaded crawler framework with many built-in image crawlers.

42 icrawler issues

Some websites return abnormal images, which may cause the download thread to die.
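
A common way to stop one bad response from killing a download thread is to catch exceptions per task inside the worker loop, so the thread records the failure and moves on. The sketch below is a generic illustration, not icrawler's actual downloader: `download_worker`, `run` and the `fetch` callback are hypothetical names, and `fetch` stands in for whatever retrieves (and possibly decodes) an image.

```python
import queue
import threading


def download_worker(tasks, results, lock, fetch):
    """Hypothetical worker loop: one bad task is recorded and skipped
    instead of ending the thread."""
    while True:
        task = tasks.get()
        try:
            if task is None:  # sentinel: this worker should stop
                return
            try:
                data = fetch(task["url"])  # may raise on an abnormal image
                if not data:
                    raise ValueError("empty response")
                outcome = (task["url"], True, len(data))
            except Exception as exc:  # keep the failure local to this task
                outcome = (task["url"], False, repr(exc))
            with lock:
                results.append(outcome)
        finally:
            tasks.task_done()


def run(urls, fetch, thread_num=2):
    """Process `urls` with `thread_num` workers; return one outcome per URL."""
    tasks = queue.Queue()
    results, lock = [], threading.Lock()
    workers = [
        threading.Thread(target=download_worker, args=(tasks, results, lock, fetch))
        for _ in range(thread_num)
    ]
    for w in workers:
        w.start()
    for url in urls:
        tasks.put({"url": url})
    for _ in workers:
        tasks.put(None)  # one stop sentinel per worker
    for w in workers:
        w.join()
    return results
```

With `thread_num=1` and a `fetch` that raises on the first URL, the remaining URLs are still processed by the same thread.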

When I told it to crawl 1,000 images, I got a message at around the 500th image. Does that mean there are no more images? When I search on the Flickr site, there are hundreds more...

bug

First, I want to say thank you. icrawler has helped me a lot. I have read the source code but didn't find a way to specify the interval for...

feature
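
If the goal is a minimum delay between consecutive requests, one generic approach is a small limiter shared by all threads and called before each request. `MinIntervalLimiter` below is a hypothetical helper, not an existing icrawler option; where to call `wait()` inside icrawler (for example in a custom downloader or parser subclass) is left open.

```python
import threading
import time


class MinIntervalLimiter:
    """Hypothetical helper: block until at least `interval` seconds have
    passed since the previous call, across all threads sharing it."""

    def __init__(self, interval):
        self.interval = interval
        self._lock = threading.Lock()
        self._next_allowed = float("-inf")  # first call never waits

    def wait(self):
        # Holding the lock while sleeping serializes callers on purpose,
        # so concurrent threads are spaced out one after another.
        with self._lock:
            now = time.monotonic()
            delay = self._next_allowed - now
            if delay > 0:
                time.sleep(delay)
                now = time.monotonic()
            self._next_allowed = now + self.interval


# Usage: share one limiter and call wait() before every request.
limiter = MinIntervalLimiter(0.5)  # at most one request every 0.5 s
```

`time.monotonic()` is used rather than `time.time()` so that wall-clock adjustments cannot shorten or stretch the interval.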

When I search Baidu with the keyword '热水器+浴室' ("water heater + bathroom"), I get the correct results: ![WeChat screenshot_20190311132516](https://user-images.githubusercontent.com/13494034/54103799-b8101680-4408-11e9-98ee-bc9c536343ab.png) However, when I use the same keyword in icrawler, I get...

bug
needs reproduce
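
One possible cause worth ruling out (not confirmed for this issue): in a form-encoded query string a literal `+` means a space, so if the keyword is put into the URL without escaping the `+`, the server may search for '热水器 浴室' instead of '热水器+浴室'. The sketch contrasts the two encodings using only `urllib.parse`; the query parameter name `word` is an assumption.

```python
from urllib.parse import quote, quote_plus, unquote_plus


def naive_query(keyword):
    # Percent-encodes the Chinese characters but leaves "+" untouched.
    return "word=" + quote(keyword, safe="+")


def safe_query(keyword):
    # quote_plus also escapes "+" as %2B (and would turn spaces into "+").
    return "word=" + quote_plus(keyword)


def decode_value(query):
    # Decode the way form-style query strings are usually parsed:
    # "+" becomes a space, %XX becomes a byte.
    return unquote_plus(query.split("=", 1)[1])


keyword = "热水器+浴室"
print(decode_value(naive_query(keyword)))  # 热水器 浴室  ("+" turned into a space)
print(decode_value(safe_query(keyword)))   # 热水器+浴室
```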

I am trying to access `task_queue` to get at the task dictionary, but I am unable to. Can someone suggest how to go about it? Thanks.

question
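
If the tasks sit in a standard `queue.Queue` (an assumption; icrawler's internal attribute names and queue types may differ), the public `queue` API has no peek, but the current contents can be copied by draining and re-queuing them. `snapshot_tasks` is a hypothetical helper and is only safe while no other thread is consuming from the queue. Overriding a hook that receives each finished task, such as the `process_meta(task)` method used in another issue on this page, avoids touching the queue at all.

```python
import queue


def snapshot_tasks(task_queue):
    """Return a copy of the task dicts currently queued, leaving the queue
    with the same items in the same order.

    Only safe while no other thread is putting or getting concurrently."""
    tasks = []
    while True:
        try:
            tasks.append(task_queue.get_nowait())
        except queue.Empty:
            break
        task_queue.task_done()  # balance the get() so join() bookkeeping stays correct
    for task in tasks:
        task_queue.put(task)
    return tasks


# Usage with made-up task dicts ("file_url" is an assumed key).
q = queue.Queue()
q.put({"file_url": "http://example.com/a.jpg"})
q.put({"file_url": "http://example.com/b.jpg"})
print([t["file_url"] for t in snapshot_tasks(q)])
print(q.qsize())  # 2
```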

I tried to change the `root_dir` as follows: `google_crawler = GoogleImageCrawler(feeder_threads=1, parser_threads=1, downloader_threads=4, storage=storage)` and then `google_crawler.set_storage(new_storage)`. But it doesn't seem to work. Did I...

bug
needs reproduce

How do I download images of multiple classes?

question
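
A common approach is to run one crawl per class, each writing into its own directory. The helper below only does the bookkeeping; `crawl_classes` and the `crawl_one` callback are hypothetical names. The commented-out `crawl_one` at the end uses icrawler's `GoogleImageCrawler(storage={"root_dir": ...})` and `crawl(keyword=..., max_num=...)` calls as shown in its README; treat it as a sketch rather than a tested recipe.

```python
import os


def crawl_classes(classes, base_dir, crawl_one, max_num=100):
    """Crawl each class keyword into its own sub-directory of base_dir.

    crawl_one(keyword, root_dir, max_num) is a hypothetical callback that
    performs the crawl for a single class."""
    dirs = {}
    for name in classes:
        # e.g. "golden retriever" -> base_dir/golden_retriever
        root_dir = os.path.join(base_dir, name.replace(" ", "_"))
        crawl_one(name, root_dir, max_num)
        dirs[name] = root_dir
    return dirs


# One possible crawl_one using icrawler's built-in Google crawler:
#
# from icrawler.builtin import GoogleImageCrawler
#
# def crawl_one(keyword, root_dir, max_num):
#     crawler = GoogleImageCrawler(storage={"root_dir": root_dir})
#     crawler.crawl(keyword=keyword, max_num=max_num)
#
# crawl_classes(["cat", "dog", "golden retriever"], "images", crawl_one)
```

Creating a fresh crawler per class keeps each class's output directory fixed for the whole crawl.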

How do I use multiple Color, Size and License options?

question
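
Assuming each filter takes a single value per crawl (as in a `filters=dict(size=..., color=..., license=...)` argument to `crawl()`), one workaround is to run a separate crawl for every combination of values. `filter_combinations` is a hypothetical helper that enumerates those combinations with `itertools.product`.

```python
from itertools import product


def filter_combinations(**options):
    """Expand lists of filter values into one filter dict per combination.

    filter_combinations(color=["red", "blue"], size=["large"])
    -> [{"color": "red", "size": "large"}, {"color": "blue", "size": "large"}]
    """
    keys = list(options)
    return [dict(zip(keys, values)) for values in product(*options.values())]


for filters in filter_combinations(color=["red", "blue"],
                                   size=["large", "medium"],
                                   license=["commercial"]):
    # Hypothetically, each dict would go to its own crawl, e.g.
    # crawler.crawl(keyword="cat", filters=filters, max_num=100)
    print(filters)
```

Giving each combination its own `root_dir` avoids file-name collisions between the separate crawls.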

As the Google crawler doesn't work properly at the moment, I tried the Bing crawler, but all the downloaded image files were corrupted. It worked well last month. Will...

bug
needs reproduce
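
When downloaded files look corrupted, a quick first check is whether their leading bytes match any image format; an HTML error page saved under a `.jpg` name, for example, fails this check. The sketch below is a generic stdlib check, not part of icrawler. The signature table covers only a few common formats, and a header check does not catch truncated files (a full decode, e.g. with Pillow, would).

```python
# Leading bytes ("magic numbers") of a few common image formats.
_SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"BM": "bmp",
}


def sniff_image_type(data):
    """Return the image format implied by the first bytes, or None."""
    for magic, fmt in _SIGNATURES.items():
        if data.startswith(magic):
            return fmt
    # WebP: "RIFF" + 4-byte size + "WEBP"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None


def check_file(path):
    """Read the first bytes of a downloaded file and sniff its type."""
    with open(path, "rb") as f:
        head = f.read(16)
    return sniff_image_type(head)


print(sniff_image_type(b"\x89PNG\r\n\x1a\n\x00\x00"))        # png
print(sniff_image_type(b"<html><body>Forbidden</body></html>"))  # None
```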

```python
from icrawler import ImageDownloader


class MyImageDownloader(ImageDownloader):

    def __init__(self, thread_num, signal, session, storage, log_file):
        super(MyImageDownloader, self).__init__(thread_num, signal, session, storage)
        self.log_file = open(log_file, 'w')

    def process_meta(self, task):
        if task['success']:
            with self.lock:
                self.log_file.write('{} {} {} {}\n'.format(
                    task['filename'],...
```

question
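
The pattern in the snippet above (several downloader threads reporting finished tasks, with a lock so log lines do not interleave) can be tried in isolation with the standard library. `MetaLogger` is a hypothetical stand-in, not an icrawler class; the `success` and `filename` keys mirror the snippet, and `file_url` is an assumed key. Taking an already-open stream instead of calling `open()` in `__init__` leaves closing the file to the caller.

```python
import io
import threading


class MetaLogger:
    """Stand-alone sketch: record successful tasks, one line per task."""

    def __init__(self, log_file):
        self.lock = threading.Lock()
        self.log_file = log_file  # any writable text stream

    def process_meta(self, task):
        if task["success"]:
            # The lock keeps concurrent writes from interleaving.
            with self.lock:
                self.log_file.write("{} {}\n".format(task["filename"], task["file_url"]))


buf = io.StringIO()
logger = MetaLogger(buf)
logger.process_meta({"success": True, "filename": "000001.jpg",
                     "file_url": "http://example.com/a.jpg"})
logger.process_meta({"success": False, "filename": "000002.jpg",
                     "file_url": "http://example.com/b.jpg"})
print(buf.getvalue(), end="")  # 000001.jpg http://example.com/a.jpg
```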