icrawler
A multi-threaded crawler framework with many built-in image crawlers.
Some websites return abnormal images, which may kill the download thread.
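A generic way to keep one bad response from ending a worker is to catch exceptions per task inside the thread loop. The sketch below is not icrawler's downloader; `worker`, `handle`, and `errors` are hypothetical names, and the standard-library `queue` stands in for whatever task queue the crawler uses.

```python
import queue
import threading


def worker(tasks, handle, errors):
    """Process tasks until a None sentinel arrives.

    A failure in one task is recorded in `errors` instead of
    propagating, so the thread keeps running.
    """
    while True:
        task = tasks.get()
        if task is None:  # sentinel: stop the worker
            tasks.task_done()
            break
        try:
            handle(task)
        except Exception as exc:  # e.g. a truncated or non-image response
            errors.append((task, exc))
        finally:
            tasks.task_done()
```

Here `handle` would be the per-image download step; a task that raises is logged in `errors` and the loop moves on to the next one.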
When I ask it to crawl 1000 images, I get a message around the 500th image. Does that mean there are no more images? But when I search on the Flickr site, there are hundreds more...
First, I want to say thank you. icrawler does help me a lot. I have read the source code, but didn't find a way to specify the interval for...
When I use the keyword '热水器+浴室' ("water heater + bathroom") to search on the Baidu website, I get the correct result. However, when I use the same keyword in icrawler, I got...
I am trying to use task_queue to access the task dictionary, but I am unable to. Can someone please suggest how to go about it? Thanks.
I tried to change the `root_dir` with the following: `google_crawler = GoogleImageCrawler(feeder_threads=1, parser_threads=1, downloader_threads=4, storage=storage)` `google_crawler.set_storage(new_storage)` But it doesn't seem to work. Did I...
How to download images of multiple classes?
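One common approach to the multi-class question is to run one crawl per class and give each class its own `root_dir`. A minimal sketch, assuming the crawler class accepts a `storage={'root_dir': ...}` argument and exposes `crawl(keyword=..., max_num=...)`, as the built-in crawlers do; `crawl_classes` and its parameters are hypothetical names introduced here for illustration.

```python
import os


def crawl_classes(crawler_factory, class_names, base_dir, max_num=100):
    """Run one crawl per class, storing each class in its own subdirectory.

    crawler_factory: a crawler class (or any callable) taking `storage=`.
    Returns a dict mapping each class name to the directory used for it.
    """
    dirs = {}
    for name in class_names:
        root_dir = os.path.join(base_dir, name)
        # A fresh crawler per class, so each one writes to its own folder.
        crawler = crawler_factory(storage={'root_dir': root_dir})
        crawler.crawl(keyword=name, max_num=max_num)
        dirs[name] = root_dir
    return dirs
```

With icrawler installed, a call might look like `crawl_classes(GoogleImageCrawler, ['cat', 'dog'], 'images')` after `from icrawler.builtin import GoogleImageCrawler`. Creating a new crawler per class avoids having to change the storage of an existing instance.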
As the Google crawler doesn't work properly at the moment, I tried to use the Bing crawler. But all downloaded image files were corrupted. It worked well last month. Will...
class MyImageDownloader(ImageDownloader):
    def __init__(self, thread_num, signal, session, storage, log_file):
        super(MyImageDownloader, self).__init__(thread_num, signal, session, storage)
        self.log_file = open(log_file, 'w')

    def process_meta(self, task):
        if task['success']:
            with self.lock:
                self.log_file.write('{} {} {} {}\n'.format(
                    task['filename'],...