icrawler
Scrape metadata with the built-in Flickr crawler
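from icrawler import ImageDownloader


class MyImageDownloader(ImageDownloader):

    def __init__(self, thread_num, signal, session, storage, log_file):
        # The class name passed to super() must match this class.
        super(MyImageDownloader, self).__init__(thread_num, signal, session,
                                                storage)
        self.log_file = open(log_file, 'w')

    def process_meta(self, task):
        # Log only the images that were downloaded successfully.
        if task['success']:
            # Downloader threads share one log file, so serialize writes.
            with self.lock:
                self.log_file.write('{} {} {} {}\n'.format(
                    task['filename'], task['file_url'], *task['img_size']))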
When someone asked a similar question earlier, this was the example code that solved it. It overrides the process_meta method so that it records the file name, file URL, and image size of each downloaded image. Is there also a way to scrape the photo title, description, and tags with the built-in Flickr crawler? Perhaps it is just a matter of reading different keys from the task dict?
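To make concrete what I have in mind, here is a small standalone sketch of the kind of log line I would like process_meta to write. The helper name format_meta_line and the extra keys 'title' and 'tags' are hypothetical: I do not know whether the Flickr crawler actually puts such keys into the task dict, which is exactly my question. Only 'filename', 'file_url', and 'img_size' come from the example above.

```python
def format_meta_line(task, extra_keys=()):
    """Build one tab-separated log line from a task dict.

    'filename', 'file_url' and 'img_size' are the keys used in the
    example above. extra_keys are hypothetical (e.g. 'title', 'tags');
    a key that is missing from the task is written as an empty field.
    """
    width, height = task['img_size']
    fields = [task['filename'], task['file_url'], width, height]
    for key in extra_keys:
        fields.append(task.get(key, ''))
    return '\t'.join(str(field) for field in fields)


# Made-up task dict, only for illustration.
sample_task = {
    'success': True,
    'filename': '000001.jpg',
    'file_url': 'https://example.com/photo.jpg',
    'img_size': (640, 480),
}

print(format_meta_line(sample_task))
print(format_meta_line(sample_task, extra_keys=('title', 'tags')))
```

I used tabs instead of spaces as the separator here because titles and descriptions would contain spaces themselves.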
Thanks!