flickr_scraper
python flickr_scraper.py --search 'honeybees on flowers' --n 10 --download
When I tried to download the images, I got the errors below:
nargs ['honeybees on flowers']
0/10 error...
1/10 error...
2/10 error...
3/10 error...
4/10 error...
5/10 error...
6/10 error...
7/10 error...
8/10 error...
9/10 error...
10/10 error...
Done. (4.4s)
@qiyangchennrel hello!
Thank you for reaching out and providing details about the issue you're encountering. To help us diagnose and resolve the problem effectively, could you please provide a minimum reproducible example of your code? This will allow us to better understand the context and pinpoint the issue. You can find guidance on creating a reproducible example here: Minimum Reproducible Example.
Additionally, please ensure that you are using the latest versions of all relevant packages, as updates often include important bug fixes and improvements.
Looking forward to your response so we can assist you further! 😊
After following all the steps, and even running it in a Google Colab terminal, I am getting the same error...
Hello @nzhang95120,
Thank you for providing the screenshot and additional details about the issue you're encountering. It looks like you're running into some trouble with the flickr_scraper.py script.
Here are a few steps you can take to troubleshoot and potentially resolve the issue:
- Verify Package Versions: Ensure that you are using the latest versions of all relevant packages. Sometimes, issues are resolved in newer releases. You can update your packages using:
pip install --upgrade <package_name>
- Check Dependencies: Make sure all dependencies required by the script are installed. You can usually find these in the requirements.txt file or documentation of the repository.
- Error Logs: The error messages you provided are quite generic. If possible, try to capture more detailed error logs. This can often be done by running the script with increased verbosity or debug flags.
- Internet Connection: Ensure that your internet connection is stable, as the script needs to download images from Flickr.
- API Keys: If the script requires API keys for accessing Flickr, ensure that they are correctly set up and have the necessary permissions.
- Example Code: Here is a minimal example to ensure everything is set up correctly:
import os
import urllib.request

import flickrapi

# Replace with your own Flickr API key and secret
api_key = 'YOUR_API_KEY'
api_secret = 'YOUR_API_SECRET'

flickr = flickrapi.FlickrAPI(api_key, api_secret, format='parsed-json')

query = 'honeybees on flowers'
num_images = 10
os.makedirs('downloads', exist_ok=True)  # urlretrieve does not create the target directory

photos = flickr.photos.search(text=query, per_page=num_images, media='photos', sort='relevance')
for i, photo in enumerate(photos['photos']['photo']):
    url = f"https://farm{photo['farm']}.staticflickr.com/{photo['server']}/{photo['id']}_{photo['secret']}.jpg"
    urllib.request.urlretrieve(url, os.path.join('downloads', f"{i}.jpg"))
    print(f"Downloaded {i+1}/{num_images}")
print("Done.")
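For the "Error Logs" step, a bare "error..." line hides the underlying exception. The sketch below shows one way to surface it. The helper name try_each and its arguments are hypothetical and not part of flickr_scraper; it only illustrates printing the full traceback for each failed item.

```python
import traceback


def try_each(items, action):
    """Run action(item) for every item and report failures with full tracebacks.

    Hypothetical helper for illustration only; not part of flickr_scraper.
    """
    failures = []
    for i, item in enumerate(items):
        try:
            action(item)
            print(f"{i + 1}/{len(items)} ok")
        except Exception as exc:
            # Show the exception type and message instead of a bare "error..."
            print(f"{i + 1}/{len(items)} error: {exc!r}")
            traceback.print_exc()
            failures.append((item, exc))
    return failures


# Example: the second item fails, and the traceback shows exactly why.
failures = try_each(["a", "b"], lambda x: 1 / 0 if x == "b" else None)
```

In a real run, action would be whatever function downloads a single URL, so the first traceback printed points at the actual cause.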
If you have verified all the above and the issue persists, please let us know with any additional error logs or details. This will help us assist you more effectively.
Thank you for your patience and cooperation! 😊
Same here:
Traceback (most recent call last):
File "/flickr_scraper/flickr_scraper.py", line 67, in <module>
get_urls(search=search, n=opt.n, download=opt.download)
File "/flickr_scraper/flickr_scraper.py", line 35, in get_urls
for i, photo in enumerate(photos):
File "/lib/python3.9/site-packages/flickrapi/core.py", line 690, in data_walker
photoset = rsp.getchildren()[0]
AttributeError: 'xml.etree.ElementTree.Element' object has no attribute 'getchildren'
The error occurs because getchildren() was deprecated for a long time and then removed in Python 3.9, so it no longer exists on xml.etree.ElementTree.Element. This is a known compatibility issue in the flickrapi dependency. Let's resolve it:
- First update your packages:
pip install --upgrade flickrapi ultralytics
- If errors persist, add this workaround before your FlickrAPI initialization:
import xml.etree.ElementTree as ET
ET.Element.getchildren = lambda self: list(self) # Compatibility patch
This should resolve the XML parsing issue. Let us know if you still encounter any errors after applying these fixes.
I observe the following error with the above compatibility patch (Python 3.10.14, ultralytics 8.3.71, flickrapi 2.4.0):
import xml.etree.ElementTree as ET
ET.Element.getchildren = lambda self: list(self)
TypeError: cannot set 'getchildren' attribute of immutable type 'xml.etree.ElementTree.Element'
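The TypeError most likely arises because Element is implemented in C in CPython, so attributes cannot be assigned to the class at runtime. Where you control the calling code, the documented replacement for elem.getchildren() is list(elem). A minimal sketch, using a made-up XML response rather than real Flickr output (the helper name children is hypothetical):

```python
import xml.etree.ElementTree as ET


def children(elem):
    """Return child elements as a list, like the removed elem.getchildren()."""
    return list(elem)


# Made-up response for illustration; not actual Flickr API output.
rsp = ET.fromstring("<rsp><photos><photo id='1'/><photo id='2'/></photos></rsp>")

photoset = children(rsp)[0]  # equivalent to the old rsp.getchildren()[0]
ids = [photo.get("id") for photo in children(photoset)]
print(photoset.tag, ids)
```

This does not patch flickrapi itself. The failing call sits inside the library, so it would still need an updated release or a local edit of that line.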
To the extent it still helps @qiyangchennrel, @nzhang95120
I also observed the same error and traced it to #L16 in utils/general.py. I was able to clear the error by changing
f = dir + os.path.basename(uri) # filename
to
f = os.path.join(dir, os.path.basename(uri))
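To illustrate why this change matters: plain string concatenation omits the path separator when dir has no trailing slash, so the file lands next to the target directory under a fused name. A small sketch with a hypothetical directory and URI (the function name target_path is also an assumption):

```python
import os


def target_path(dir, uri):
    """Build the download path with a proper separator between dir and filename."""
    return os.path.join(dir, os.path.basename(uri))


uri = "https://example.com/a/b/photo.jpg"  # hypothetical URI

broken = "images" + os.path.basename(uri)  # separator is missing
fixed = target_path("images", uri)

print(broken)  # imagesphoto.jpg
print(fixed)
```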
A potential fix for this issue has been merged in PR #42! 🎉
Key Changes in the PR:
- Switched to pathlib for File Path Handling: Replaced the use of the os module with pathlib to improve readability, maintainability, and cross-platform compatibility.
- Enhanced Filename Sanitization: Systematically removes or renames problematic file name characters to ensure cleaner, predictable file naming.
- Improved Handling of Missing File Extensions: Utilizes pathlib features for more robust and simplified suffix management.
- Code Refactoring: Streamlined the logic to improve clarity and future-proof the code for easier maintenance.
These changes address potential issues with file path handling, filename conflicts, and stability, which align with resolving this issue.
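As a rough illustration of the ideas listed above, and not the actual code from the PR, the sketch below builds a download path with pathlib, replaces problematic characters, and adds a default suffix when none is present. The function name safe_target, the sanitization rule, and the ".jpg" default are all assumptions.

```python
import re
from pathlib import Path
from urllib.parse import unquote, urlparse


def safe_target(directory, uri, default_suffix=".jpg"):
    """Return a sanitized download path for uri inside directory.

    Illustrative only: the character rule and default suffix are assumptions.
    """
    # Take the last path component of the URL, ignoring any query string.
    name = Path(unquote(urlparse(uri).path)).name
    # Replace anything other than word characters, dots, and hyphens.
    name = re.sub(r"[^\w.\-]", "_", name) or "image"
    path = Path(directory) / name
    if not path.suffix:
        path = path.with_suffix(default_suffix)
    return path


with_ext = safe_target("downloads", "https://example.com/a/my%20photo.jpg?size=large")
without_ext = safe_target("downloads", "https://example.com/a/12345")
print(with_ext, without_ext)
```

Joining with the / operator avoids the missing-separator problem reported earlier in this thread, regardless of whether the directory string ends in a slash.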
If possible, please try these steps and let us know if the fix resolves the issue for you! Feedback is invaluable to ensure all edge cases are addressed.
Thanks so much for raising this issue and helping improve the project! 🙏 If the problem persists, please feel free to share additional details, and we'll be happy to assist further. 🚀
@amerk12 can you try the latest fix in #42 and see if this resolved your issue? Thank you!
@glenn-jocher yes this fix cleared the string/pathing issue that I observed. Thanks!