python flickr_scraper.py --search 'honeybees on flowers' --n 10 --download

Open qiyangchennrel opened this issue 1 year ago • 10 comments

When I tried to download the images, I got the errors below:

    nargs ['honeybees on flowers']
    0/10 error...
    1/10 error...
    2/10 error...
    3/10 error...
    4/10 error...
    5/10 error...
    6/10 error...
    7/10 error...
    8/10 error...
    9/10 error...
    10/10 error...
    Done. (4.4s)

qiyangchennrel avatar Jul 11 '24 07:07 qiyangchennrel

@qiyangchennrel hello!

Thank you for reaching out and providing details about the issue you're encountering. To help us diagnose and resolve the problem effectively, could you please provide a minimum reproducible example of your code? This will allow us to better understand the context and pinpoint the issue. You can find guidance on creating a reproducible example here: Minimum Reproducible Example.

Additionally, please ensure that you are using the latest versions of all relevant packages, as updates often include important bug fixes and improvements.

Looking forward to your response so we can assist you further! 😊

pderrenger avatar Jul 11 '24 11:07 pderrenger

After following all the steps, and even running the script in a Google Colab terminal, I am also getting the error... (Screenshot 2024-08-11 at 4 58 31 PM)

nzhang95120 avatar Aug 11 '24 23:08 nzhang95120

Hello @nzhang95120,

Thank you for providing the screenshot and additional details about the issue you're encountering. It looks like you're running into some trouble with the flickr_scraper.py script.

Here are a few steps you can take to troubleshoot and potentially resolve the issue:

  1. Verify Package Versions: Ensure that you are using the latest versions of all relevant packages. Sometimes, issues are resolved in newer releases. You can update your packages using:

    pip install --upgrade <package_name>
    
  2. Check Dependencies: Make sure all dependencies required by the script are installed. You can usually find these in the requirements.txt file or documentation of the repository.

  3. Error Logs: The error messages you provided are quite generic. If possible, try to capture more detailed error logs. This can often be done by running the script with increased verbosity or debug flags.

  4. Internet Connection: Ensure that your internet connection is stable, as the script needs to download images from Flickr.

  5. API Keys: If the script requires API keys for accessing Flickr, ensure that they are correctly set up and have the necessary permissions.

  6. Example Code: Here is a minimal example to ensure everything is set up correctly:

    import flickrapi
    import urllib.request
    import os
    
    # Replace with your own Flickr API key and secret
    api_key = 'YOUR_API_KEY'
    api_secret = 'YOUR_API_SECRET'
    
    flickr = flickrapi.FlickrAPI(api_key, api_secret, format='parsed-json')
    query = 'honeybees on flowers'
    num_images = 10
    
    os.makedirs('downloads', exist_ok=True)  # urlretrieve does not create the target folder
    photos = flickr.photos.search(text=query, per_page=num_images, media='photos', sort='relevance')
    for i, photo in enumerate(photos['photos']['photo']):
        url = f"https://farm{photo['farm']}.staticflickr.com/{photo['server']}/{photo['id']}_{photo['secret']}.jpg"
        urllib.request.urlretrieve(url, os.path.join('downloads', f"{i}.jpg"))
        print(f"Downloaded {i+1}/{num_images}")
    
    print("Done.")
    

If you have verified all the above and the issue persists, please let us know with any additional error logs or details. This will help us assist you more effectively.
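As a rough illustration of capturing more detailed logs: the sketch below assumes the generic "error..." lines come from a broad exception handler around each download, which is not confirmed here. The names `try_download`, `always_fails`, and the logger name are hypothetical and not part of flickr_scraper; the point is only that `logging.exception` records the full traceback rather than a bare message.

```python
import logging

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger("flickr_debug")  # hypothetical logger name


def try_download(download_fn, url):
    """Call download_fn(url); on failure, log the full traceback and return False."""
    try:
        download_fn(url)
        return True
    except Exception:
        # logging.exception appends the stack trace to the log record
        log.exception("Download failed for %s", url)
        return False


def always_fails(url):
    # Stand-in for a real download call; raises to simulate a failure.
    raise FileNotFoundError(f"no such directory for {url}")


ok = try_download(always_fails, "https://example.com/photo.jpg")
```

Posting the traceback that this kind of wrapper prints is usually enough to tell a path problem from a network or API problem.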

Thank you for your patience and cooperation! 😊

glenn-jocher avatar Aug 13 '24 15:08 glenn-jocher

Same here:

Traceback (most recent call last):
  File "/flickr_scraper/flickr_scraper.py", line 67, in <module>
    get_urls(search=search, n=opt.n, download=opt.download)
  File "/flickr_scraper/flickr_scraper.py", line 35, in get_urls
    for i, photo in enumerate(photos):
  File "/lib/python3.9/site-packages/flickrapi/core.py", line 690, in data_walker
    photoset = rsp.getchildren()[0]
AttributeError: 'xml.etree.ElementTree.Element' object has no attribute 'getchildren'

stawiski avatar Jan 30 '25 06:01 stawiski

The error occurs because getchildren() was removed in Python 3.9 (it had been deprecated since Python 3.2). This is a known compatibility issue in the flickrapi dependency. Let's resolve it:

  1. First update your packages:

    pip install --upgrade flickrapi ultralytics

  2. If errors persist, add this workaround before your FlickrAPI initialization:

    import xml.etree.ElementTree as ET
    ET.Element.getchildren = lambda self: list(self)  # Compatibility patch

This should resolve the XML parsing issue. Let us know if you still encounter any errors after applying these fixes.

pderrenger avatar Jan 31 '25 12:01 pderrenger

I observe the following error with the above compatibility patch (Python 3.10.14, ultralytics 8.3.71, flickrapi 2.4.0):

    import xml.etree.ElementTree as ET
    ET.Element.getchildren = lambda self: list(self)

TypeError: cannot set 'getchildren' attribute of immutable type 'xml.etree.ElementTree.Element'
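For context, the TypeError indicates that the built-in Element type does not accept new attributes at runtime, so the method cannot be patched back on. The documented replacement for getchildren() is `list(elem)` or plain iteration. A minimal sketch of the equivalent calls follows; the XML string is made up for illustration and only loosely resembles a Flickr response.

```python
import xml.etree.ElementTree as ET

# Sample document invented for this example.
rsp = ET.fromstring(
    '<rsp stat="ok"><photos page="1"><photo id="1"/><photo id="2"/></photos></rsp>'
)

# Old style, fails on Python 3.9+: rsp.getchildren()[0]
first_via_list = list(rsp)[0]  # copy the children into a list, then index
first_via_index = rsp[0]       # Elements also support indexing directly

# Iterating an Element yields its direct children.
photo_ids = [photo.get("id") for photo in first_via_list]
```

One option, untested here, would be to apply the same substitution to the `rsp.getchildren()[0]` call in flickrapi/core.py shown in the traceback above.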

amerk12 avatar Feb 04 '25 21:02 amerk12

To the extent it still helps @qiyangchennrel, @nzhang95120

I also observed the same error and traced it to #L16 in utils/general.py. I was able to clear the error by changing

    f = dir + os.path.basename(uri)  # filename

to

    f = os.path.join(dir, os.path.basename(uri))
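To show why the change matters, here is a small sketch with made-up values (`out_dir` and `uri` are placeholders, not taken from the repository). When the directory string has no trailing separator, plain concatenation fuses the directory name into the filename, whereas `os.path.join` inserts the separator.

```python
import os

out_dir = "images"  # hypothetical output directory, no trailing separator
uri = "https://example.com/photos/12345_abcdef.jpg"  # made-up URL

# Plain concatenation: the file would be written next to the folder, not inside it.
concatenated = out_dir + os.path.basename(uri)  # 'images12345_abcdef.jpg'

# os.path.join inserts the platform's path separator between the parts.
joined = os.path.join(out_dir, os.path.basename(uri))
```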

amerk12 avatar Feb 04 '25 21:02 amerk12

A potential fix for this issue has been merged in PR #42! 🎉

Key Changes in the PR:

  • Switched to pathlib for File Path Handling: Replaced the use of the os module with pathlib to improve readability, maintainability, and cross-platform compatibility.
  • Enhanced Filename Sanitization: Systematically removes or renames problematic file name characters to ensure cleaner, predictable file naming.
  • Improved Handling of Missing File Extensions: Utilizes pathlib features for more robust and simplified suffix management.
  • Code Refactoring: Streamlined the logic to improve clarity and future-proof the code for easier maintenance.

These changes address potential issues with file path handling, filename conflicts, and stability, which align with resolving this issue.
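For readers who want a feel for this kind of change, below is a minimal sketch of pathlib-based path building with filename sanitization and a default extension. It is not the code from the PR: the function name `safe_path`, the character whitelist, and the `.jpg` default are all assumptions made for illustration.

```python
import re
from pathlib import Path


def safe_path(out_dir, name, default_suffix=".jpg"):
    """Build out_dir/name with risky characters replaced and a suffix guaranteed."""
    # Keep word characters, dots, and hyphens; replace everything else with '_'.
    cleaned = re.sub(r"[^\w.\-]", "_", name)
    path = Path(out_dir) / cleaned
    if not path.suffix:
        # No extension present, so append the assumed default.
        path = path.with_suffix(default_suffix)
    return path


example = safe_path("images", "bee photo?1")
```

Here `example.name` is `bee_photo_1.jpg`, and a name that already has an extension, such as `a.png`, is left with that extension.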

If possible, please try the latest version with this fix and let us know if it resolves the issue for you! Feedback is invaluable to ensure all edge cases are addressed.

Thanks so much for raising this issue and helping improve the project! 🙏 If the problem persists, please feel free to share additional details, and we'll be happy to assist further. 🚀

UltralyticsAssistant avatar Feb 04 '25 22:02 UltralyticsAssistant

@amerk12 can you try the latest fix in #42 and see if it resolves your issue? Thank you!

glenn-jocher avatar Feb 04 '25 22:02 glenn-jocher

@glenn-jocher yes this fix cleared the string/pathing issue that I observed. Thanks!

amerk12 avatar Feb 05 '25 15:02 amerk12