spotify-downloader

Error saving metadata for large playlist: ([Errno 24] Too many open files)

[Open] almostimplemented opened this issue 1 year ago • 3 comments

System OS

Linux

Python Version

3.9 (CPython)

Install Source

pip / PyPi

Install version / commit hash

v4.0.0rc2

Expected Behavior vs Actual Behavior

Expected the command to generate a save file containing YouTube URLs for every track in the playlist, except those for which no match is found.

What actually happens is that the searches start failing partway through with 'Too many open files' errors, and only 971 of the 2,209 songs are written to the save file.

Perhaps the HTTPSConnection objects accumulate during the track iteration?
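One common way to keep the number of sockets bounded when many searches run on a thread pool is to reuse one connection per worker thread instead of opening a fresh one per track. The sketch below is a hypothetical illustration using only the standard library (threading.local and http.client); get_connection and run_all are made-up names, not spotDL's code, and the sketch never actually sends a request.

```python
import concurrent.futures
import http.client
import threading

# Hypothetical sketch: each worker thread keeps its own connection object
# and reuses it for every task it runs, so the number of connections is
# bounded by the number of threads rather than the number of tracks.
_local = threading.local()


def get_connection(host: str = "music.youtube.com") -> http.client.HTTPSConnection:
    """Return this thread's connection, creating it on first use.

    HTTPSConnection does not open a socket until a request is sent,
    so creating the object here is cheap.
    """
    conn = getattr(_local, "conn", None)
    if conn is None:
        conn = http.client.HTTPSConnection(host, timeout=10)
        _local.conn = conn
    return conn


def run_all(items, max_workers: int = 4):
    """Fetch the per-thread connection once per item on a thread pool."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda _item: get_connection(), items))
```

For example, run_all(range(100), max_workers=4) yields 100 references but at most 4 distinct connection objects. With the requests library, the same idea is usually applied by keeping one requests.Session per thread.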

Steps to reproduce - Ensure to include actual links!

spotdl --save-file studio_piano_jazz.spotdl --preload save 'https://open.spotify.com/playlist/6147ogdoSLuOiYlPLHEZ9k?si=22b5ba535d164791'

Traceback

Found url for Gerald Clayton - My Ideal 2: https://youtube.com/watch?v=kb47y2BcKUI
Found url for Gerald Clayton - My Ideal 1: https://youtube.com/watch?v=tu4PE1czJMM
Found url for Gerald Clayton - There Is Music Where You're Going My Friends: https://youtube.com/watch?v=_NDUgdSuigA
Found url for Gerald Clayton - Nobody Else But Me: https://youtube.com/watch?v=kXZwvt2IjQM
Found url for Gerald Clayton - Hank: https://youtube.com/watch?v=VtvBrC_cHxg
...
Found url for Don Grusin - So Nan Desu: https://youtube.com/watch?v=18AF8aaTqZc
Found url for Don Grusin - Jam Suite 2 - Colorado Peach Jam: https://youtube.com/watch?v=V0i0QHMUBXA
Song(name='Flora', artists=['Don Grusin'], artist='Don Grusin', album_name='Old Friends & Relatives', album_artist='Don Grusin', genres=[], disc_number=1, disc_count=1, duration=283.133, year=1996, date='1996-01-01', track_number=4, tracks_count=14, song_id='604SkI7RzRtpClr4j8fvML', cover_url='https://i.scdn.co/image/ab67616d0000b2732e761d41d601ee6181e5455e', explicit=False, publisher='Don Grusin', url='https://open.spotify.com/track/604SkI7RzRtpClr4j8fvML', isrc='JPB520600223', copyright_text='1996 Don Grusin Music', download_url=None, song_list=None, lyrics=None) generated an exception: HTTPSConnectionPool(host='music.youtube.com', port=443): Max retries exceeded with url: / (Caused by SSLError(OSError(24, 'Too many open files')))
Song(name='Foreign Service', artists=['Don Grusin'], artist='Don Grusin', album_name='Old Friends & Relatives', album_artist='Don Grusin', genres=[], disc_number=1, disc_count=1, duration=307.906, year=1996, date='1996-01-01', track_number=5, tracks_count=14, song_id='1Tf7Hgq28QYS9vpdU8Krdi', cover_url='https://i.scdn.co/image/ab67616d0000b2732e761d41d601ee6181e5455e', explicit=False, publisher='Don Grusin', url='https://open.spotify.com/track/1Tf7Hgq28QYS9vpdU8Krdi', isrc='JPB520600224', copyright_text='1996 Don Grusin Music', download_url=None, song_list=None, lyrics=None) generated an exception: HTTPSConnectionPool(host='music.youtube.com', port=443): Max retries exceeded with url: / (Caused by SSLError(OSError(24, 'Too many open files')))
Song(name='Estate', artists=['Don Grusin'], artist='Don Grusin', album_name='Old Friends & Relatives', album_artist='Don Grusin', genres=[], disc_number=1, disc_count=1, duration=315.466, year=1996, date='1996-01-01', track_number=6, tracks_count=14, song_id='1KJAVAs299rrW5JGokUuum', cover_url='https://i.scdn.co/image/ab67616d0000b2732e761d41d601ee6181e5455e', explicit=False, publisher='Don Grusin', url='https://open.spotify.com/track/1KJAVAs299rrW5JGokUuum', isrc='JPB520600225', copyright_text='1996 Don Grusin Music', download_url=None, song_list=None, lyrics=None) generated an exception: HTTPSConnectionPool(host='music.youtube.com', port=443): Max retries exceeded with url: / (Caused by SSLError(OSError(24, 'Too many open files')))
Song(name='Circles', artists=['Don Grusin'], artist='Don Grusin', album_name='Old Friends & Relatives', album_artist='Don Grusin', genres=[], disc_number=1, disc_count=1, duration=292.826, year=1996, date='1996-01-01', track_number=7, tracks_count=14, song_id='3Y9BIFK8FXugkdbVH6HyWV', cover_url='https://i.scdn.co/image/ab67616d0000b2732e761d41d601ee6181e5455e', explicit=False, publisher='Don Grusin', url='https://open.spotify.com/track/3Y9BIFK8FXugkdbVH6HyWV', isrc='JPB520600226', copyright_text='1996 Don Grusin Music', download_url=None, song_list=None, lyrics=None) generated an exception: HTTPSConnectionPool(host='music.youtube.com', port=443): Max retries exceeded with url: / (Caused by SSLError(OSError(24, 'Too many open files')))
...
(errors spew out, all the same 'Too many open files')
...
Song(name='My Foolish Heart', artists=['McCoy Tyner'], artist='McCoy Tyner', album_name='Jazz Roots', album_artist='McCoy Tyner', genres=['avant-garde jazz', 'contemporary post-bop', 'free jazz', 'hard bop', 'jazz', 'jazz fusion', 'jazz piano', 'stride'], disc_number=1, disc_count=1, duration=257.506, year=2000, date='2000-01-01', track_number=3, tracks_count=14, song_id='4z9kyF5zFf7VmdLgfBcAC4', cover_url='https://i.scdn.co/image/ab67616d0000b2732658a8306b9c1e92a5cef638', explicit=False, publisher='Telarc', url='https://open.spotify.com/track/4z9kyF5zFf7VmdLgfBcAC4', isrc='USTE10450703', copyright_text='© 2000 Telarc International', download_url=None, song_list=None, lyrics=None) generated an exception: HTTPSConnectionPool(host='music.youtube.com', port=443): Max retries exceeded with url: / (Caused by SSLError(OSError(24, 'Too many open files')))
Saved 971 songs to studio_piano_jazz.spotdl

Other details

There are 2,209 songs on this playlist.

Reproduces on Mac OS X as well.

almostimplemented, Aug 22 '22 13:08

Have you changed the number of threads? How's your connection? Is it stable?

I think it may be related to how our requests are sent. I will see if I can improve this soon™.

xnetcat, Aug 24 '22 11:08

Hi @xnetcat , thanks for the follow up.

  1. No changes have been made to the number of threads.

  2. Connection is stable -- this error reproduces on my local machine, on a compute server at my research facility, and on the DigitalOcean box that hosts my personal website.

A simple system-level workaround is to raise the open-file limit in the shell before running spotdl, via ulimit -n 10240.
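If changing the shell limit is inconvenient, a Python process can raise its own soft limit with the standard-library resource module (Unix only). This is a minimal sketch, assuming it runs in the same process that later does the searching; raise_open_file_limit is a hypothetical helper, not part of spotDL. It only raises the soft limit, never lowers it, and never asks for more than the hard limit.

```python
import resource


def raise_open_file_limit(target: int = 10240) -> int:
    """Raise this process's soft RLIMIT_NOFILE towards `target`.

    Hypothetical helper. Returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Never request more than the hard limit (unless it is unlimited).
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    # Only raise a finite soft limit; never lower it.
    if soft != resource.RLIM_INFINITY and soft < target:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        except (ValueError, OSError):
            # Some systems cap the value below the reported hard limit
            # (e.g. macOS); keep the existing soft limit in that case.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```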

almostimplemented, Aug 24 '22 13:08

I think the issue is these lines of code in save.py:

    if preload:
        save_data = []
        with concurrent.futures.ThreadPoolExecutor(
            max_workers=downloader.threads
        ) as executor:
            future_to_song = {
                executor.submit(downloader.search, song): song for song in songs
            }

I put some print statements around the dictionary comprehension and could see that this is where the open files accumulate: the YouTube Music (YTM) HTTPS sessions just open one after another.
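To watch the accumulation without print statements, one can count the process's open file descriptors before and after the submit loop. A minimal sketch, assuming Linux (/proc/self/fd) or macOS (/dev/fd); count_open_fds is a hypothetical helper. Differences between two calls are what matter, since the absolute number includes whatever the interpreter already holds open.

```python
import os


def count_open_fds() -> int:
    """Return roughly how many file descriptors this process has open.

    Listing the fd directory briefly holds one extra descriptor for the
    directory itself, so that one is subtracted.
    """
    for fd_dir in ("/proc/self/fd", "/dev/fd"):
        if os.path.isdir(fd_dir):
            return len(os.listdir(fd_dir)) - 1
    raise OSError("no per-process fd directory on this platform")
```

For example, calling it before and after the future_to_song comprehension shows how many descriptors the submitted searches are holding.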

If I rearrange the logic as in the diff below, so that each worker thread runs the search and builds its result in a single function, the program runs without accumulating open files.

This may have performance implications -- I admit I am not very experienced with async/concurrent Python programming.

         with concurrent.futures.ThreadPoolExecutor(
             max_workers=downloader.threads
         ) as executor:
-            future_to_song = {
-                executor.submit(downloader.search, song): song for song in songs
-            }
-            for future in concurrent.futures.as_completed(future_to_song):
-                song = future_to_song[future]
+            def process_song(song):
+                '''
+                Closure with the downloader to search for song and
+                return the updated JSON object with the download_url,
+                releasing the HTTPS session as soon as possible.
+                '''
                 try:
-                    data, _ = future.result()
+                    data, _ = downloader.search(song)
                     if data is None:
                         downloader.progress_handler.error(
                             f"Could not find a match for {song.display_name}"
                         )
-                        continue
+                        return None

                     downloader.progress_handler.log(
                         f"Found url for {song.display_name}: {data}"
                     )
-                    save_data.append({**song.json, "download_url": data})
+
+                    return {**song.json, "download_url": data}
                 except Exception as exc:
                     downloader.progress_handler.error(
                         f"{song} generated an exception: {exc}"
                     )

+            future_to_json = {
+                executor.submit(process_song, song): song for song in songs
+            }
+
+            for future in concurrent.futures.as_completed(future_to_json):
+                song_json = future.result()
+                if song_json:
+                    save_data.append(song_json)
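To try the rearranged loop outside spotDL, here is a self-contained version of the patched logic. fake_search, build_save_data and the dict-based songs are hypothetical stand-ins for downloader.search, the enclosing function in save.py and Song objects, and print replaces the progress handler. Note that as_completed has to iterate over future_to_json, the dict that is actually built.

```python
import concurrent.futures


def fake_search(song: dict):
    """Hypothetical stand-in for downloader.search: returns (url_or_None, extra)."""
    if song["name"].startswith("?"):
        return None, None  # simulate "no match found"
    # A song without an "id" raises KeyError, simulating a failed search.
    return f"https://youtube.com/watch?v={song['id']}", None


def build_save_data(songs, threads: int = 4):
    save_data = []

    def process_song(song):
        # Search and build the JSON entry inside the worker, as in the diff.
        try:
            data, _ = fake_search(song)
            if data is None:
                print(f"Could not find a match for {song['name']}")
                return None
            return {**song, "download_url": data}
        except Exception as exc:
            print(f"{song} generated an exception: {exc}")
            return None

    with concurrent.futures.ThreadPoolExecutor(max_workers=threads) as executor:
        future_to_json = {executor.submit(process_song, s): s for s in songs}
        for future in concurrent.futures.as_completed(future_to_json):
            song_json = future.result()
            if song_json:
                save_data.append(song_json)
    return save_data
```

Results arrive in completion order, so save_data is not guaranteed to follow playlist order.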

almostimplemented, Aug 26 '22 11:08

This issue has been automatically marked stale because there hasn't been any activity for the last 30 days.

stale[bot], Sep 25 '22 11:09

Not stale; this is for @xnetcat to review, I believe?

Or, @almostimplemented, you could open a PR if your solution works :)

Silverarmor, Sep 28 '22 12:09

https://github.com/spotDL/spotify-downloader/commit/182ed3f22c87c2bf99e077021b7a0b9c62ce0c47

xnetcat, Oct 01 '22 10:10