beets
beets copied to clipboard
Moved files for merged imports lead to crashes in chroma plugin
OK, so this time I really think I spotted a problem. I ripped a CD, but since it didn’t work properly, it ended up in 2 directories. I then imported those directories with beets. The first one imported fine, with of course the mention that some tracks were missing. Then it proceeded to the second directory, and again found the album and mentioned tracks were failing, but I chose Apply, and then… BOOM.
Problem
Running this command in verbose (-vv) mode:
$ beet -vv import -P /media/hendrik2/Scharnier/Muziek\ te\ sorteren/ 2> beet2.log
/media/hendrik2/Scharnier/Muziek te sorteren/Cobla Amsterdam (11 items)
Tagging:
Cobla Amsterdam - Live in Catalunya
URL:
https://musicbrainz.org/release/d09e68bd-04b7-4e5b-bbfa-854868467486
(Similarity: 90.4%) (missing tracks) (CD, 2008, NL, Etcetera, KTC 1375)
Missing tracks (4/15 - 26.7%):
! Ricard Viladesau – Barcelona ’92 (obligada 2 trumpets) (# 7) (5:06)
! Francesc Riumalló – Oncle i nebot (obligada tenora) (# 9) (4:48)
! Max Havart – Rouranenca (obligada tenora) (#11) (4:06)
! Emili Saló – El Bruel de Pals (obligada 2 fiscorns) (#12) (4:44)
[A]pply, More candidates, Skip, Use as-is, as Tracks, Group albums,
Enter search, enter Id, aBort?
/media/hendrik2/Scharnier/Muziek te sorteren/Cobla La Principal d’Amsterdam (4 items)
Finding tags for album "Cobla La Principal d’Amsterdam - Live in Catalunya".
Candidates:
1. Cobla Amsterdam - Live in Catalunya (36.3%) (missing tracks, tracks, artist, ...) (CD, 2008, NL, Etcetera, KTC 1375)
2. Cobla Maravella - Catalunya Dança (19.6%) (missing tracks, tracks, album, ...) (12" Vinyl, 1964, ES, Zafiro, Z-M 10)
3. Cobla Maravella Dtor. Lluís Ferrer i Puigdemont - Catalunya Canta (19.4%) (missing tracks, tracks, album, ...) (12" Vinyl, 1968, ES, Canigó, CANI-4-S)
4. Cobla "La Selvatana" - Sardanas A Salvador Dalí (17.0%) (tracks, album, artist, ...) (7" Vinyl, 1964, ES, Sandiego, SAN-111)
5. Jaume Pahissa, Miguel Llobet Solés, Gaspar Cassadó, Manuel de Falla; Giacomo Palazzesi - Catalunya (16.7%) (missing tracks, tracks, artist, ...) (Digital Media, 2016, XW, GuitArt Collection)
# selection (default 1), Skip, Use as-is, as Tracks, Group albums,
Enter search, enter Id, aBort?
Correcting tags from:
Cobla La Principal d’Amsterdam - Live in Catalunya
To:
Cobla Amsterdam - Live in Catalunya
URL:
https://musicbrainz.org/release/d09e68bd-04b7-4e5b-bbfa-854868467486
(Similarity: 36.3%) (missing tracks, tracks, artist, year) (CD, 2008, NL, Etcetera, KTC 1375)
* Track 7 -> Ricard Viladesau – Barcelona ’92 (obligada 2 trumpets) (title)
* Track 9 -> Francesc Riumalló – Oncle i nebot (obligada tenora) (title)
* Track 11 -> Max Havart – Rouranenca (obligada tenora) (title)
* Track 12 -> Emili Saló – El Bruel de Pals (obligada 2 fiscorns) (title)
Missing tracks (11/15 - 73.3%):
! Vicens Bou – Girona aimada! (# 1) (4:19)
! Ricard Castells – Els Degotalls (# 2) (3:38)
! Conrad Saló – La Feste de Sant Martirià (# 3) (4:51)
! Josep Saderra – Mercé (# 4) (4:26)
! Manel S. Puigferrer – Ceretana (# 5) (4:27)
! Pere Mercader – Conxita encisera (# 6) (4:41)
! Jaume Roca Delpech – Gavina (# 8) (4:22)
! Joaquim Serra – Sabadell (#10) (4:33)
! Robert Sarrade – En Guii de Sant Cebrià (#13) (4:55)
! Marcel Artiaga – Els milors anys (#14) (4:07)
! Tomàs Gil Membrado – 25 anys de Baix Montseny (#15) (4:56)
Apply, More candidates, Skip, Use as-is, as Tracks, Group albums,
Enter search, enter Id, aBort? a
Old: 11 items, FLAC, 725kbps, 44.1kHz/16 bit, 49:21, 256.4 MiB
New: 4 items, FLAC, 715kbps, 44.1kHz/16 bit, 18:46, 95.9 MiB
[S]kip new, Keep all, Remove old, Merge all? m
Led to this problem:
(paste here)
Setup
OS: Linux Mint Mate edition 20.3 64 bit Python version: $ python2 --version Python 2.7.18 beets version: $ beet --version beets version 1.6.0 Python version 3.8.10 plugins: chroma, fetchart, fromfilename, lastgenre (different python versions, that’s interesting)* Turning off plugins made problem go away (yes/no): How can I test this? The first folder is imported and moved, the second is still in the to be imported folder.
My configuration (output of beet config) is:
$ beet config
directory: /media/hendrik2/Scharnier/Muziek
library: ~/.beets/musiclibrary.db
import:
move: yes
log: beetslog.txt
art_filename: "$albumartist \u2013 $album"
plugins: fetchart lastgenre chroma fromfilename mbsync
paths:
default: "Albums/$albumartist/$album%aunique{}/$artist \u2013 $title"
singleton: "Losse nummers/$artist/$artist \u2013 $title"
comp: "Compilaties/$album%aunique{}/$artist \u2013 $title"
acoustid:
apikey: REDACTED
chroma:
auto: yes
fetchart:
auto: yes
minwidth: 0
maxwidth: 0
quality: 0
max_filesize: 0
enforce_ratio: no
cautious: no
cover_names:
- cover
- front
- art
- album
- folder
sources:
- filesystem
- coverart
- itunes
- amazon
- albumart
google_key: REDACTED
google_engine: 001442825323518660753:hrh5ch1gjzm
fanarttv_key: REDACTED
lastfm_key: REDACTED
store_source: no
high_resolution: no
deinterlace: no
cover_format:
lastgenre:
whitelist: yes
min_weight: 10
count: 1
fallback:
canonical: no
source: album
force: yes
auto: yes
separator: ', '
prefer_specific: no
title_case: yes
Looks the same or at least very similar to #4283. This might suggest that there's some issue with the logic behind our merge logic. Maybe there's an issue with the already_merged check? That happens in the user_query stage, which would explain why this blows up in the chroma plugin. Anyway, I'm not sufficiently familiar with this part of the code to investigate this quickly.
Ah, this does indeed look like a problem… for completeness, here's an inline paste of the traceback from the attached log:
Traceback (most recent call last):
File "/home/hendrik2/.local/bin/beet", line 8, in <module>
sys.exit(main())
File "/home/hendrik2/.local/lib/python3.8/site-packages/beets/ui/__init__.py", line 1285, in main
_raw_main(args)
File "/home/hendrik2/.local/lib/python3.8/site-packages/beets/ui/__init__.py", line 1272, in _raw_main
subcommand.func(lib, suboptions, subargs)
File "/home/hendrik2/.local/lib/python3.8/site-packages/beets/ui/commands.py", line 973, in import_func
import_files(lib, paths, query)
File "/home/hendrik2/.local/lib/python3.8/site-packages/beets/ui/commands.py", line 943, in import_files
session.run()
File "/home/hendrik2/.local/lib/python3.8/site-packages/beets/importer.py", line 340, in run
pl.run_parallel(QUEUE_SIZE)
File "/home/hendrik2/.local/lib/python3.8/site-packages/beets/util/pipeline.py", line 446, in run_parallel
raise exc_info[1].with_traceback(exc_info[2])
File "/home/hendrik2/.local/lib/python3.8/site-packages/beets/util/pipeline.py", line 314, in run
for msg in _allmsgs(out):
File "/home/hendrik2/.local/lib/python3.8/site-packages/beets/util/pipeline.py", line 467, in pull
out = coro.send(msg)
File "/home/hendrik2/.local/lib/python3.8/site-packages/beets/util/pipeline.py", line 193, in coro
func(*(args + (task,)))
File "/home/hendrik2/.local/lib/python3.8/site-packages/beets/importer.py", line 1369, in lookup_candidates
plugins.send('import_task_start', session=session, task=task)
File "/home/hendrik2/.local/lib/python3.8/site-packages/beets/plugins.py", line 488, in send
result = handler(**arguments)
File "/home/hendrik2/.local/lib/python3.8/site-packages/beets/plugins.py", line 145, in wrapper
return func(*args, **kwargs)
File "/home/hendrik2/.local/lib/python3.8/site-packages/beetsplug/chroma.py", line 180, in fingerprint_task
return fingerprint_task(self._log, task, session)
File "/home/hendrik2/.local/lib/python3.8/site-packages/beetsplug/chroma.py", line 249, in fingerprint_task
acoustid_match(log, item.path)
File "/home/hendrik2/.local/lib/python3.8/site-packages/beetsplug/chroma.py", line 89, in acoustid_match
duration, fp = acoustid.fingerprint_file(util.syspath(path))
File "/home/hendrik2/.local/lib/python3.8/site-packages/acoustid.py", line 340, in fingerprint_file
return _fingerprint_file_audioread(path, maxlength)
File "/home/hendrik2/.local/lib/python3.8/site-packages/acoustid.py", line 280, in _fingerprint_file_audioread
with audioread.audio_open(path) as f:
File "/home/hendrik2/.local/lib/python3.8/site-packages/audioread/__init__.py", line 111, in audio_open
return BackendClass(path)
File "/home/hendrik2/.local/lib/python3.8/site-packages/audioread/rawread.py", line 62, in __init__
self._fh = open(filename, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: b'/media/hendrik2/Scharnier/Muziek te sorteren/Cobla Amsterdam/Cobla Amsterdam - Vicens Bou \xe2\x80\x93 Girona aimada!.flac'
So the problem here actually manifests within the chroma plugin's hook. It can't find a file from the first directory (Cobla Amsterdam) in the source position (i.e., before the files are moved). Just guessing here, but I suspect the problem arises from a combination of all three of these factors:
import.moveis enabled- the
chromaplugin is enabled - the user chooses "merge" at the duplicate prompt
Sketching it out, what happens is that:
- We fingerprint the first "album." Then, the first "album" is imported and begins being moved to the destination.
- Concurrently, the second "album" is fingerprinted and imported, and we detect the duplicate.
- Beets recreates a combined album with the merged tracks from the two "albums" and inserts it back into the import pipeline.
- At this point,
chromaattempts to fingerprint the entire combined album again. However, the combined album still hasn't had its paths updated to reflect the fact that the album has moved.
If we were copying files (instead of moving), this wouldn't come up because we would just re-fingerprint the source files again.
It's a little tricky to think of the right solution for this problem! Some ideas:
- The answer may be that we need to make sure that the "merged" album that we create from the two parts has the new, moved paths. But that would require "stalling" until the move is complete—that is, if the files are still being moved, we would have the opposite problem where we might try to fingerprint the "destination" paths before the files arrive there.
- We could avoid needlessly re-fingerprinting the same old files. This would have the added benefit of being faster/more efficient. But it is a little bit of a "whack-a-mole" solution, because it involves making individual plugins more aware of this edge condition rather than fixing the import pipeline itself once and for all.
Thanks for delving deeper into this issue! Your analysis sounds a lot like the hunch that I had.
It's a little tricky to think of the right solution for this problem! Some ideas:
- The answer may be that we need to make sure that the "merged" album that we create from the two parts has the new, moved paths. But that would require "stalling" until the move is complete—that is, if the files are still being moved, we would have the opposite problem where we might try to fingerprint the "destination" paths before the files arrive there.
I'd argue that this is the right approach, and maybe it should even be stricter: It might be safest to block the merged import task until the original tasks have completely left the pipeline (which in fact is very similar to waiting for file moves, since the manipulate_files stage runs late during the import). (Do take my thoughts here with a grain of salt: I still don't fully understand the merge process.)
Yeah, I think that may indeed be the simplest policy. The opposite extreme, which you can imagine wanting instead, would be to prevent anything from happening to either of the constituent partial albums—to make it as if the two were originally imported together. However, this policy is impossible to enforce: for one, it's not always the case that the two partial albums are both "in flight"; one might be already imported long ago and just sitting in the database. Or it might be part of the current import but long enough ago that it's impossible to "back in time" to prevent importing.
So waiting for both imports to be completely done before attempting to re-import the pair together seems like the only reasonable clean policy. Doing this is itself not exactly easy—we don't currently have a way to see the status of a previous in-flight import task. So it will take some creativity to sort out the synchronization.