beets
Some duplicate_action settings are not thread safe
When importing a ~1000-track collection into beets, I experienced the following error.
My guess is that this is due to using duplicate_action: merge while having threaded: yes, because when I set threaded: no the problem went away (albeit the import took much longer).
For example, consider a compilation that is imported from the following directory structure:
/root/
- foo/foo.mp3
- bar/bar.mp3
- zar/zar.mps
And then during the import, beets attempts to merge it with multiple threads into
/root/
- compilation/bar.mp3
- compilation/foo.mp3
- compilation/zar.mps
My guess is that during the merge, threads can conflict on the ID of a track.
If this is too complicated to fix, then I think the documentation should at least be updated to state that certain actions are not thread safe: https://beets.readthedocs.io/en/stable/reference/config.html#duplicate-action
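For what it's worth, the failure mode can be sketched deterministically without any actual threads. The following is a hedged toy model of the suspected interleaving, using a plain dict as a stand-in for the beets database (all names here are illustrative, not beets' real API): one worker's merge deletes album row 8 while another worker still holds a cached object pointing at id 8, so that worker's next refresh hits the same assertion seen in the traceback.

```python
rows = {8: {"album": "compilation"}}  # toy "albums" table

class CachedAlbum:
    def __init__(self, album_id):
        self.id = album_id

    def load(self):
        # Mirrors the assertion in beets' dbcore.db load(): refresh from
        # the store, asserting the row still exists.
        stored = rows.get(self.id)
        assert stored is not None, f"object {self.id} not in DB"
        self.album = stored["album"]

cached = CachedAlbum(8)   # worker B caches the album...
cached.load()             # ...and can refresh it fine at first

del rows[8]               # worker A's duplicate merge removes the row

try:
    cached.load()         # worker B's next refresh now fails
    failed = False
except AssertionError:
    failed = True
```

With threading disabled the merge and the path evaluation never interleave like this, which would explain why threaded: no makes the problem disappear.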
Problem
The content of stderr is:
Traceback (most recent call last):
File "/lsiopy/lib/python3.11/site-packages/beets/util/functemplate.py", line 574, in substitute
res = self.compiled(values, functions)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lsiopy/lib/python3.11/site-packages/beets/util/functemplate.py", line 598, in wrapper_func
args[VARIABLE_PREFIX + varname] = values[varname]
~~~~~~^^^^^^^^^
File "/lsiopy/lib/python3.11/site-packages/beets/library.py", line 428, in __getitem__
value = self._get(key)
^^^^^^^^^^^^^^
File "/lsiopy/lib/python3.11/site-packages/beets/library.py", line 415, in _get
if self.for_path and key in self.album_keys:
^^^^^^^^^^^^^^^
File "/lsiopy/lib/python3.11/site-packages/beets/util/__init__.py", line 1081, in wrapper
value = func(self)
^^^^^^^^^^
File "/lsiopy/lib/python3.11/site-packages/beets/library.py", line 396, in album_keys
if self.album:
^^^^^^^^^^
File "/lsiopy/lib/python3.11/site-packages/beets/library.py", line 409, in album
return self.item._cached_album
^^^^^^^^^^^^^^^^^^^^^^^
File "/lsiopy/lib/python3.11/site-packages/beets/library.py", line 581, in _cached_album
self.__album.load()
File "/lsiopy/lib/python3.11/site-packages/beets/dbcore/db.py", line 562, in load
assert stored_obj is not None, f"object {self.id} not in DB"
^^^^^^^^^^^^^^^^^^^^^^
AssertionError: object 8 not in DB
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/lsiopy/bin/beet", line 8, in <module>
sys.exit(main())
^^^^^^
File "/lsiopy/lib/python3.11/site-packages/beets/ui/__init__.py", line 1285, in main
_raw_main(args)
File "/lsiopy/lib/python3.11/site-packages/beets/ui/__init__.py", line 1272, in _raw_main
subcommand.func(lib, suboptions, subargs)
File "/lsiopy/lib/python3.11/site-packages/beets/ui/commands.py", line 973, in import_func
import_files(lib, paths, query)
File "/lsiopy/lib/python3.11/site-packages/beets/ui/commands.py", line 943, in import_files
session.run()
File "/lsiopy/lib/python3.11/site-packages/beets/importer.py", line 340, in run
pl.run_parallel(QUEUE_SIZE)
File "/lsiopy/lib/python3.11/site-packages/beets/util/pipeline.py", line 446, in run_parallel
raise exc_info[1].with_traceback(exc_info[2])
File "/lsiopy/lib/python3.11/site-packages/beets/util/pipeline.py", line 358, in run
self.coro.send(msg)
File "/lsiopy/lib/python3.11/site-packages/beets/util/pipeline.py", line 170, in coro
task = func(*(args + (task,)))
^^^^^^^^^^^^^^^^^^^^^^^
File "/lsiopy/lib/python3.11/site-packages/beets/importer.py", line 1566, in manipulate_files
task.manipulate_files(
File "/lsiopy/lib/python3.11/site-packages/beets/importer.py", line 757, in manipulate_files
item.move(operation)
File "/lsiopy/lib/python3.11/site-packages/beets/library.py", line 915, in move
dest = self.destination(basedir=basedir)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lsiopy/lib/python3.11/site-packages/beets/library.py", line 979, in destination
subpath = self.evaluate_template(subpath_tmpl, True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lsiopy/lib/python3.11/site-packages/beets/dbcore/db.py", line 625, in evaluate_template
return template.substitute(self.formatted(for_path=for_path),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lsiopy/lib/python3.11/site-packages/beets/util/functemplate.py", line 576, in substitute
res = self.interpret(values, functions)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lsiopy/lib/python3.11/site-packages/beets/util/functemplate.py", line 568, in interpret
return self.expr.evaluate(Environment(values, functions))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lsiopy/lib/python3.11/site-packages/beets/util/functemplate.py", line 256, in evaluate
out.append(part.evaluate(env))
^^^^^^^^^^^^^^^^^^
File "/lsiopy/lib/python3.11/site-packages/beets/util/functemplate.py", line 163, in evaluate
if self.ident in env.values:
^^^^^^^^^^^^^^^^^^^^^^^^
File "<frozen _collections_abc>", line 780, in __contains__
File "/lsiopy/lib/python3.11/site-packages/beets/library.py", line 428, in __getitem__
value = self._get(key)
^^^^^^^^^^^^^^
File "/lsiopy/lib/python3.11/site-packages/beets/library.py", line 415, in _get
if self.for_path and key in self.album_keys:
^^^^^^^^^^^^^^^
File "/lsiopy/lib/python3.11/site-packages/beets/util/__init__.py", line 1081, in wrapper
value = func(self)
^^^^^^^^^^
File "/lsiopy/lib/python3.11/site-packages/beets/library.py", line 396, in album_keys
if self.album:
^^^^^^^^^^
File "/lsiopy/lib/python3.11/site-packages/beets/library.py", line 409, in album
return self.item._cached_album
^^^^^^^^^^^^^^^^^^^^^^^
File "/lsiopy/lib/python3.11/site-packages/beets/library.py", line 581, in _cached_album
self.__album.load()
File "/lsiopy/lib/python3.11/site-packages/beets/dbcore/db.py", line 562, in load
assert stored_obj is not None, f"object {self.id} not in DB"
^^^^^^^^^^^^^^^^^^^^^^
AssertionError: object 8 not in DB
My configuration (output of beet config) is:
plugins: web
directory: /music
library: /config/musiclibrary.blb
art_filename: albumart
threaded: yes
original_date: no
per_disc_numbering: no
convert:
auto: no
ffmpeg: /usr/bin/ffmpeg
opts: -ab 320k -ac 2 -ar 48000
max_bitrate: 320
threads: 1
# don't mind if tracks are missing from an album
match:
distance_weights:
missing_tracks: 0
paths:
default: $albumartist/$album%aunique{}/$track - $title
singleton: Non-Album/$artist - $title
comp: Compilations/$album%aunique{}/$track - $title
albumtype_soundtrack: Soundtracks/$album/$track $title
import:
write: yes
# we copy so pymix can remove the import src dir when the
# process has completed successfully.
copy: yes
move: no
resume: ask
incremental: no
incremental_skip_later: no
# if duplicates found then merge them in. This can happen
# if tracks for a compilation are split over multiple
# directories. Then each time beets tries to import one of
# the directories it will think that it is importing a
# duplicate compilation album. This is acceptable since the
# tracks should be distinct so we just ask beets to merge.
duplicate_action: merge
quiet_fallback: asis
timid: no
log: /config/beet.log
#lastgenre:
# auto: yes
# source: album
embedart:
auto: yes
fetchart:
auto: yes
replaygain:
auto: no
scrub:
auto: yes
replace:
'^\.': _
'[\x00-\x1f]': _
'[<>:"\?\*\|]': _
'[\xE8-\xEB]': e
'[\xEC-\xEF]': i
'[\xE2-\xE6]': a
'[\xF2-\xF6]': o
'[\xF8]': o
'\.$': _
'\s+$': ''
web:
host: 0.0.0.0
port: 8337
Thanks
Luke Purnell
Thanks for the report! This looks bad indeed. It looks tricky to debug, but we should try nonetheless.
The problem here actually seems to come from a performance optimization. To avoid needing to constantly re-fetch the album information associated with every item, we have a cache for the album structure in memory: https://github.com/beetbox/beets/blob/e10b955a931e4c205b0cadf0860797c0aeee736c/beets/library.py#L607-L620
What has happened here is that an item's album has been deleted (presumably as part of the duplicate merge), so the later attempt to refresh the album's metadata on demand fails.
A possible solution here would just be to catch this kind of exception and, when the "refresh" fails, to look it up again from the database. This solution could be problematic if album IDs are reused.
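A hedged sketch of that fallback, on a toy in-memory model rather than beets' real classes (get_album, cached_album, and the dict-backed store are all stand-ins): catch the failed refresh and re-fetch by the item's current album id.

```python
rows = {8: {"album": "compilation"}}  # toy "albums" table

class CachedAlbum:
    def __init__(self, album_id):
        self.id = album_id

    def load(self):
        stored = rows.get(self.id)
        assert stored is not None, f"object {self.id} not in DB"
        self.album = stored["album"]

def get_album(album_id):
    """Stand-in for self._db.get_album(): fetch from scratch, or None."""
    if album_id not in rows:
        return None
    album = CachedAlbum(album_id)
    album.load()
    return album

def cached_album(cache, album_id):
    """Refresh the cached album, falling back to a fresh fetch on failure."""
    if cache is not None:
        try:
            cache.load()
            return cache
        except AssertionError:
            pass  # the cached row vanished (e.g. deleted during a merge)
    return get_album(album_id)

cache = cached_album(None, 8)        # first access fetches from scratch
del rows[8]                          # the album is merged away...
rows[9] = {"album": "compilation"}   # ...into a surviving album with id 9
result = cached_album(cache, 9)      # stale cache -> falls back to id 9
```

As noted, this is only safe as long as a deleted id is not silently reused for an unrelated album, since the try/except cannot distinguish those cases.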
Here is an idea, for anyone who is interested: I would love to know how much performance this is actually saving us. Whenever we do item._cached_album, one of two things can happen:
https://github.com/beetbox/beets/blob/e10b955a931e4c205b0cadf0860797c0aeee736c/beets/library.py#L616-L619
That is, we either fetch the entire album "from scratch" (self._db.get_album(self)) or we refresh the existing album (self.__album.load()). The code clearly intends for the latter to be faster than the former. But is it really? I don't quite see why reloading all the contents would be that much faster than reconstructing the Python object from those contents. Maybe someone would be interested in measuring the real-world performance impact of disabling this optimization (i.e., always using the first route)?
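As a starting point for anyone measuring this, here is a rough micro-benchmark skeleton using sqlite3 and timeit as a stand-in for beets' database layer. "From scratch" constructs a new object from the row; "refresh" re-reads the row into an existing object. It only measures the toy model, not beets itself, but shows the shape of the comparison:

```python
import sqlite3
import timeit

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE albums (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO albums VALUES (8, 'compilation')")

class Album:
    def __init__(self, row):
        self.id, self.name = row

    def load(self):
        # Refresh this object in place from the database row.
        row = db.execute(
            "SELECT id, name FROM albums WHERE id = ?", (self.id,)
        ).fetchone()
        self.id, self.name = row

def from_scratch():
    # Fetch the row and build a brand-new Album object.
    row = db.execute("SELECT id, name FROM albums WHERE id = 8").fetchone()
    return Album(row)

existing = from_scratch()

t_scratch = timeit.timeit(from_scratch, number=10_000)
t_refresh = timeit.timeit(existing.load, number=10_000)
print(f"from scratch: {t_scratch:.3f}s  refresh: {t_refresh:.3f}s")
```

The real measurement would of course need to run against a full beets library with realistic album sizes and flexible attributes, where object construction costs could differ from this toy.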
Another possible fix, if we want to keep this optimization, would be to check whether self.__album.id == self.album_id. If not, fall back to the slow path. (This seems like an easy change.)
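A hedged sketch of that guard, again on a toy model rather than beets' real classes (Item, Album, and the dict-backed store are illustrative): only take the fast refresh path when the cached object's id still matches the item's current album_id, and fetch from scratch otherwise.

```python
rows = {8: {"name": "compilation"}}  # toy "albums" table

class Album:
    def __init__(self, album_id):
        self.id = album_id
        self.name = rows[album_id]["name"]

    def load(self):
        # Fast path: refresh this object in place.
        self.name = rows[self.id]["name"]

class Item:
    def __init__(self, album_id):
        self.album_id = album_id
        self._album = None  # the cache

    @property
    def cached_album(self):
        if self._album is not None and self._album.id == self.album_id:
            self._album.load()              # ids agree: refresh is safe
        else:
            self._album = Album(self.album_id)  # slow path: from scratch
        return self._album

item = Item(8)
first = item.cached_album        # populates the cache with album 8
rows[9] = rows.pop(8)            # a merge moves the album to id 9
item.album_id = 9                # the item is re-pointed...
second = item.cached_album       # ...so the guard takes the slow path
```

This avoids the failed refresh whenever the merge also updates the item's album_id, though a concurrent delete that leaves album_id unchanged would still need the exception fallback.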
Thanks for the information. I re-ran the import with the music library that initially caused this problem on a more powerful machine and have not hit the issue again. So it's possible this issue only presents itself in a resource-constrained environment.
The music collection consisted of around 700 tracks. I ran the import with and without threading enabled, and with and without the suggested change (always fetching the album from scratch with self._db.get_album(self)). The results were as follows (total time given in minutes:seconds.milliseconds):
(With threading from scratch) beet import /Users/lukepurnell/rb_export/lajp/subbox_export 26.09s user 10.06s system 4% cpu 12:24.96 total
(With threading no changes) beet import /Users/lukepurnell/rb_export/lajp/subbox_export 21.98s user 9.30s system 4% cpu 12:27.97 total
(Without threading from scratch) beet import /Users/lukepurnell/rb_export/lajp/subbox_export 26.87s user 10.61s system 5% cpu 12:26.61 total
(Without threading no changes) beet import /Users/lukepurnell/rb_export/lajp/subbox_export 28.89s user 10.53s system 5% cpu 12:28.16 total
As you can see, in both the threaded and unthreaded cases, always fetching the album from scratch is marginally quicker. There is also only a marginal gain from running with threading at all.
In terms of progressing this issue, perhaps we need more testing to ensure that fetching from scratch every time will not cause a degradation in some other use case. Can you think of some tests to run to confirm this?
Also, have you considered multiprocessing instead of multithreading for performance? Due to the GIL, that might yield more meaningful gains.