
Add support for AudioMuse-AI

Open arsaboo opened this issue 2 months ago • 4 comments

Hi team,

I’d like to propose adding native support for AudioMuse-AI as an optional backend for audio attribute extraction in Beets.

Beets users rely on smart playlists, recommendations, and other rule-based automations that depend on rich audio attributes. Today, many of us rely on Spotify’s audio_features for loudness, tempo, energy, valence, danceability, and similar attributes; however, the Spotify API is effectively deprecated for third-party use.

AudioMuse-AI provides a fully local solution that generates a comparable set of audio embeddings and attributes, eliminating the need for external APIs. It naturally fits into Beets’ philosophy of local, scriptable, and privacy-friendly music management.

I have a working integration that calls AudioMuse-AI locally, extracts track-level embeddings and attributes, stores them in Beets fields, and makes them available for queries, smart playlists, and rule-based organization.
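For concreteness, here is a minimal sketch of the storage step described above: normalizing a track-analysis response into prefixed flexible attributes before writing them to a beets item. The attribute names, prefix, and response shape are illustrative assumptions, not the actual AudioMuse-AI API.

```python
# Hypothetical sketch: the attribute names and the "audiomuse_" prefix
# are assumptions for illustration, not the real integration.

AUDIO_ATTRIBUTES = {"tempo", "energy", "valence", "danceability", "loudness"}

def to_flex_fields(analysis: dict, prefix: str = "audiomuse_") -> dict:
    """Keep only known scalar attributes and prefix them so they
    cannot collide with beets' built-in fields."""
    fields = {}
    for key, value in analysis.items():
        if key in AUDIO_ATTRIBUTES and isinstance(value, (int, float)):
            fields[prefix + key] = float(value)
    return fields

# In a plugin, roughly:
#   for field, value in to_flex_fields(response).items():
#       item[field] = value
#   item.store()
```

Prefixed flex fields then work in ordinary queries and smart playlists, e.g. `beet ls audiomuse_energy:0.8..`.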

Before polishing and submitting a PR, I wanted to check whether we are interested in adding this to the core or if I should keep it as a separate plugin.

cc: @snejus @semohr @JOJ0

arsaboo avatar Nov 08 '25 18:11 arsaboo

Looks like a fun project to integrate. I haven’t dug into the implementation details yet, but I’d argue that storing embeddings directly in Beets’ flex (key/value) fields could become a serious performance bottleneck over time. Even the 8-dimensional Spotify embeddings were already noticeable in my setup, and with higher-dimensional vectors, computing any meaningful statistics or similarity metrics on the fly becomes practically unmanageable.

Given how much more common ML/DL-driven workflows have become recently, it might be worth thinking more broadly about how to integrate such features in a sustainable way. Offloading embeddings into a proper vector database would likely be a better long-term solution. There’s even a neat SQLite extension.

As a sidenote, I have tested a CLAP model locally with a similar approach (writing embeddings to the flex fields). In practice, computing distances like cosine or Hamming similarity was mostly infeasible. The required SQL joins quickly became quite heavy, and performance degraded sharply even for moderately sized libraries. A dedicated vector index would make these kinds of operations far more practical.
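To illustrate the cost being described: without a vector index, every similarity query is a brute-force scan, O(n·d) per query on top of deserializing each vector out of the key/value store. A minimal sketch of that scan (plain Python, library shape assumed):

```python
import math

def cosine(a, b):
    # Cosine similarity between two dense vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query, library, k=5):
    # Brute-force scan over every track's vector -- this is exactly the
    # work a dedicated vector index (e.g. an ANN structure) avoids.
    scored = sorted(library.items(),
                    key=lambda kv: cosine(query, kv[1]),
                    reverse=True)
    return [track_id for track_id, _ in scored[:k]]
```

With the embeddings stored as flex-field strings, this scan additionally pays a parse per row, which is where the SQL joins become heavy.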

semohr avatar Nov 11 '25 10:11 semohr

Honestly, we can skip embeddings for now. Adding a vector database is a significant undertaking, although we should definitely consider it if we want to AI-enable beets (I am not the right person for that, though). My initial thought was to use AudioMuse for the audio features that were available from Spotify before they deprecated the API.

arsaboo avatar Nov 11 '25 13:11 arsaboo

If it is a straight-up replacement, sure, I don't see an issue with creating a plugin for it 👍

Have you already started experimenting with integrating it? I’d be interested to see your approach and how you would want to try this 🙃


My two cents after a closer look at their docs and code:

It doesn’t seem like they use embeddings in the conventional sense. Instead, they appear to apply various dimensionality reduction and clustering techniques to generate an embedding space where each track is positioned relative to all others in your library. Because of that, the resulting vector dimensions don’t have interpretable or semantically meaningful labels; they’re just latent coordinates derived from the structure of your music. That might be fine for some use cases, but it won't replace the EchoNest-style labels. It just allows you to group your music into N clusters of similar tracks.

For me, this doesn’t seem particularly useful, since I’m looking for interpretable labels that remain stable as I add new music. With this approach, you’d need to recompute or update all vectors whenever a new track is added, which makes the representation non-static and hard to rely on.
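The non-static point can be shown with a toy stand-in for a library-relative embedding step (here, simply centering on the library mean; real pipelines use PCA/UMAP-style reductions, but the dependence on the whole library is the same):

```python
def centre(vectors):
    """Toy library-relative 'embedding': position each track relative
    to the library mean. Purely illustrative."""
    dims = len(vectors[0])
    mean = [sum(v[d] for v in vectors) / len(vectors) for d in range(dims)]
    return [[v[d] - mean[d] for d in range(dims)] for v in vectors]

library = [[1.0, 2.0], [3.0, 4.0]]
before = centre(library)
after = centre(library + [[100.0, 100.0]])  # add one new track

# The coordinates of the *existing* tracks change too, so previously
# stored vectors are stale the moment the library grows.
```

That is why coordinates derived from the library's own structure can't serve as stable, queryable fields the way interpretable per-track labels can.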


Update:

They use the msd-musicnn models in the analysis pipeline, which allows them to use some embeddings in their clustering algorithm. We might be better off just using these underlying models directly if all we want is a replacement for Spotify.

semohr avatar Nov 11 '25 14:11 semohr

Yes, I already have a working implementation at https://github.com/arsaboo/beets/tree/audiomuse (happy to create a draft PR, if that helps).

Basically, we can get the following values:

[Image: screenshot of the extracted field values]

audiomuse_embedding can be disabled if we are not sure. However, in terms of resource requirements, I don't see them being significantly worse than lyrics, which we already allow. Honestly, we don't have to use the embeddings at all, as we can query the AudioMuse API to find similar tracks (this is also already implemented).
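As a sketch of the similarity-query route, the plugin side could be as small as building a request against a local AudioMuse instance. The endpoint path and parameter names below are assumptions for illustration; check the AudioMuse-AI docs for the real similarity API.

```python
from urllib.parse import urlencode

# Hypothetical sketch: "/api/similar", "id", and "n" are assumed names,
# not the actual AudioMuse-AI endpoint.

def similar_tracks_url(base_url: str, item_id: str, limit: int = 10) -> str:
    """Build the URL for a 'tracks similar to this one' query."""
    query = urlencode({"id": item_id, "n": limit})
    return f"{base_url.rstrip('/')}/api/similar?{query}"
```

A plugin command would fetch this URL (with `urllib.request` or `requests`) and map the returned track IDs back to beets items, so no embeddings ever need to live in the beets database.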

arsaboo avatar Nov 11 '25 17:11 arsaboo