
Add Apple-Music-Scraper Python Script

Open: Abssdghi opened this pull request 1 month ago • 1 comment

Description

This PR adds a brand-new Apple Music Web Scraper capable of scraping:

  • Songs
  • Albums
  • Playlists
  • Artists
  • Music videos
  • Rooms
  • Full search results

It parses Apple Music’s internal serialized-server-data JSON structure and converts it into clean, structured Python output.
This feature did NOT exist in the repository before and expands the Scrappers/Social_Media category significantly.
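For readers unfamiliar with that structure: the core step is decoding the `serialized-server-data` script tag that Apple Music embeds in each page. A minimal sketch of that step follows (the example URL and helper name are illustrative, not the script's actual API):

```python
# Minimal sketch of the core parsing step, not the actual module code:
# fetch a page and decode the embedded "serialized-server-data" JSON blob.
import json

import requests
from bs4 import BeautifulSoup

def load_serialized_data(url: str) -> list:
    """Return the decoded serialized-server-data payload for an Apple Music page."""
    headers = {"User-Agent": "Mozilla/5.0"}  # User-Agent header used throughout the scraper
    html = requests.get(url, headers=headers, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    script = soup.find("script", id="serialized-server-data")
    if script is None:  # layout change or blocked request
        return []
    return json.loads(script.string)

# Example: list the section ids available on an album page.
data = load_serialized_data("https://music.apple.com/us/album/1965/1817707266")
if data:
    print([section.get("id") for section in data[0]["data"]["sections"]])
```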

What’s Included:

  • apple_music_scraper.py – Main scraper logic
  • utils.py – Helper methods (cover resolver, URL converter, etc.)
  • README.md – Full documentation + examples
  • requirements.txt – Clean dependency list (requests, beautifulsoup4)

Fixes #none

No existing issue was referenced; this is a brand-new standalone feature.

Type of change

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [ ] This change requires a documentation update
  • [ ] Documentation Update

Checklist:

  • [x] My code follows the style guidelines (Clean Code) of this project
  • [x] I have performed a self-review of my own code
  • [x] I have commented my code, particularly in hard-to-understand areas
  • [x] I have created a helpful and easy-to-understand README.md
  • [x] My documentation follows the Template for README.md
  • [x] I have added the project metadata in the PR template.
  • [x] I have created the requirements.txt file if needed.

Project Metadata

Category:

  • [ ] Calculators
  • [ ] AI/ML
  • [x] Scrappers
  • [x] Social_Media
  • [ ] PDF
  • [ ] Image_Processing
  • [ ] Video_Processing
  • [ ] Games
  • [ ] Networking
  • [ ] OS_Utilities
  • [ ] Automation
  • [ ] Cryptography
  • [ ] Computer_Vision
  • [ ] Fun
  • [ ] Others

Title: Apple Music Web Scraper

Folder: Apple-Music-Scraper

Requirements: requirements.txt

Script: apple_music_scraper.py

Arguments: none

Contributor: abssdghi

Description:
A powerful, fully featured Apple Music scraper that extracts songs, albums, playlists, videos, artist pages, and full search results using Apple Music’s internal structured JSON data.

Summary by Sourcery

Add a new Apple Music web scraper module that extracts structured metadata from various Apple Music web pages using their embedded serialized JSON data.

New Features:

  • Provide scraping functions for Apple Music songs, albums, playlists, artists, music videos, rooms, and search results, returning structured Python dictionaries and URL lists.
  • Add utility helpers for generating full artwork URLs, converting album track URLs to direct song URLs, and collecting all singles and EPs for an artist.
  • Document the Apple Music scraper usage, capabilities, and example workflows in a dedicated README for the new module.
  • Declare bs4 and requests as dependencies for the Apple Music scraping functionality in a requirements file.

Abssdghi · Nov 24 '25 11:11

Reviewer's Guide

Adds a new Apple Music web scraping module that parses Apple Music’s serialized-server-data and JSON-LD blocks to provide structured song, album, playlist, artist, video, room, and search results. It is supported by shared utilities for artwork URL formatting, URL conversion, and fetching singles/EPs, along with documentation and dependencies.

Sequence diagram for the Apple Music search-to-latest-song workflow

sequenceDiagram
    actor "User" as User
    participant "Client Script" as Client
    participant "main.search()" as Search
    participant "main.artist_scrape()" as ArtistScrape
    participant "main.album_scrape()" as AlbumScrape
    participant "utils.get_cover()" as GetCover
    participant "utils.get_all_singles()" as GetAllSingles
    participant "Apple Music Web" as AppleWeb

    "User" ->> "Client Script": "Call search('night tapes')"
    "Client Script" ->> "main.search()": "search(keyword)"
    "main.search()" ->> "Apple Music Web": "GET https://music.apple.com/us/search?term=keyword"
    "Apple Music Web" -->> "main.search()": "HTML with 'serialized-server-data' script"
    "main.search()" ->> "main.search()": "Parse HTML with BeautifulSoup"
    "main.search()" ->> "main.search()": "json.loads(serialized-server-data)"
    "main.search()" ->> "utils.get_cover()": "Build artwork URL for each result"
    "utils.get_cover()" -->> "main.search()": "Formatted artwork URL"
    "main.search()" -->> "Client Script": "Structured search results dict"

    "Client Script" ->> "main.artist_scrape()": "artist_scrape(artist_url)"
    "main.artist_scrape()" ->> "Apple Music Web": "GET artist page HTML"
    "Apple Music Web" -->> "main.artist_scrape()": "HTML with 'serialized-server-data'"
    "main.artist_scrape()" ->> "main.artist_scrape()": "Parse and extract sections (detail, latest, top, etc.)"
    "main.artist_scrape()" ->> "utils.get_cover()": "Build artist artwork URL"
    "utils.get_cover()" -->> "main.artist_scrape()": "Formatted artwork URL"
    "main.artist_scrape()" ->> "utils.get_all_singles()": "get_all_singles(artist_url)"
    "utils.get_all_singles()" ->> "Apple Music Web": "GET artist/see-all?section=singles"
    "Apple Music Web" -->> "utils.get_all_singles()": "HTML with singles section"
    "utils.get_all_singles()" ->> "utils.get_all_singles()": "Parse serialized-server-data and items"
    "utils.get_all_singles()" -->> "main.artist_scrape()": "List of singles and EP URLs"
    "main.artist_scrape()" -->> "Client Script": "Artist metadata dict (including 'latest' URL)"

    "Client Script" ->> "main.album_scrape()": "album_scrape(latest_song_album_url)"
    "main.album_scrape()" ->> "Apple Music Web": "GET album page HTML"
    "Apple Music Web" -->> "main.album_scrape()": "HTML with 'serialized-server-data'"
    "main.album_scrape()" ->> "main.album_scrape()": "Parse sections (album-detail, track-list, etc.)"
    "main.album_scrape()" ->> "utils.get_cover()": "Build album artwork URL"
    "utils.get_cover()" -->> "main.album_scrape()": "Formatted artwork URL"
    "main.album_scrape()" -->> "Client Script": "Album metadata dict (title, image, songs, more, similar)"

    "Client Script" -->> "User": "Display latest song title and cover art"

Class diagram for the new Apple Music scraper and utilities

classDiagram
    class MainScraper {
        +room_scrape(link="https://music.apple.com/us/room/6748797380") list~str~
        +playlist_scrape(link="https://music.apple.com/us/playlist/new-music-daily/pl.2b0e6e332fdf4b7a91164da3162127b5") list~str~
        +search(keyword="sasha sloan") dict
        +song_scrape(url="https://music.apple.com/us/song/california/1821538031") dict
        +album_scrape(url="https://music.apple.com/us/album/1965/1817707266?i=1817707585") dict
        +video_scrape(url="https://music.apple.com/us/music-video/gucci-mane-visualizer/1810547026") dict
        +artist_scrape(url="https://music.apple.com/us/artist/king-princess/1349968534") dict
    }

    class Utils {
        +get_cover(url, width, height, format="jpg", crop_option="") str
        +convert_album_to_song_url(album_url) str
        +get_all_singles(url="https://music.apple.com/us/artist/king-princess/1349968534") list~str~
    }

    MainScraper ..> Utils : "uses 'get_cover' for artwork URLs"
    MainScraper ..> Utils : "uses 'convert_album_to_song_url' in room_scrape, playlist_scrape, album_scrape"
    MainScraper ..> Utils : "uses 'get_all_singles' inside artist_scrape"

Flow diagram for generic Apple Music page scraping using serialized-server-data

flowchart TD
    A["Start scraping function (song_scrape, album_scrape, video_scrape, artist_scrape, room_scrape, playlist_scrape, search"] --> B["Build target Apple Music URL (page-specific)"]
    B["Build target Apple Music URL (page-specific)"] --> C["Set headers with 'User-Agent: Mozilla/5.0'"]
    C["Set headers with 'User-Agent: Mozilla/5.0'"] --> D["requests.get(URL, headers=headers)"]
    D["requests.get(URL, headers=headers)"] --> E["Parse HTML with BeautifulSoup"]
    E["Parse HTML with BeautifulSoup"] --> F{"Find script tag with id 'serialized-server-data'?"}
    F{"Find script tag with id 'serialized-server-data'?"} -->|"Yes"| G["Extract script text and load JSON via json.loads"]
    F{"Find script tag with id 'serialized-server-data'?"} -->|"No"| Z["Return empty or partial result (error or structure change)"]
    G["Extract script text and load JSON via json.loads"] --> H["Access our_json[0]['data']['sections']"]
    H["Access our_json[0]['data']['sections']"] --> I{"Select relevant sections by 'id' pattern (e.g., 'track-list', 'artist-detail', 'music-video-header')"}
    I{"Select relevant sections by 'id' pattern (e.g., 'track-list', 'artist-detail', 'music-video-header')"} --> J["Iterate over 'items' collections to gather URLs, titles, subtitles, descriptors"]
    J["Iterate over 'items' collections to gather URLs, titles, subtitles, descriptors"] --> K{"Artwork present in item?"}
    K{"Artwork present in item?"} -->|"Yes"| L["Call utils.get_cover() to expand artwork URL with width, height, format, crop"]
    K{"Artwork present in item?"} -->|"No"| M["Set artwork field to empty string"]
    L["Call utils.get_cover() to expand artwork URL with width, height, format, crop"] --> N["Attach formatted artwork URL to result object"]
    M["Set artwork field to empty string"] --> N["Attach formatted artwork URL to result object"]
    N["Attach formatted artwork URL to result object"] --> O{"Needs additional JSON-LD (preview or video URL)?"}
    O{"Needs additional JSON-LD (preview or video URL)?"} -->|"Yes"| P["Find JSON-LD script (e.g., id 'schema:song' or 'schema:music-video') and json.loads"]
    O{"Needs additional JSON-LD (preview or video URL)?"} -->|"No"| R["Skip JSON-LD step"]
    P["Find JSON-LD script (e.g., id 'schema:song' or 'schema:music-video') and json.loads"] --> Q["Extract preview or video content URL and add to result"]
    Q["Extract preview or video content URL and add to result"] --> S["Assemble final structured dict or list (songs, albums, artists, videos, rooms, playlists)"]
    R["Skip JSON-LD step"] --> S["Assemble final structured dict or list (songs, albums, artists, videos, rooms, playlists)"]
    S["Assemble final structured dict or list (songs, albums, artists, videos, rooms, playlists)"] --> T["Return JSON-like Python structure to caller"]
    Z["Return empty or partial result (error or structure change)"] --> T["Return JSON-like Python structure to caller"]

File-Level Changes

Change | Details | Files
Introduce a main Apple Music scraping module that exposes high-level scraping functions for different Apple Music entities.
  • Implement room_scrape and playlist_scrape to extract track URLs from room and playlist pages by parsing serialized-server-data sections and converting album-track URLs to song URLs (a rough sketch follows this change entry).
  • Implement search to query Apple Music’s search endpoint, parse sectioned results (artists, albums, songs, playlists, videos), and normalize them into structured dictionaries including optional artwork URLs.
  • Implement song_scrape to extract detailed song metadata (title, artwork, album/artist info, preview URL, and related songs) using serialized-server-data and schema:song JSON-LD.
  • Implement album_scrape to collect album metadata, track song URLs, description, artist info, related albums, videos, and “more by artist” sections using multiple identified sections within serialized-server-data.
  • Implement video_scrape to fetch music-video metadata, artwork, artist info, direct video URL, and related content via serialized-server-data sections and schema:music-video JSON-LD.
  • Implement artist_scrape to aggregate rich artist data including latest release, top songs, albums, singles/EPs, playlists, videos, similar artists, appearances, and bio fields from multiple serialized-server-data sections, delegating singles/EP retrieval to a helper.
Files: Apple-Music-Scraper/main.py
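A hedged sketch of the room/playlist case described in the first bullet (the "track-list" section id comes from the flow diagram; the item "url" key is an assumption):

```python
# Sketch of a playlist_scrape-style function: collect track URLs from the
# serialized-server-data sections and normalize album-track URLs to song URLs.
import json

import requests
from bs4 import BeautifulSoup

from utils import convert_album_to_song_url  # helper described under utils.py below

def playlist_scrape(link):
    headers = {"User-Agent": "Mozilla/5.0"}
    soup = BeautifulSoup(requests.get(link, headers=headers, timeout=30).text, "html.parser")
    script = soup.find("script", id="serialized-server-data")
    if script is None:
        return []

    songs = []
    for section in json.loads(script.string)[0]["data"]["sections"]:
        if "track-list" not in section.get("id", ""):
            continue
        for item in section.get("items", []):
            url = item.get("url")  # assumed key; the real payload may nest this deeper
            if url:
                songs.append(convert_album_to_song_url(url))
    return songs
```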
Add shared utilities to support artwork URL resolution, URL normalization, and singles retrieval for artists.
  • Implement get_cover to transform Apple Music artwork template URLs by replacing width, height, format, and crop placeholders with concrete values (see the sketch after this change entry).
  • Implement convert_album_to_song_url to derive canonical song URLs from album track URLs by reading the i query parameter and reconstructing the path as a /song/ URL.
  • Implement get_all_singles to fetch and parse the artist’s singles section via the /see-all?section=singles endpoint and return all single/EP URLs.
Files: Apple-Music-Scraper/utils.py
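A hedged sketch of the two URL helpers (the `{w}`/`{h}`/`{f}`/`{c}` placeholder tokens and the reconstructed /song/ path are assumptions based on the descriptions above, not confirmed from the script):

```python
# Sketch of utils.py's URL helpers under the assumptions stated above.
from urllib.parse import urlparse, parse_qs

def get_cover(url, width, height, format="jpg", crop_option=""):
    """Fill an Apple Music artwork template URL with concrete values."""
    return (url.replace("{w}", str(width))
               .replace("{h}", str(height))
               .replace("{f}", format)
               .replace("{c}", crop_option))

def convert_album_to_song_url(album_url):
    """Turn an album track URL (?i=<song id>) into a direct /song/ URL."""
    parsed = urlparse(album_url)
    song_id = parse_qs(parsed.query).get("i", [None])[0]
    if song_id is None:
        return album_url  # no track id present; assume it is already a song URL
    storefront = parsed.path.strip("/").split("/")[0]  # e.g. "us"
    return f"https://music.apple.com/{storefront}/song/{song_id}"
```

For example, under these assumptions convert_album_to_song_url("https://music.apple.com/us/album/1965/1817707266?i=1817707585") would return a .../us/song/1817707585 URL.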
Document the new Apple Music scraper module and declare its external dependencies.
  • Create README describing scraper purpose, capabilities, setup, and usage example, including explanation of serialized-server-data parsing and JSON-shaped outputs.
  • Add requirements.txt listing bs4 and requests as the only dependencies for the scraper (contents shown below).
Files: Apple-Music-Scraper/README.md, Apple-Music-Scraper/requirements.txt
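The dependency file itself is tiny; per the bullet above it declares only the two packages below (the PR description names beautifulsoup4, which the bs4 package wraps):

```text
# Apple-Music-Scraper/requirements.txt
bs4
requests
```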


sourcery-ai[bot] · Nov 24 '25 11:11