
Add Apple-Music-Scraper Python Script

Open: Abssdghi opened this pull request 1 month ago • 1 comment

Description

This PR adds a brand-new Apple Music Web Scraper capable of scraping:

  • Songs
  • Albums
  • Playlists
  • Artists
  • Music videos
  • Rooms
  • Full search results

It parses Apple Music’s internal serialized-server-data JSON structure and converts it into clean, structured Python output.
This feature did NOT exist in the repository before and expands the Scrappers/Social_Media category significantly.
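For readers unfamiliar with that structure: the core step is decoding the `serialized-server-data` script tag that Apple Music embeds in each page. A minimal sketch of that step follows (the example URL and helper name are illustrative, not the script's actual API):

```python
# Minimal sketch of the core parsing step, not the actual module code:
# fetch a page and decode the embedded "serialized-server-data" JSON blob.
import json

import requests
from bs4 import BeautifulSoup

def load_serialized_data(url: str) -> list:
    """Return the decoded serialized-server-data payload for an Apple Music page."""
    headers = {"User-Agent": "Mozilla/5.0"}  # User-Agent header used throughout the scraper
    html = requests.get(url, headers=headers, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    script = soup.find("script", id="serialized-server-data")
    if script is None:  # layout change or blocked request
        return []
    return json.loads(script.string)

# Example: list the section ids available on an album page.
data = load_serialized_data("https://music.apple.com/us/album/1965/1817707266")
if data:
    print([section.get("id") for section in data[0]["data"]["sections"]])
```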

What’s Included:

  • apple_music_scraper.py – Main scraper logic
  • utils.py – Helper methods (cover resolver, URL converter, etc.)
  • README.md – Full documentation + examples
  • requirements.txt – Clean dependency list (requests, beautifulsoup4)

Fixes #none

No existing issue was referenced; this is a brand-new standalone feature.

Type of change

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [ ] This change requires a documentation update
  • [ ] Documentation Update

Checklist:

  • [x] My code follows the style guidelines (Clean Code) of this project
  • [x] I have performed a self-review of my own code
  • [x] I have commented my code, particularly in hard-to-understand areas
  • [x] I have created a helpful and easy-to-understand README.md
  • [x] My documentation follows the Template for README.md
  • [x] I have added the project metadata in the PR template.
  • [x] I have created the requirements.txt file if needed.

Project Metadata

Category:

  • [ ] Calculators
  • [ ] AI/ML
  • [x] Scrappers
  • [x] Social_Media
  • [ ] PDF
  • [ ] Image_Processing
  • [ ] Video_Processing
  • [ ] Games
  • [ ] Networking
  • [ ] OS_Utilities
  • [ ] Automation
  • [ ] Cryptography
  • [ ] Computer_Vision
  • [ ] Fun
  • [ ] Others

Title: Apple Music Web Scraper

Folder: Apple-Music-Scraper

Requirements: requirements.txt

Script: apple_music_scraper.py

Arguments: none

Contributor: abssdghi

Description:
A powerful, fully featured Apple Music scraper that extracts songs, albums, playlists, videos, artist pages, and full search results using Apple Music’s internal structured JSON data.

Summary by Sourcery

Add a new Apple Music web scraper module that extracts structured metadata from various Apple Music web pages using their embedded serialized JSON data.

New Features:

  • Provide scraping functions for Apple Music songs, albums, playlists, artists, music videos, rooms, and search results, returning structured Python dictionaries and URL lists.
  • Add utility helpers for generating full artwork URLs, converting album track URLs to direct song URLs, and collecting all singles and EPs for an artist.
  • Document the Apple Music scraper usage, capabilities, and example workflows in a dedicated README for the new module.
  • Declare bs4 and requests as dependencies for the Apple Music scraping functionality in a requirements file.

Abssdghi · Nov 24 '25 11:11

Reviewer's Guide

Adds a new Apple Music web scraping module that parses Apple Music’s serialized-server-data and JSON-LD blocks to provide structured song, album, playlist, artist, video, room, and search results. It is supported by shared utilities for artwork URL formatting, URL conversion, and fetching singles/EPs, along with documentation and dependencies.

Sequence diagram for the Apple Music search-to-latest-song workflow

sequenceDiagram
    actor "User" as User
    participant "Client Script" as Client
    participant "main.search()" as Search
    participant "main.artist_scrape()" as ArtistScrape
    participant "main.album_scrape()" as AlbumScrape
    participant "utils.get_cover()" as GetCover
    participant "utils.get_all_singles()" as GetAllSingles
    participant "Apple Music Web" as AppleWeb

    "User" ->> "Client Script": "Call search('night tapes')"
    "Client Script" ->> "main.search()": "search(keyword)"
    "main.search()" ->> "Apple Music Web": "GET https://music.apple.com/us/search?term=keyword"
    "Apple Music Web" -->> "main.search()": "HTML with 'serialized-server-data' script"
    "main.search()" ->> "main.search()": "Parse HTML with BeautifulSoup"
    "main.search()" ->> "main.search()": "json.loads(serialized-server-data)"
    "main.search()" ->> "utils.get_cover()": "Build artwork URL for each result"
    "utils.get_cover()" -->> "main.search()": "Formatted artwork URL"
    "main.search()" -->> "Client Script": "Structured search results dict"

    "Client Script" ->> "main.artist_scrape()": "artist_scrape(artist_url)"
    "main.artist_scrape()" ->> "Apple Music Web": "GET artist page HTML"
    "Apple Music Web" -->> "main.artist_scrape()": "HTML with 'serialized-server-data'"
    "main.artist_scrape()" ->> "main.artist_scrape()": "Parse and extract sections (detail, latest, top, etc.)"
    "main.artist_scrape()" ->> "utils.get_cover()": "Build artist artwork URL"
    "utils.get_cover()" -->> "main.artist_scrape()": "Formatted artwork URL"
    "main.artist_scrape()" ->> "utils.get_all_singles()": "get_all_singles(artist_url)"
    "utils.get_all_singles()" ->> "Apple Music Web": "GET artist/see-all?section=singles"
    "Apple Music Web" -->> "utils.get_all_singles()": "HTML with singles section"
    "utils.get_all_singles()" ->> "utils.get_all_singles()": "Parse serialized-server-data and items"
    "utils.get_all_singles()" -->> "main.artist_scrape()": "List of singles and EP URLs"
    "main.artist_scrape()" -->> "Client Script": "Artist metadata dict (including 'latest' URL)"

    "Client Script" ->> "main.album_scrape()": "album_scrape(latest_song_album_url)"
    "main.album_scrape()" ->> "Apple Music Web": "GET album page HTML"
    "Apple Music Web" -->> "main.album_scrape()": "HTML with 'serialized-server-data'"
    "main.album_scrape()" ->> "main.album_scrape()": "Parse sections (album-detail, track-list, etc.)"
    "main.album_scrape()" ->> "utils.get_cover()": "Build album artwork URL"
    "utils.get_cover()" -->> "main.album_scrape()": "Formatted artwork URL"
    "main.album_scrape()" -->> "Client Script": "Album metadata dict (title, image, songs, more, similar)"

    "Client Script" -->> "User": "Display latest song title and cover art"

Class diagram for the new Apple Music scraper and utilities

classDiagram
    class MainScraper {
        +room_scrape(link="https://music.apple.com/us/room/6748797380") list~str~
        +playlist_scrape(link="https://music.apple.com/us/playlist/new-music-daily/pl.2b0e6e332fdf4b7a91164da3162127b5") list~str~
        +search(keyword="sasha sloan") dict
        +song_scrape(url="https://music.apple.com/us/song/california/1821538031") dict
        +album_scrape(url="https://music.apple.com/us/album/1965/1817707266?i=1817707585") dict
        +video_scrape(url="https://music.apple.com/us/music-video/gucci-mane-visualizer/1810547026") dict
        +artist_scrape(url="https://music.apple.com/us/artist/king-princess/1349968534") dict
    }

    class Utils {
        +get_cover(url, width, height, format="jpg", crop_option="") str
        +convert_album_to_song_url(album_url) str
        +get_all_singles(url="https://music.apple.com/us/artist/king-princess/1349968534") list~str~
    }

    MainScraper ..> Utils : "uses 'get_cover' for artwork URLs"
    MainScraper ..> Utils : "uses 'convert_album_to_song_url' in room_scrape, playlist_scrape, album_scrape"
    MainScraper ..> Utils : "uses 'get_all_singles' inside artist_scrape"

Flow diagram for generic Apple Music page scraping using serialized-server-data

flowchart TD
    A["Start scraping function (song_scrape, album_scrape, video_scrape, artist_scrape, room_scrape, playlist_scrape, search"] --> B["Build target Apple Music URL (page-specific)"]
    B["Build target Apple Music URL (page-specific)"] --> C["Set headers with 'User-Agent: Mozilla/5.0'"]
    C["Set headers with 'User-Agent: Mozilla/5.0'"] --> D["requests.get(URL, headers=headers)"]
    D["requests.get(URL, headers=headers)"] --> E["Parse HTML with BeautifulSoup"]
    E["Parse HTML with BeautifulSoup"] --> F{"Find script tag with id 'serialized-server-data'?"}
    F{"Find script tag with id 'serialized-server-data'?"} -->|"Yes"| G["Extract script text and load JSON via json.loads"]
    F{"Find script tag with id 'serialized-server-data'?"} -->|"No"| Z["Return empty or partial result (error or structure change)"]
    G["Extract script text and load JSON via json.loads"] --> H["Access our_json[0]['data']['sections']"]
    H["Access our_json[0]['data']['sections']"] --> I{"Select relevant sections by 'id' pattern (e.g., 'track-list', 'artist-detail', 'music-video-header')"}
    I{"Select relevant sections by 'id' pattern (e.g., 'track-list', 'artist-detail', 'music-video-header')"} --> J["Iterate over 'items' collections to gather URLs, titles, subtitles, descriptors"]
    J["Iterate over 'items' collections to gather URLs, titles, subtitles, descriptors"] --> K{"Artwork present in item?"}
    K{"Artwork present in item?"} -->|"Yes"| L["Call utils.get_cover() to expand artwork URL with width, height, format, crop"]
    K{"Artwork present in item?"} -->|"No"| M["Set artwork field to empty string"]
    L["Call utils.get_cover() to expand artwork URL with width, height, format, crop"] --> N["Attach formatted artwork URL to result object"]
    M["Set artwork field to empty string"] --> N["Attach formatted artwork URL to result object"]
    N["Attach formatted artwork URL to result object"] --> O{"Needs additional JSON-LD (preview or video URL)?"}
    O{"Needs additional JSON-LD (preview or video URL)?"} -->|"Yes"| P["Find JSON-LD script (e.g., id 'schema:song' or 'schema:music-video') and json.loads"]
    O{"Needs additional JSON-LD (preview or video URL)?"} -->|"No"| R["Skip JSON-LD step"]
    P["Find JSON-LD script (e.g., id 'schema:song' or 'schema:music-video') and json.loads"] --> Q["Extract preview or video content URL and add to result"]
    Q["Extract preview or video content URL and add to result"] --> S["Assemble final structured dict or list (songs, albums, artists, videos, rooms, playlists)"]
    R["Skip JSON-LD step"] --> S["Assemble final structured dict or list (songs, albums, artists, videos, rooms, playlists)"]
    S["Assemble final structured dict or list (songs, albums, artists, videos, rooms, playlists)"] --> T["Return JSON-like Python structure to caller"]
    Z["Return empty or partial result (error or structure change)"] --> T["Return JSON-like Python structure to caller"]

File-Level Changes

Change | Details | Files
Introduce a main Apple Music scraping module that exposes high-level scraping functions for different Apple Music entities.
  • Implement room_scrape and playlist_scrape to extract track URLs from room and playlist pages by parsing serialized-server-data sections and converting album-track URLs to song URLs (a rough sketch follows this change entry).
  • Implement search to query Apple Music’s search endpoint, parse sectioned results (artists, albums, songs, playlists, videos), and normalize them into structured dictionaries including optional artwork URLs.
  • Implement song_scrape to extract detailed song metadata (title, artwork, album/artist info, preview URL, and related songs) using serialized-server-data and schema:song JSON-LD.
  • Implement album_scrape to collect album metadata, track song URLs, description, artist info, related albums, videos, and “more by artist” sections using multiple identified sections within serialized-server-data.
  • Implement video_scrape to fetch music-video metadata, artwork, artist info, direct video URL, and related content via serialized-server-data sections and schema:music-video JSON-LD.
  • Implement artist_scrape to aggregate rich artist data including latest release, top songs, albums, singles/EPs, playlists, videos, similar artists, appearances, and bio fields from multiple serialized-server-data sections, delegating singles/EP retrieval to a helper.
Files: Apple-Music-Scraper/main.py
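A hedged sketch of the room/playlist case described in the first bullet (the "track-list" section id comes from the flow diagram; the item "url" key is an assumption):

```python
# Sketch of a playlist_scrape-style function: collect track URLs from the
# serialized-server-data sections and normalize album-track URLs to song URLs.
import json

import requests
from bs4 import BeautifulSoup

from utils import convert_album_to_song_url  # helper described under utils.py below

def playlist_scrape(link):
    headers = {"User-Agent": "Mozilla/5.0"}
    soup = BeautifulSoup(requests.get(link, headers=headers, timeout=30).text, "html.parser")
    script = soup.find("script", id="serialized-server-data")
    if script is None:
        return []

    songs = []
    for section in json.loads(script.string)[0]["data"]["sections"]:
        if "track-list" not in section.get("id", ""):
            continue
        for item in section.get("items", []):
            url = item.get("url")  # assumed key; the real payload may nest this deeper
            if url:
                songs.append(convert_album_to_song_url(url))
    return songs
```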
Add shared utilities to support artwork URL resolution, URL normalization, and singles retrieval for artists.
  • Implement get_cover to transform Apple Music artwork template URLs by replacing width, height, format, and crop placeholders with concrete values (see the sketch after this change entry).
  • Implement convert_album_to_song_url to derive canonical song URLs from album track URLs by reading the i query parameter and reconstructing the path as a /song/ URL.
  • Implement get_all_singles to fetch and parse the artist’s singles section via the /see-all?section=singles endpoint and return all single/EP URLs.
Files: Apple-Music-Scraper/utils.py
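A hedged sketch of the two URL helpers (the `{w}`/`{h}`/`{f}`/`{c}` placeholder tokens and the reconstructed /song/ path are assumptions based on the descriptions above, not confirmed from the script):

```python
# Sketch of utils.py's URL helpers under the assumptions stated above.
from urllib.parse import urlparse, parse_qs

def get_cover(url, width, height, format="jpg", crop_option=""):
    """Fill an Apple Music artwork template URL with concrete values."""
    return (url.replace("{w}", str(width))
               .replace("{h}", str(height))
               .replace("{f}", format)
               .replace("{c}", crop_option))

def convert_album_to_song_url(album_url):
    """Turn an album track URL (?i=<song id>) into a direct /song/ URL."""
    parsed = urlparse(album_url)
    song_id = parse_qs(parsed.query).get("i", [None])[0]
    if song_id is None:
        return album_url  # no track id present; assume it is already a song URL
    storefront = parsed.path.strip("/").split("/")[0]  # e.g. "us"
    return f"https://music.apple.com/{storefront}/song/{song_id}"
```

For example, under these assumptions convert_album_to_song_url("https://music.apple.com/us/album/1965/1817707266?i=1817707585") would return a .../us/song/1817707585 URL.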
Document the new Apple Music scraper module and declare its external dependencies.
  • Create README describing scraper purpose, capabilities, setup, and usage example, including explanation of serialized-server-data parsing and JSON-shaped outputs.
  • Add requirements.txt listing bs4 and requests as the only dependencies for the scraper (contents shown below).
Files: Apple-Music-Scraper/README.md, Apple-Music-Scraper/requirements.txt
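The dependency file itself is tiny; per the bullet above it declares only the two packages below (the PR description names beautifulsoup4, which the bs4 package wraps):

```text
# Apple-Music-Scraper/requirements.txt
bs4
requests
```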


sourcery-ai[bot] · Nov 24 '25 11:11