Add Apple-Music-Scraper Python Script
Description
This PR adds a brand-new Apple Music Web Scraper capable of scraping:
- Songs
- Albums
- Playlists
- Artists
- Music videos
- Rooms
- Full search results
It parses the serialized-server-data JSON embedded in Apple Music’s web pages and converts it into clean Python dictionaries and lists.
This feature did not previously exist in the repository; it extends the Scrappers and Social_Media categories.
What’s Included:
- apple_music_scraper.py – Main scraper logic
- utils.py – Helper methods (cover resolver, URL converter, etc.)
- README.md – Full documentation + examples
- requirements.txt – Clean dependency list (requests, beautifulsoup4)
Fixes #none
No existing issue was referenced; this is a brand-new standalone feature.
Type of change
- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] This change requires a documentation update
- [ ] Documentation Update
Checklist:
- [x] My code follows the style guidelines(Clean Code) of this project
- [x] I have performed a self-review of my own code
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] I have created a helpful and easy to understand README.md
- [x] My documentation follows Template for README.md
- [x] I have added the project meta data in the PR template.
- [x] I have created the requirements.txt file if needed.
Project Metadata
Category:
- [ ] Calculators
- [ ] AI/ML
- [x] Scrappers
- [x] Social_Media
- [ ] Image_Processing
- [ ] Video_Processing
- [ ] Games
- [ ] Networking
- [ ] OS_Utilities
- [ ] Automation
- [ ] Cryptography
- [ ] Computer_Vision
- [ ] Fun
- [ ] Others
Title: Apple Music Web Scraper
Folder: Apple-Music-Scraper
Requirements: requirements.txt
Script: apple_music_scraper.py
Arguments: none
Contributor: abssdghi
Description:
A powerful and fully-featured Apple Music scraper that extracts songs, albums, playlists, videos, artist pages, and full search results using Apple Music’s internal structured JSON data.
Summary by Sourcery
Add a new Apple Music web scraper module that extracts structured metadata from various Apple Music web pages using their embedded serialized JSON data.
New Features:
- Provide scraping functions for Apple Music songs, albums, playlists, artists, music videos, rooms, and search results, returning structured Python dictionaries and URL lists.
- Add utility helpers for generating full artwork URLs, converting album track URLs to direct song URLs, and collecting all singles and EPs for an artist.
- Document the Apple Music scraper usage, capabilities, and example workflows in a dedicated README for the new module.
- Declare bs4 and requests as dependencies for the Apple Music scraping functionality in a requirements file.
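The two URL helpers listed above can be illustrated with a minimal sketch. It is not the code from this PR: it assumes that artwork URLs arrive as templates containing `{w}`, `{h}`, `{c}` and `{f}` placeholders, and that an album track URL carries the song ID in its `i` query parameter. Only the function names and the `get_cover` signature come from the class diagram; the bodies are assumptions.

```python
from urllib.parse import parse_qs, urlsplit


def get_cover(url, width, height, format="jpg", crop_option=""):
    """Fill an artwork URL template (assumed placeholders: {w}, {h}, {c}, {f})."""
    return (
        url.replace("{w}", str(width))
        .replace("{h}", str(height))
        .replace("{c}", crop_option)
        .replace("{f}", format)
    )


def convert_album_to_song_url(album_url):
    """Map .../album/<slug>/<album_id>?i=<song_id> to .../song/<slug>/<song_id>.

    Returns None when the URL has no track ID or is not an album URL.
    """
    parts = urlsplit(album_url)
    song_ids = parse_qs(parts.query).get("i")
    if not song_ids:
        return None  # no track ID in the query string
    # Expected path segments: [country, "album", slug, album_id]
    segments = parts.path.strip("/").split("/")
    if len(segments) < 4 or segments[1] != "album":
        return None
    country, slug = segments[0], segments[2]
    return f"{parts.scheme}://{parts.netloc}/{country}/song/{slug}/{song_ids[0]}"
```

Returning `None` for URLs without a track ID is a design choice of this sketch; the actual helper may behave differently.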
Reviewer's Guide
Adds a new Apple Music web scraping module that parses Apple Music’s serialized-server-data and JSON-LD blocks to provide structured song, album, playlist, artist, video, room, and search results, supported by shared utilities for artwork URL formatting, URL conversion, and fetching singles/EPs, along with documentation and dependencies.
Sequence diagram for the Apple Music search-to-latest-song workflow
sequenceDiagram
actor "User" as User
participant "Client Script" as Client
participant "main.search()" as Search
participant "main.artist_scrape()" as ArtistScrape
participant "main.album_scrape()" as AlbumScrape
participant "utils.get_cover()" as GetCover
participant "utils.get_all_singles()" as GetAllSingles
participant "Apple Music Web" as AppleWeb
"User" ->> "Client Script": "Call search('night tapes')"
"Client Script" ->> "main.search()": "search(keyword)"
"main.search()" ->> "Apple Music Web": "GET https://music.apple.com/us/search?term=keyword"
"Apple Music Web" -->> "main.search()": "HTML with 'serialized-server-data' script"
"main.search()" ->> "main.search()": "Parse HTML with BeautifulSoup"
"main.search()" ->> "main.search()": "json.loads(serialized-server-data)"
"main.search()" ->> "utils.get_cover()": "Build artwork URL for each result"
"utils.get_cover()" -->> "main.search()": "Formatted artwork URL"
"main.search()" -->> "Client Script": "Structured search results dict"
"Client Script" ->> "main.artist_scrape()": "artist_scrape(artist_url)"
"main.artist_scrape()" ->> "Apple Music Web": "GET artist page HTML"
"Apple Music Web" -->> "main.artist_scrape()": "HTML with 'serialized-server-data'"
"main.artist_scrape()" ->> "main.artist_scrape()": "Parse and extract sections (detail, latest, top, etc.)"
"main.artist_scrape()" ->> "utils.get_cover()": "Build artist artwork URL"
"utils.get_cover()" -->> "main.artist_scrape()": "Formatted artwork URL"
"main.artist_scrape()" ->> "utils.get_all_singles()": "get_all_singles(artist_url)"
"utils.get_all_singles()" ->> "Apple Music Web": "GET artist/see-all?section=singles"
"Apple Music Web" -->> "utils.get_all_singles()": "HTML with singles section"
"utils.get_all_singles()" ->> "utils.get_all_singles()": "Parse serialized-server-data and items"
"utils.get_all_singles()" -->> "main.artist_scrape()": "List of singles and EP URLs"
"main.artist_scrape()" -->> "Client Script": "Artist metadata dict (including 'latest' URL)"
"Client Script" ->> "main.album_scrape()": "album_scrape(latest_song_album_url)"
"main.album_scrape()" ->> "Apple Music Web": "GET album page HTML"
"Apple Music Web" -->> "main.album_scrape()": "HTML with 'serialized-server-data'"
"main.album_scrape()" ->> "main.album_scrape()": "Parse sections (album-detail, track-list, etc.)"
"main.album_scrape()" ->> "utils.get_cover()": "Build album artwork URL"
"utils.get_cover()" -->> "main.album_scrape()": "Formatted artwork URL"
"main.album_scrape()" -->> "Client Script": "Album metadata dict (title, image, songs, more, similar)"
"Client Script" -->> "User": "Display latest song title and cover art"
Class diagram for the new Apple Music scraper and utilities
classDiagram
class MainScraper {
+room_scrape(link="https://music.apple.com/us/room/6748797380") list~str~
+playlist_scrape(link="https://music.apple.com/us/playlist/new-music-daily/pl.2b0e6e332fdf4b7a91164da3162127b5") list~str~
+search(keyword="sasha sloan") dict
+song_scrape(url="https://music.apple.com/us/song/california/1821538031") dict
+album_scrape(url="https://music.apple.com/us/album/1965/1817707266?i=1817707585") dict
+video_scrape(url="https://music.apple.com/us/music-video/gucci-mane-visualizer/1810547026") dict
+artist_scrape(url="https://music.apple.com/us/artist/king-princess/1349968534") dict
}
class Utils {
+get_cover(url, width, height, format="jpg", crop_option="") str
+convert_album_to_song_url(album_url) str
+get_all_singles(url="https://music.apple.com/us/artist/king-princess/1349968534") list~str~
}
MainScraper ..> Utils : "uses 'get_cover' for artwork URLs"
MainScraper ..> Utils : "uses 'convert_album_to_song_url' in room_scrape, playlist_scrape, album_scrape"
MainScraper ..> Utils : "uses 'get_all_singles' inside artist_scrape"
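The functions in the class diagram can be chained as in the search-to-latest-song workflow. The sketch below is hedged: the scraping functions are passed in as parameters so it runs without network access, and the `"artists"` and `"url"` keys of the search result are hypothetical. Only `'latest'`, `title` and `image` are named in the sequence diagram.

```python
def latest_release(keyword, search, artist_scrape, album_scrape):
    """Follow search -> artist -> latest album and return (title, image).

    `search`, `artist_scrape` and `album_scrape` are callables with the
    same roles as the module's functions. Returns None if any step
    yields nothing to follow.
    """
    results = search(keyword)
    artists = results.get("artists") or []  # hypothetical key
    if not artists:
        return None
    artist = artist_scrape(artists[0]["url"])  # hypothetical key
    latest_url = artist.get("latest")  # 'latest' is named in the diagram
    if not latest_url:
        return None
    album = album_scrape(latest_url)
    return album.get("title"), album.get("image")
```

With the real module, the call would pass the module's own `search`, `artist_scrape` and `album_scrape` functions in place of the stand-ins.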
Flow diagram for generic Apple Music page scraping using serialized-server-data
flowchart TD
A["Start scraping function (song_scrape, album_scrape, video_scrape, artist_scrape, room_scrape, playlist_scrape, search)"] --> B["Build target Apple Music URL (page-specific)"]
B["Build target Apple Music URL (page-specific)"] --> C["Set headers with 'User-Agent: Mozilla/5.0'"]
C["Set headers with 'User-Agent: Mozilla/5.0'"] --> D["requests.get(URL, headers=headers)"]
D["requests.get(URL, headers=headers)"] --> E["Parse HTML with BeautifulSoup"]
E["Parse HTML with BeautifulSoup"] --> F{"Find script tag with id 'serialized-server-data'?"}
F{"Find script tag with id 'serialized-server-data'?"} -->|"Yes"| G["Extract script text and load JSON via json.loads"]
F{"Find script tag with id 'serialized-server-data'?"} -->|"No"| Z["Return empty or partial result (error or structure change)"]
G["Extract script text and load JSON via json.loads"] --> H["Access our_json[0]['data']['sections']"]
H["Access our_json[0]['data']['sections']"] --> I{"Select relevant sections by 'id' pattern (e.g., 'track-list', 'artist-detail', 'music-video-header')"}
I{"Select relevant sections by 'id' pattern (e.g., 'track-list', 'artist-detail', 'music-video-header')"} --> J["Iterate over 'items' collections to gather URLs, titles, subtitles, descriptors"]
J["Iterate over 'items' collections to gather URLs, titles, subtitles, descriptors"] --> K{"Artwork present in item?"}
K{"Artwork present in item?"} -->|"Yes"| L["Call utils.get_cover() to expand artwork URL with width, height, format, crop"]
K{"Artwork present in item?"} -->|"No"| M["Set artwork field to empty string"]
L["Call utils.get_cover() to expand artwork URL with width, height, format, crop"] --> N["Attach formatted artwork URL to result object"]
M["Set artwork field to empty string"] --> N["Attach formatted artwork URL to result object"]
N["Attach formatted artwork URL to result object"] --> O{"Needs additional JSON-LD (preview or video URL)?"}
O{"Needs additional JSON-LD (preview or video URL)?"} -->|"Yes"| P["Find JSON-LD script (e.g., id 'schema:song' or 'schema:music-video') and json.loads"]
O{"Needs additional JSON-LD (preview or video URL)?"} -->|"No"| R["Skip JSON-LD step"]
P["Find JSON-LD script (e.g., id 'schema:song' or 'schema:music-video') and json.loads"] --> Q["Extract preview or video content URL and add to result"]
Q["Extract preview or video content URL and add to result"] --> S["Assemble final structured dict or list (songs, albums, artists, videos, rooms, playlists)"]
R["Skip JSON-LD step"] --> S["Assemble final structured dict or list (songs, albums, artists, videos, rooms, playlists)"]
S["Assemble final structured dict or list (songs, albums, artists, videos, rooms, playlists)"] --> T["Return JSON-like Python structure to caller"]
Z["Return empty or partial result (error or structure change)"] --> T["Return JSON-like Python structure to caller"]
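The parsing steps in the flow above can be sketched as follows. This is a minimal illustration, not the module's code: it uses the standard library's `html.parser` in place of BeautifulSoup so that it runs without dependencies, and the `url` key on each item is a hypothetical field name. The script IDs and the `[0]['data']['sections']` path come from the diagram.

```python
import json
from html.parser import HTMLParser


class _ScriptCollector(HTMLParser):
    """Collect the text of every <script> tag that has an id attribute."""

    def __init__(self):
        super().__init__()
        self.scripts = {}
        self._current_id = None

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._current_id = dict(attrs).get("id")
            if self._current_id is not None:
                self.scripts[self._current_id] = ""

    def handle_data(self, data):
        if self._current_id is not None:
            self.scripts[self._current_id] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._current_id = None


def load_script_json(html, script_id):
    """Return the parsed JSON of the <script id=script_id> tag, or None."""
    collector = _ScriptCollector()
    collector.feed(html)
    collector.close()
    text = collector.scripts.get(script_id)
    if not text:
        return None
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


def collect_item_urls(html, section_marker):
    """Gather item URLs from sections whose id contains section_marker."""
    data = load_script_json(html, "serialized-server-data")
    try:
        sections = data[0]["data"]["sections"]
    except (KeyError, IndexError, TypeError):
        return []  # script missing or page structure changed
    urls = []
    for section in sections:
        if section_marker not in section.get("id", ""):
            continue
        for item in section.get("items", []):
            url = item.get("url")  # hypothetical key name
            if url:
                urls.append(url)
    return urls
```

The same `load_script_json` helper covers the optional JSON-LD step, for example `load_script_json(html, "schema:song")`.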
File-Level Changes
| Change | Details | Files |
|---|---|---|
| Introduce a main Apple Music scraping module that exposes high-level scraping functions for different Apple Music entities. | | Apple-Music-Scraper/main.py |
| Add shared utilities to support artwork URL resolution, URL normalization, and singles retrieval for artists. | | Apple-Music-Scraper/utils.py |
| Document the new Apple Music scraper module and declare its external dependencies. | | Apple-Music-Scraper/README.md, Apple-Music-Scraper/requirements.txt |