abx-dl
⬇️ A simple all-in-one CLI tool to download EVERYTHING from a URL (like youtube-dl/yt-dlp, forum-dl, gallery-dl, simpler ArchiveBox). 🎭 Uses headless Chrome to get HTML, JS, CSS, images/video/audio/s...
⬇️ abx-dl [VAPORWARE] (please make this!)
A simple all-in-one CLI tool to auto-detect and download everything available from a URL.
pip install abx-dl
abx-dl 'https://example.com/page/to/download'
[!IMPORTANT]
❈ NOT IMPLEMENTED YET Coming someday... read the Plugin Ecosystem Announcement (2024-10)
Release ETA: after archivebox v0.9.0
You should make this! Use https://deepwiki.com/archivebox/abx-pkg to set up the dependencies like yt-dlp, ffmpeg, chrome, etc. + a single global event queue and a single worker process/actor for each.
✨ Ever wish you could yt-dlp, gallery-dl, wget, curl, puppeteer, etc. all in one command?
abx-dl is an all-in-one CLI tool for downloading URLs "by any means necessary".
It's useful for scraping, downloading, OSINT, digital preservation, and more.
abx-dl is built to provide a simpler one-shot CLI interface to the ArchiveBox archiving engine (it replaces the old archivebox oneshot command).
🍜 What does it save?
abx-dl --extract=title,favicon,headers,wget,media,singlefile,screenshot,pdf,dom,readability,git,... 'https://example.com'
abx-dl gets everything by default, or you can tell it to --extract=... specific methods:
- HTML, JS, CSS, images, etc. rendered with a headless browser
- title, favicon, headers, outlinks, and other metadata
- audio, video, subtitles, playlists, comments
- snapshot of the page as a PDF, screenshot, and Singlefile HTML
- article text, git source code
- and much more...
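The method selection above can also be driven from a script. The following is a minimal sketch, assuming only the `--extract=` flag shown in this README; `build_command` is a hypothetical helper, not part of abx-dl. Because abx-dl is not implemented yet, it only builds the argument list and does not run anything.

```python
from typing import Iterable, List, Optional


def build_command(url: str, methods: Optional[Iterable[str]] = None) -> List[str]:
    """Build an abx-dl argument list (hypothetical helper, not part of abx-dl)."""
    argv = ["abx-dl"]
    # Drop blanks and de-duplicate method names while preserving their order.
    unique = list(dict.fromkeys(m.strip() for m in (methods or []) if m.strip()))
    if unique:
        argv.append("--extract=" + ",".join(unique))
    # With no methods, abx-dl is described as extracting everything by default.
    argv.append(url)
    return argv


if __name__ == "__main__":
    print(build_command("https://example.com", ["title", "screenshot"]))
```

Once a real `abx-dl` binary exists, the returned list could be handed to `subprocess.run`.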
🧩 How does it work?
Forget about writing janky manual crawling scripts with JS/Python/playwright/puppeteer/bash.
abx-dl renders all URLs passed in a fully-featured modern browser using puppeteer.
It auto-detects a wide variety of embedded resources using plugins, and extracts discovered content out to raw files (mp4, png, txt, pdf, html, etc.) in the current working directory.
abx-dl collects all of your favorite powerful scraping and downloading tools, including: wget, wget-lua, curl, puppeteer, playwright, singlefile, readability, yt-dlp, forum-dl, and many more through the ABX Plugin Library (shared with ArchiveBox)...
You no longer have to deal with installing and configuring a bunch of tools individually.
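One way to picture the plugin approach described above is a registry of extractors fed by a single event queue. This is only an illustrative sketch, not the ArchiveBox or abx-dl design: the `extractor` decorator, the `EXTRACTORS` registry, and `run` are hypothetical names, and the plugins here return placeholder filenames instead of downloading anything.

```python
import queue
from typing import Callable, Dict, List, Tuple

# Hypothetical registry: extract method name -> function returning an output filename.
Extractor = Callable[[str], str]
EXTRACTORS: Dict[str, Extractor] = {}


def extractor(name: str) -> Callable[[Extractor], Extractor]:
    """Register a function as the plugin for one extract method."""
    def register(func: Extractor) -> Extractor:
        EXTRACTORS[name] = func
        return func
    return register


@extractor("title")
def save_title(url: str) -> str:
    return "title.txt"  # placeholder: a real plugin would fetch and write the file


@extractor("screenshot")
def save_screenshot(url: str) -> str:
    return "screenshot.png"  # placeholder


def run(url: str, methods: List[str]) -> List[Tuple[str, str]]:
    """Put one event per method on a single queue, then drain it in order."""
    events = queue.Queue()
    for method in methods:
        events.put((method, url))

    results = []
    while not events.empty():
        method, target = events.get()
        plugin = EXTRACTORS.get(method)
        if plugin is None:
            results.append((method, "skipped: no plugin registered"))
        else:
            results.append((method, plugin(target)))
    return results


if __name__ == "__main__":
    print(run("https://example.com", ["title", "screenshot"]))
```

A real implementation would presumably run each dependency in its own worker process rather than draining the queue in one loop.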
⚙️ What options does it provide?
Pass --extract=<methods> to get only what you need, and set other config via env vars / args:
- USER_AGENT, CHECK_SSL_VALIDITY, CHROME_USER_DATA_DIR / COOKIES_TXT
- TIMEOUT=60, MAX_MEDIA_SIZE=750m, RESOLUTION=1440,2000, ONLY_NEW=True
- and more here...
Configuration options apply seamlessly across all methods.
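The option values shown in this section are plain strings, so a tool consuming them has to interpret them. Below is a rough sketch of how values such as `MAX_MEDIA_SIZE=750m`, `RESOLUTION=1440,2000`, and `ONLY_NEW=True` might be parsed. The parsing rules are assumptions, not documented abx-dl behavior: in particular, treating `m` as 1024 × 1024 bytes and reading `RESOLUTION` as `width,height` are guesses.

```python
import re
from typing import Tuple

# Assumed binary multipliers; the README does not say what "m" means exactly.
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(value: str) -> int:
    """Convert a size like '750m' into a byte count (assumed semantics)."""
    match = re.fullmatch(r"\s*(\d+)\s*([kmg]?)b?\s*", value.lower())
    if not match:
        raise ValueError(f"unrecognized size: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit]


def parse_resolution(value: str) -> Tuple[int, int]:
    """Split '1440,2000' into two integers, assumed to be (width, height)."""
    width, height = (int(part) for part in value.split(","))
    return width, height


def parse_bool(value: str) -> bool:
    """Accept a few common spellings of true; everything else is false."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


if __name__ == "__main__":
    print(parse_size("750m"), parse_resolution("1440,2000"), parse_bool("True"))
```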
📦 ~~Install~~ Coming Soon...
pip install abx-dl
abx-dl install # optional: install any system packages needed
🔠 Usage
# Basic usage:
abx-dl [--help|--version] [--config|-c] [--extract=methods] [url]
Download everything
abx-dl 'https://example.com'
ls ./
# <see All Outputs below>
Download just title + screenshot
abx-dl --extract=title,screenshot 'https://example.com'
ls ./
# index.json title.txt screenshot.png
Download title + screenshot + html + media
abx-dl --extract=title,favicon,screenshot,singlefile,media 'https://example.com'
ls ./
# index.json index.html title.txt favicon.ico screenshot.png singlefile.html media/Some_video.mp4
Pass config options
Config can be persisted via file, set via env vars, or passed via CLI args.
# set per-user config in ~/.config/abx-dl/abx-dl.conf
abx-dl config --set CHECK_SSL_VALIDITY=True
# environment variables work too and are equivalent
env CHROME_USER_DATA_DIR=~/.config/abx-dl/personas/Default/chrome_profile abx-dl 'https://example.com'
# pass per-run config as CLI args
abx-dl -c MAX_MEDIA_SIZE=250m --extract=title,singlefile,screenshot,media 'https://www.youtube.com/watch?v=dQw4w9WgXcQ'
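The three config sources above need some precedence order when they disagree. The README does not state one, so the sketch below assumes a common convention: CLI args override env vars, which override the config file, which overrides defaults. `resolve_config` and `KNOWN_KEYS` are hypothetical names, and the default values are simply the ones listed in the options section.

```python
import os
from typing import Dict, Mapping, Optional

# Values shown in the options section, treated here as defaults (an assumption).
DEFAULTS = {
    "TIMEOUT": "60",
    "MAX_MEDIA_SIZE": "750m",
    "RESOLUTION": "1440,2000",
    "ONLY_NEW": "True",
}

# Option names mentioned in this README; anything else in the environment is ignored.
KNOWN_KEYS = set(DEFAULTS) | {
    "USER_AGENT",
    "CHECK_SSL_VALIDITY",
    "CHROME_USER_DATA_DIR",
    "COOKIES_TXT",
}


def resolve_config(
    file_config: Mapping[str, str],
    cli_config: Mapping[str, str],
    environ: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Merge config sources; later sources win (assumed order: file < env < CLI)."""
    env = os.environ if environ is None else environ
    merged = dict(DEFAULTS)
    merged.update(file_config)
    merged.update({key: env[key] for key in KNOWN_KEYS if key in env})
    merged.update(cli_config)
    return merged


if __name__ == "__main__":
    print(resolve_config({"CHECK_SSL_VALIDITY": "True"}, {"MAX_MEDIA_SIZE": "250m"}))
```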
All Outputs
- index.json, index.html
- title.txt, title.json, headers.json, favicon.ico
- example.com/*.{html,css,js,png...}, warc/ (saved with wget-lua)
- screenshot.png, dom.html, output.pdf (rendered with chrome)
- media/someVideo.mp4, media/subtitles, ... (downloaded with yt-dlp)
- readability/, mercury/, htmltotext.txt (article text/markdown)
- git/ (source code)
- ... and more via plugin library ...
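Since every method writes predictable filenames into the working directory, a script can verify a run by checking for them. The sketch below is illustrative only: `missing_outputs` is a hypothetical helper, and the method-to-filename mapping is inferred from the examples in this README rather than from any published abx-dl specification.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Union

# Inferred from the usage examples and output list in this README (an assumption).
EXPECTED_OUTPUTS: Dict[str, List[str]] = {
    "title": ["title.txt"],
    "favicon": ["favicon.ico"],
    "screenshot": ["screenshot.png"],
    "singlefile": ["singlefile.html"],
    "dom": ["dom.html"],
    "pdf": ["output.pdf"],
}


def missing_outputs(
    directory: Union[str, Path], methods: Iterable[str]
) -> Dict[str, List[str]]:
    """Return, per method, the expected files that are absent from directory."""
    root = Path(directory)
    missing: Dict[str, List[str]] = {}
    for method in methods:
        # Methods without a known mapping are skipped rather than reported.
        absent = [
            name
            for name in EXPECTED_OUTPUTS.get(method, [])
            if not (root / name).exists()
        ]
        if absent:
            missing[method] = absent
    return missing


if __name__ == "__main__":
    print(missing_outputs(".", ["title", "screenshot"]))
```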
For more advanced use with collections, parallel downloading, a Web UI + REST API, etc., see: ArchiveBox/ArchiveBox