go-download-web
go-download-web copied to clipboard
Download an entire website with Go
Go Download Web
Go Download Web is a command-line application developed with Go that allows you to download an entire online website, including CSS, JavaScript, Images, and other assets.
Project Status
The application is functional and has been improved recently. However, there are still some tasks pending:
- Add headless browser support to download JS-generated content.
- Implement download resuming.
- Enable parallel downloading of attachments.
Installation
To install, follow these simple steps:
$ git clone https://github.com/antsanchez/go-download-web
$ cd go-download-web
$ go build
Usage
The application can be used with various flags to customize the download process:
$ ./go-download-web -u <URL>
-
-u
or--url
: The URL of the website to download content from. This is a required field.
$ ./go-download-web -u <URL> -new <NEW_URL>
-
-new
or--new-url
: The new URL to use for the downloaded content. This is an optional field.
$ ./go-download-web -u <URL> -r <INCLUDED_URLS>
-
-r
or--included-urls
: The URL prefixes/root paths that should be included in the download. This is an optional field.
$ ./go-download-web -u <URL> -s <SIMULTANEOUS_CONNECTIONS>
-
-s
or--simultaneous
: The number of concurrent connections. The default value is 3, and the minimum is 1.
$ ./go-download-web -u <URL> -q
-
-q
or--use-queries
: A flag to ignore query strings in URLs. This is an optional field.
$ ./go-download-web -u <URL> -path <DOWNLOAD_PATH>
-
-path
or--download-path
: The local path to save the downloaded files. The default value is./website
.
For help, use the -h
or --help
flag:
$ ./go-download-web -h
Development
Generate Mocks
This project uses uber-go/mock to generate mocks for testing. To generate mocks, run the following commands:
mockgen -destination=pkg/console/mock_console.go -package=console github.com/antsanchez/go-download-web/pkg/scraper Console
mockgen -destination=pkg/get/mock_get.go -package=get github.com/antsanchez/go-download-web/pkg/scraper HttpGet
These commands generate mocks for the Console
and HttpGet
interfaces in the scraper
package. The generated mocks are saved in the pkg/console
and pkg/get
packages, respectively.
Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change. Please make sure to update tests as appropriate.