ws-export
A Wikimedia Toolforge tool for exporting ebooks from Wikisources.
WS Export
WS Export is a tool for exporting Wikisource books to many formats, such as EPUB or PDF. The documentation can be found here: https://wikisource.org/wiki/Wikisource:WS_Export
Requirements
- PHP 7.3 or 7.4
- Composer
- The fc-list command
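Before installing, it can help to confirm that the commands named above are on the PATH. The following is a minimal sketch and not part of ws-export: `missing_tools` is a hypothetical helper that prints each command it cannot find. It does not check the PHP version.

```shell
#!/bin/sh
# Hypothetical helper (not part of ws-export): print every command
# from the argument list that is not available on the PATH.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Example: check the commands named in the requirements list.
# missing_tools php composer fc-list
```

An empty result means every listed command was found.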
Installation
- Get the source code:
      git clone https://github.com/wikimedia/ws-export.git
      cd ws-export
- Install dependencies:
      composer install --no-dev
  Then create a .env.local file.
  - In order to export to PDF, plain text, RTF, or Mobi formats, you should also install Calibre so that the tool can use the ebook-convert command.
  - To validate exported ebooks (with the ./bin/console app:check command), you should also install epubcheck. If it's not installed at /usr/bin/epubcheck then set the EPUBCHECK_JAR environment variable.
  - To run the integration tests, also install the fonts-linuxlibertine package.
- Create a MySQL database and database user and add these details to .env.local.
- Create the database with ./bin/console doctrine:database:create
- Run the migrations with ./bin/console doctrine:migrations:migrate
- This tool uses the Toolforge Bundle, and it connects to multiple databases.
  - Set replicas credentials in the .env.local file.
  - Establish an SSH tunnel to the replicas (only necessary on local environments):
        $ ./bin/console toolforge:ssh
  - Bind address for Docker environments:
        $ ./bin/console toolforge:ssh --bind-address=0.0.0.0
CLI Usage
app:check
Run epubcheck on books. With no options set, this will check 10 random books from English Wikisource. Note that the random 10 will be cached (for repeatability) unless you use --nocache.
app:check [-l|--lang LANG] [--nocache] [-t|--title TITLE] [-c|--count COUNT] [-s|--namespaces NAMESPACES]
- --lang, -l: Wikisource language code. Default: 'en'
- --nocache: Do not cache anything (re-fetch all data).
- --title, -t: Wiki page name of a single work to check.
- --count, -c: How many random pages to check. Ignored if --title is used. Default: 10
- --namespaces, -s: Pipe-delimited namespace IDs. Ignored if --title is used.
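As an illustration of the options above, here is a minimal sketch of a wrapper that checks several named works in one language. `check_titles` and the `CONSOLE` variable are hypothetical and not part of the tool. The namespace IDs in the closing comment are placeholders.

```shell
#!/bin/sh
# Path to the console script; set CONSOLE=echo for a dry run.
CONSOLE="${CONSOLE:-./bin/console}"

# Hypothetical helper: run app:check once per title for one language,
# stopping at the first non-zero exit status.
# Usage: check_titles LANG TITLE [TITLE...]
check_titles() {
    lang="$1"
    shift
    for title in "$@"; do
        "$CONSOLE" app:check --lang "$lang" --title "$title" || return 1
    done
}

# Random sample instead of named works. The namespace list is
# pipe-delimited, so quote it to stop the shell from reading "|"
# as a pipeline (the IDs here are placeholders):
# "$CONSOLE" app:check --lang fr --count 5 --namespaces '0|102'
```

Titles containing spaces need to be quoted when calling the helper, for example `check_titles en "Some Title"`.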
app:export
Export a book.
app:export [-l|--lang LANG] [-t|--title TITLE] [-f|--format FORMAT] [-p|--path PATH] [--nocache] [--nocredits]
- --lang, -l: Wikisource language code.
- --title, -t: Wiki page name of the work to export. Required
- --format, -f: Export format. One of: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 Default: 'epub-3'
- --path, -p: Filesystem path to export to. Default: '[CWD]'
- --nocache: Do not cache anything (re-fetch all data).
- --nocredits: Do not include the credits list in the exported ebook.
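A minimal sketch of how an export might be scripted, assuming the output directory should be created first. `export_book` and the `CONSOLE` variable are hypothetical and not part of the tool. The sketch leaves --format unset, so the documented default ('epub-3') applies.

```shell
#!/bin/sh
# Path to the console script; set CONSOLE=echo for a dry run.
CONSOLE="${CONSOLE:-./bin/console}"

# Hypothetical helper: export one work into a directory, creating the
# directory first. DIR defaults to the current directory.
# Usage: export_book LANG TITLE [DIR]
export_book() {
    lang="$1"
    title="$2"
    dir="${3:-.}"
    mkdir -p "$dir" || return 1
    "$CONSOLE" app:export --lang "$lang" --title "$title" --path "$dir"
}
```

For example, `export_book en "Some Title" ./books` would run app:export for that page and write into ./books.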
app:opds
Generate an OPDS file.
app:opds [-l|--lang LANG] [-c|--category CATEGORY]
- --lang, -l: Wikisource language code.
- --category, -c: Category name to export.
Tests
Run composer install to install dependencies required for testing.
Make sure the test database is created and migrations are up-to-date:
$ ./bin/console doctrine:database:create --env=test
$ ./bin/console doctrine:migrations:migrate --env=test --no-interaction
You only need to run the first command once, and the second one only when new migrations are created.
Tests are located in the tests/ directory. To run them:
$ ./bin/phpunit --exclude-group integration
$ ./bin/phpunit --group integration # runs integration tests (slow)
You can also run code linting etc. with composer test.
Docker Developer Environment
Wikisource Export can also be run for development using Docker Compose. (Beta, only tested on Linux.)
The default environment provides PHP, Apache, Calibre, Epubcheck and a MariaDB database.
Requirements
You'll need a locally running Docker and Docker Compose.
Quickstart
Modify or create .env.local. This config uses the database container defaults.
DATABASE_URL=mysql://root:@database:3306/wsexport
Do the same for the test database at .env.test.local, but giving a different database name:
DATABASE_URL=mysql://root:@database:3306/wsexport_test
Make sure you cd into ./docker
cd ./docker
Run the following command to add your user ID and group ID to your .env file:
echo "WS_DOCKER_UID=$(id -u)
WS_DOCKER_GID=$(id -g)" >> ./.env
Optionally, set the port in .env (default is 8888):
WS_EXPORT_PORT=18000
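Because the echo command above appends with `>>`, running it twice leaves duplicate entries in .env. The following is a minimal sketch of an alternative that replaces an existing entry instead. `set_env_var` is a hypothetical helper, and it assumes keys contain only letters, digits, and underscores.

```shell
#!/bin/sh
# Hypothetical helper: set KEY=VALUE in an env file, replacing any
# existing line for KEY instead of appending a duplicate.
# Usage: set_env_var FILE KEY VALUE
set_env_var() {
    file="$1"
    key="$2"
    value="$3"
    touch "$file" || return 1
    tmp="$file.tmp.$$"
    # Keep every line that does not assign KEY (grep exits non-zero
    # when nothing is kept, which is fine here), then add the new one.
    grep -v "^${key}=" "$file" > "$tmp" || true
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    mv "$tmp" "$file"
}

# Example, run from ./docker:
# set_env_var .env WS_DOCKER_UID "$(id -u)"
# set_env_var .env WS_DOCKER_GID "$(id -g)"
# set_env_var .env WS_EXPORT_PORT 18000
```

The new entry is always written last, so the order of lines in the file can change.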
Start the environment and install dependencies:
# -d is detached mode - runs containers in the background:
docker-compose build && docker-compose up -d
docker-compose exec wsexport composer install
docker-compose exec wsexport ./bin/console doctrine:migrations:migrate --no-interaction
Wikisource Export should be up at http://localhost:8888/ (or at the configured port).
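To work out which URL applies, the port can be read back from the .env file. This is a minimal sketch: `wsexport_url` is a hypothetical helper, and it assumes the port is set on a plain `WS_EXPORT_PORT=` line as in the quickstart.

```shell
#!/bin/sh
# Hypothetical helper: print the local URL, using WS_EXPORT_PORT from
# the given env file when it is set and 8888 (the default) otherwise.
# Usage: wsexport_url [ENV_FILE]   (ENV_FILE defaults to ./.env)
wsexport_url() {
    env_file="${1:-./.env}"
    port=""
    if [ -f "$env_file" ]; then
        # Take the value of the last WS_EXPORT_PORT= line, if any.
        port=$(sed -n 's/^WS_EXPORT_PORT=//p' "$env_file" | tail -n 1)
    fi
    printf 'http://localhost:%s/\n' "${port:-8888}"
}
```

Run it from ./docker, or pass the path to the .env file explicitly.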
Cache
Go to /refresh to clear the cache.
Setup Xdebug
Xdebug is disabled by default. To enable it, set an environment variable by creating a ./docker/docker-compose.override.yml file with the following content:
version: '3.7'
services:
  wsexport:
    environment:
      - XDEBUG_MODE=debug
Visual Studio Code
Add the following configuration to your launch.json:
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Listen for XDebug",
      "type": "php",
      "request": "launch",
      "port": 9000,
      "pathMappings": {
        "/var/www/html": "${workspaceFolder}"
      }
    }
  ]
}
You also need to install the php-xdebug-ext extension for Visual Studio Code.
Licence
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.
The Wikisource logo
The Wikisource logo is included as public/img/Wikisource-logo.svg,
as an optimized form of the logo
found at commons.wikimedia.org/wiki/File:Wikisource-logo.svg
and subject to the licence restrictions specified on that page.