A Wikimedia Toolforge tool for exporting ebooks from Wikisources.

WS Export

WS Export is a tool for exporting Wikisource books to many formats, such as EPUB or PDF. The documentation can be found here: https://wikisource.org/wiki/Wikisource:WS_Export

Requirements

  • PHP 7.3 or 7.4
  • Composer
  • The fc-list command

Installation

  1. Get the source code:

    git clone https://github.com/wikimedia/ws-export.git
    cd ws-export
    
  2. Install dependencies:

    composer install --no-dev
    

    Then create a .env.local file.

    • In order to export to PDF, plain text, RTF, or Mobi formats you should also install Calibre so that the tool can use the ebook-convert command.
    • To validate exported ebooks (with the ./bin/console app:check command), you should also install epubcheck. If it's not installed at /usr/bin/epubcheck then set the EPUBCHECK_JAR environment variable.
    • To run the integration tests, also install the fonts-linuxlibertine package.
  3. Create a MySQL database and database user, and add their details to .env.local.

  4. Create the database with ./bin/console doctrine:database:create

  5. Run the migrations with ./bin/console doctrine:migrations:migrate

  6. This tool uses the Toolforge Bundle and connects to multiple databases.

    • Set the replica credentials in the .env.local file.
    • Establish an SSH tunnel to the replicas (only necessary in local environments):

      $ ./bin/console toolforge:ssh

    • In Docker environments, also set the bind address:

      $ ./bin/console toolforge:ssh --bind-address=0.0.0.0
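
As a rough illustration of step 3, the database details can be written to .env.local as a single DATABASE_URL line, in the same form used in the Docker quickstart further down. The helper name database_url_line and its argument order are hypothetical, not part of the tool, and this is only a sketch:

```shell
# Hypothetical helper (not part of WS Export): print a DATABASE_URL line
# suitable for .env.local.
# Arguments: user, password, host, port, database name.
database_url_line() {
  printf 'DATABASE_URL=mysql://%s:%s@%s:%s/%s\n' "$1" "$2" "$3" "$4" "$5"
}
```

For example, `database_url_line wsexport secret 127.0.0.1 3306 wsexport >> .env.local` would append a line for a local database; all of those values are placeholders. Passwords containing characters that are special in URLs would need to be percent-encoded first.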

CLI Usage

app:check

Run epubcheck on books. With no options set, this will check 10 random books from English Wikisource. Note that the random 10 will be cached (for repeatability) unless you use --nocache.

app:check [-l|--lang LANG] [--nocache] [-t|--title TITLE] [-c|--count COUNT] [-s|--namespaces NAMESPACES]
  • --lang -l — Wikisource language code. Default: 'en'
  • --nocache — Do not cache anything (re-fetch all data).
  • --title -t — Wiki page name of a single work to check.
  • --count -c — How many random pages to check. Ignored if --title is used. Default: 10
  • --namespaces -s — Pipe-delimited namespace IDs. Ignored if --title is used.
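
The options above might be combined as in the following sketch. The function names and the CONSOLE variable are hypothetical conveniences; only the app:check command and its flags come from the usage text above.

```shell
# Path to the console script; override CONSOLE if it lives elsewhere.
CONSOLE="${CONSOLE:-./bin/console}"

# Check a single named work on the given Wikisource, bypassing the cache.
# Arguments: language code, wiki page name.
check_work() {
  "$CONSOLE" app:check --lang "$1" --title "$2" --nocache
}

# Check a number of random works from the given Wikisource.
# Arguments: language code, count.
check_random() {
  "$CONSOLE" app:check --lang "$1" --count "$2"
}
```

With these defined, `check_random fr 5` would check five random works from French Wikisource, and `check_work en "Some page name"` would check one specific work.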

app:export

Export a book.

app:export [-l|--lang LANG] [-t|--title TITLE] [-f|--format FORMAT] [-p|--path PATH] [--nocache] [--nocredits]
  • --lang -l — Wikisource language code.
  • --title -t — Wiki page name of the work to export. Required
  • --format -f — Export format. One of: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. Default: 'epub-3'
  • --path -p — Filesystem path to export to. Default: '[CWD]'
  • --nocache — Do not cache anything (re-fetch all data).
  • --nocredits — Do not include the credits list in the exported ebook.
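
A minimal sketch of an export call, assuming the defaults listed above (format 'epub-3', output to the current directory). The export_book wrapper and the CONSOLE variable are hypothetical; the flags are the ones documented in this section.

```shell
# Path to the console script; override CONSOLE if it lives elsewhere.
CONSOLE="${CONSOLE:-./bin/console}"

# Export one work.
# Arguments: language code, wiki page name, format (optional), path (optional).
export_book() {
  "$CONSOLE" app:export --lang "$1" --title "$2" \
    --format "${3:-epub-3}" --path "${4:-.}"
}
```

For instance, `export_book en "Some page name"` would write an EPUB 3 file for that page into the current directory.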

app:opds

Generate an OPDS file.

app:opds [-l|--lang LANG] [-c|--category CATEGORY]
  • --lang -l — Wikisource language code.
  • --category -c — Category name to export.

Tests

Run composer install to install dependencies required for testing.

Make sure the test database is created and migrations are up-to-date:

$ ./bin/console doctrine:database:create --env=test
$ ./bin/console doctrine:migrations:migrate --env=test --no-interaction

You only need to run the first command once, and the second one only when new migrations are created.

Tests are located in the tests/ directory. To run them:

$ ./bin/phpunit --exclude-group integration
$ ./bin/phpunit --group integration # runs integration tests (slow)

You can also run code linting etc. with composer test.
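
For day-to-day use, the migration and the fast test run can be chained so that the tests only start if the test database is up to date. This is one possible sketch; the function name and the CONSOLE and PHPUNIT variables are hypothetical.

```shell
# Paths to the console and PHPUnit scripts; override if they live elsewhere.
CONSOLE="${CONSOLE:-./bin/console}"
PHPUNIT="${PHPUNIT:-./bin/phpunit}"

# Apply pending test-database migrations, then run the non-integration tests.
# The tests are skipped if the migration step fails.
run_fast_tests() {
  "$CONSOLE" doctrine:migrations:migrate --env=test --no-interaction &&
    "$PHPUNIT" --exclude-group integration
}
```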

Docker Developer Environment

Wikisource Export can also be run for development using Docker Compose (beta, only tested on Linux).

The default environment provides PHP, Apache, Calibre, Epubcheck and a MariaDB database.

Requirements

You'll need Docker and Docker Compose installed and running locally.

Quickstart

Modify or create .env.local. This config uses the database container defaults.

DATABASE_URL=mysql://root:@database:3306/wsexport

Do the same for the test database at .env.test.local, but giving a different database name:

DATABASE_URL=mysql://root:@database:3306/wsexport_test

Change into the ./docker directory:

cd ./docker

Run the following command to add your user ID and group ID to your .env file:

echo "WS_DOCKER_UID=$(id -u)
WS_DOCKER_GID=$(id -g)" >> ./.env

Optionally, set the port in .env (default is 8888):

WS_EXPORT_PORT=18000

Start the environment and install dependencies:

# -d is detached mode - runs containers in the background:
docker-compose build && docker-compose up -d
docker-compose exec wsexport composer install
docker-compose exec wsexport ./bin/console doctrine:migrations:migrate --no-interaction

Wikisource Export should now be available at http://localhost:8888/ (or at the configured port).

Cache

Go to /refresh to clear the cache.
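
From a terminal, the same page could be requested with a tool such as curl. The sketch below only builds the URL; it assumes the Docker environment is reachable on localhost and that WS_EXPORT_PORT, if set in the current shell, matches the port configured in .env. The function name refresh_url is hypothetical.

```shell
# Build the cache-refresh URL, using WS_EXPORT_PORT if set (default 8888).
refresh_url() {
  printf 'http://localhost:%s/refresh\n' "${WS_EXPORT_PORT:-8888}"
}
```

It could then be used as `curl "$(refresh_url)"`.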

Setup Xdebug

Xdebug is disabled by default. To enable it, set an environment variable by creating a ./docker/docker-compose.override.yml file with the following content:

version: '3.7'
services:
  wsexport:
    environment:
      - XDEBUG_MODE=debug

Visual Studio Code

Add the following configuration to your launch.json:

{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Listen for XDebug",
            "type": "php",
            "request": "launch",
            "port": 9000,
            "pathMappings": {
                "/var/www/html": "${workspaceFolder}"
            }
        }
    ]
}

You also need to install the php-xdebug-ext extension.

Licence

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

The Wikisource logo

The Wikisource logo is included as public/img/Wikisource-logo.svg, as an optimized form of the logo found at commons.wikimedia.org/wiki/File:Wikisource-logo.svg and subject to the licence restrictions specified on that page.