sapporo-service

The sapporo-service is a standard implementation conforming to the Global Alliance for Genomics and Health (GA4GH) Workflow Execution Service (WES) API specification.

We have also extended the API specification. For more details, please refer to ./sapporo-wes-1-1-0-openapi-spec.yml.

One of the key features of the sapporo-service is its ability to abstract workflow engines, making it easy to adapt various workflow engines to the WES standard. We have verified compatibility with several workflow engines, including cwltool.

Another unique feature of the sapporo-service is a mode that permits only workflows registered by the system administrator to be executed. This feature is particularly beneficial when setting up a WES in a shared HPC environment.

Installation and Startup

The sapporo-service requires Python 3.8 or later.

You can install it using pip:

pip3 install sapporo

To start the sapporo-service, run the following command:

sapporo

Using Docker

Alternatively, you can run the sapporo-service using Docker. If you want to use Docker-in-Docker (DinD), make sure to mount docker.sock, /tmp, and other necessary directories.

To start the sapporo-service using Docker, run the following command:

docker compose up -d

Usage

You can view the help for the sapporo-service as follows:

$ sapporo --help
usage: sapporo [-h] [--host] [-p] [--debug] [-r] [--disable-get-runs]
               [--disable-workflow-attachment] [--run-only-registered-workflows]
               [--service-info] [--executable-workflows] [--run-sh]
               [--url-prefix] [--auth-config]

This is an implementation of a GA4GH workflow execution service that can easily
support various workflow runners.

optional arguments:
  -h, --help            show this help message and exit
  --host                Specify the host address for Flask. (default: 127.0.0.1)
  -p , --port           Specify the port for Flask. (default: 1122)
  --debug               Enable Flask's debug mode.
  -r , --run-dir        Specify the run directory. (default: ./run)
  --disable-get-runs    Disable the `GET /runs` endpoint.
  --disable-workflow-attachment
                        Disable the `workflow_attachment` feature on the `Post
                        /runs` endpoint.
  --run-only-registered-workflows
                        Only run registered workflows. Check the registered
                        workflows using `GET /executable-workflows`, and specify
                        the `workflow_name` in the `POST /run` request.
  --service-info        Specify the `service-info.json` file. The
                        `supported_wes_versions` and `system_state_counts` will
                        be overwritten by the application.
  --executable-workflows
                        Specify the `executable-workflows.json` file.
  --run-sh              Specify the `run.sh` file.
  --url-prefix          Specify the prefix of the URL (e.g., --url-prefix /foo
                        will result in /foo/service-info).
  --auth-config         Specify the `auth-config.json` file.

Operating Mode

The sapporo-service can be started in one of the following two modes:

Standard WES mode (Default)
Execute only registered workflows mode

You can switch between these modes using the --run-only-registered-workflows startup argument or by setting the SAPPORO_ONLY_REGISTERED_WORKFLOWS environment variable to True or False. Note that startup arguments take precedence over environment variables.
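The precedence rule above can be illustrated with a minimal sketch. The helper name resolve_registered_only and the case-insensitive comparison against "true" are assumptions made for illustration; this is not the service's actual parsing code.

```python
import os
from typing import Mapping, Optional


def resolve_registered_only(cli_flag: Optional[bool],
                            env: Mapping[str, str] = os.environ) -> bool:
    """Return True when only registered workflows may run.

    cli_flag is None when --run-only-registered-workflows was not given.
    """
    if cli_flag is not None:
        # A startup argument always wins over the environment variable.
        return cli_flag
    raw = env.get("SAPPORO_ONLY_REGISTERED_WORKFLOWS", "False")
    # Assumption: the value is compared case-insensitively against "true".
    return raw.strip().lower() == "true"
```

With no argument and no variable set, the sketch falls back to the standard WES mode, matching the documented default.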

Standard WES Mode

In this mode, the sapporo-service conforms to the standard WES API specification.

Execute Only Registered Workflows Mode

In this mode, the sapporo-service only allows workflows registered by the system administrator to be executed.

The key changes in this mode are:

GET /executable_workflows returns the list of executable workflows.
POST /runs takes workflow_name instead of workflow_url.

The list of executable workflows is managed in executable_workflows.json. By default, this file is located in the application directory of the sapporo-service. However, you can override it using the startup argument --executable-workflows or the environment variable SAPPORO_EXECUTABLE_WORKFLOWS.

Run Directory

The sapporo-service organizes all submitted workflows, workflow parameters, output files, and related data within a specific directory on the file system. This directory is known as the "run directory" and defaults to ./run. To specify a different location for the run directory, use the startup argument --run-dir or set the environment variable SAPPORO_RUN_DIR.

The run directory structure is as follows:

$ tree run
.
├── 29
│   └── 29109b85-7935-4e13-8773-9def402c7775
│       ├── cmd.txt
│       ├── end_time.txt
│       ├── exe
│       │   └── workflow_params.json
│       ├── exit_code.txt
│       ├── outputs
│       │   └── <output_file>
│       ├── outputs.json
│       ├── run.pid
│       ├── run_request.json
│       ├── start_time.txt
│       ├── state.txt
│       ├── stderr.log
│       ├── stdout.log
│       └── workflow_engine_params.txt
├── 2d
│   └── ...
└── 6b
    └── ...

Because each run is an ordinary directory, you can remove a run by deleting its directory with the rm command.
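As a rough illustration of this layout, the sketch below locates a run on disk and reads its state. It assumes, based only on the tree shown above, that each run is stored under a subdirectory named after the first two characters of its run ID; the helper names run_path and read_state are hypothetical.

```python
from pathlib import Path


def run_path(run_dir: Path, run_id: str) -> Path:
    # Assumption from the tree above: runs are grouped under a directory
    # named after the first two characters of the run ID (run/29/29109b85-...).
    return run_dir / run_id[:2] / run_id


def read_state(run_dir: Path, run_id: str) -> str:
    # state.txt holds the current state of the run as plain text.
    return (run_path(run_dir, run_id) / "state.txt").read_text().strip()
```

For example, read_state(Path("./run"), "29109b85-7935-4e13-8773-9def402c7775") would read ./run/29/29109b85-7935-4e13-8773-9def402c7775/state.txt.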

Executing POST /runs can be quite complex. For your convenience, we've provided examples using curl in the ./tests/curl_example directory. Please refer to these examples as a guide.
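For readers who prefer Python to curl, the following is a minimal sketch of how a POST /runs request body might be assembled. It uses only the form fields defined by the standard WES specification; Sapporo's extended specification may require additional fields, so treat the curl examples and the OpenAPI file as the reference. The helper names build_run_form and encode_multipart are hypothetical.

```python
import json
import uuid
from typing import Dict, Tuple


def build_run_form(workflow_url: str, workflow_type: str,
                   workflow_type_version: str,
                   workflow_params: dict) -> Dict[str, str]:
    # WES expects workflow_params as a JSON-encoded string form field.
    return {
        "workflow_url": workflow_url,
        "workflow_type": workflow_type,
        "workflow_type_version": workflow_type_version,
        "workflow_params": json.dumps(workflow_params),
    }


def encode_multipart(fields: Dict[str, str]) -> Tuple[bytes, str]:
    # Encode plain text fields as multipart/form-data.
    # Returns the request body and the matching Content-Type header value.
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines += [f"--{boundary}",
                  f'Content-Disposition: form-data; name="{name}"',
                  "",
                  value]
    lines += [f"--{boundary}--", ""]
    body = "\r\n".join(lines).encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"
```

The resulting body and content type can be sent with any HTTP client, for example urllib.request.Request(url, data=body, headers={"Content-Type": content_type}, method="POST"). In the execute-only-registered-workflows mode, workflow_name would take the place of workflow_url.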

run.sh

The run.sh script is used to abstract the workflow engine. When POST /runs is invoked, the sapporo-service forks the execution of run.sh after preparing the necessary files in the run directory. This allows you to adapt various workflow engines to WES by modifying run.sh.

By default, run.sh is located in the application directory of the sapporo-service. You can override this location using the startup argument --run-sh or the environment variable SAPPORO_RUN_SH.

Other Startup Arguments

You can modify the host and port used by the application using the startup arguments --host and --port or the environment variables SAPPORO_HOST and SAPPORO_PORT.

The following three startup arguments and their corresponding environment variables restrict or adjust the WES endpoints:

--disable-get-runs / SAPPORO_GET_RUNS: Disables GET /runs. This is useful when the WES is open to an unspecified number of users: without the run listing, users cannot discover other users' run_id values, which they would need in order to view or cancel those runs.
--disable-workflow-attachment / SAPPORO_WORKFLOW_ATTACHMENT: Disables the workflow_attachment field in POST /runs. This field is used to attach files for executing workflows, and disabling it can address security concerns.
--url-prefix / SAPPORO_URL_PREFIX: Sets the URL prefix. For example, if --url-prefix /foo/bar is set, GET /service-info becomes GET /foo/bar/service-info.
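How the URL prefix composes with a route can be sketched as follows. The helper name prefixed is hypothetical, and the handling of stray slashes is an assumption rather than the service's documented behaviour.

```python
def prefixed(path: str, url_prefix: str = "") -> str:
    # Normalise both parts so exactly one slash separates them.
    prefix = url_prefix.strip("/")
    route = path.lstrip("/")
    return f"/{prefix}/{route}" if prefix else f"/{route}"
```

Under these assumptions, prefixed("/service-info", "/foo/bar") yields "/foo/bar/service-info", which matches the example given for --url-prefix.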

The response content of GET /service-info is managed in service-info.json. By default, this file is located in the application directory of the sapporo-service. You can override this location using the startup argument --service-info or the environment variable SAPPORO_SERVICE_INFO.

Generate Download Link

The sapporo-service allows you to generate download links for files and directories located under the run_dir.

For more details, please refer to the GetData section in ./sapporo-wes-1-1-0-openapi-spec.yml.

Parse Workflow

The sapporo-service offers a feature to inspect the type, version, and inputs of a workflow document.

For more details, please refer to the ParseWorkflow section in ./sapporo-wes-1-1-0-openapi-spec.yml.

Generate RO-Crate

Upon completion of workflow execution, the sapporo-service generates an RO-Crate from the run_dir, which is saved as ro-crate-metadata.json within the same directory. You can download the RO-Crate using the GET /runs/{run_id}/ro-crate/data/ro-crate-metadata.json endpoint.

Additionally, you can generate an RO-Crate from the run_dir as follows:

# Inside the Sapporo run_dir
$ ls
cmd.txt                     run.sh                      state.txt
exe/                        run_request.json            stderr.log
executable_workflows.json   sapporo_config.json         stdout.log
outputs/                    service_info.json           workflow_engine_params.txt
run.pid                     start_time.txt              yevis-metadata.yml

# Execute the sapporo/ro_crate.py script
$ docker run --rm -v /var/run/docker.sock:/var/run/docker.sock -v $PWD:$PWD -w $PWD ghcr.io/sapporo-wes/sapporo-service:latest python3 /app/sapporo/ro_crate.py $PWD

For more information on RO-Crate, please also refer to ./tests/ro-crate.
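Once ro-crate-metadata.json has been downloaded or generated, it can be inspected with a few lines of Python. The sketch below relies only on the general RO-Crate convention that the metadata file is JSON-LD with a top-level "@graph" array; it makes no assumption about which entities Sapporo writes. The helper name list_entities is hypothetical.

```python
import json
from pathlib import Path
from typing import List, Tuple


def list_entities(crate_file: Path) -> List[Tuple[str, str]]:
    # Return (@id, @type) pairs for every entity in the crate's @graph.
    crate = json.loads(crate_file.read_text())
    result = []
    for entity in crate.get("@graph", []):
        etype = entity.get("@type", "")
        # @type may be a single string or a list of strings.
        if isinstance(etype, list):
            etype = ", ".join(etype)
        result.append((entity.get("@id", ""), etype))
    return result
```

Calling list_entities(Path("ro-crate-metadata.json")) inside a run directory gives a quick overview of what the crate describes.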

Authentication

The sapporo-service supports authentication, configurable via the ./sapporo/auth_config.json. By default, this configuration is as follows:

{
  "auth_enabled": false,
  "auth_provider": "local",
  "local_auth": {
    "jwt_secret_key": "spr_secret_key_please_change_this",
    "users": [
      {
        "username": "spr_test_user",
        "password": "spr_test_password"
      }
    ]
  },
  "oidc_auth": {
    "realm_url": "http://localhost:8080/realms/sapporo-dev",
    "username_claim": "sub"
  }
}

This configuration file can be directly edited or relocated using the --auth-config startup argument or the SAPPORO_AUTH_CONFIG environment variable.

Configuration Fields

auth_enabled: Determines if JWT authentication is activated. If set to true, JWT authentication is enabled.
auth_provider: Specifies the type of authentication provider, supporting:
- local: Uses a locally managed list of users for authentication.
  - Tokens are issued by Sapporo.
  - Usernames and passwords are referenced from the auth_config.json.
- oidc: Uses an OpenID Connect (OIDC) provider like Keycloak.
  - Tokens are issued by the OIDC provider.
  - User information is managed by the OIDC provider.
local_auth: Configuration for local authentication includes:
- jwt_secret_key: Secret key for signing JWTs. Changing this key is highly recommended.
- users: List of users eligible for JWT authentication, specifying username and password.
oidc_auth: Configuration for OIDC authentication includes:
- realm_url: URL of the OIDC realm.
- username_claim: JWT claim used as the username.
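The field descriptions above suggest a few sanity checks worth running before enabling authentication. The following sketch is not part of the sapporo-service; the function name check_auth_config and the specific checks are illustrative assumptions based only on the fields listed here.

```python
from typing import List

# Default value shipped in auth_config.json, as shown earlier in this section.
DEFAULT_SECRET = "spr_secret_key_please_change_this"


def check_auth_config(config: dict) -> List[str]:
    # Return a list of human-readable problems; an empty list means no findings.
    problems: List[str] = []
    if not config.get("auth_enabled", False):
        return problems  # nothing to check while authentication is off
    provider = config.get("auth_provider")
    if provider == "local":
        local = config.get("local_auth", {})
        if local.get("jwt_secret_key") == DEFAULT_SECRET:
            problems.append("jwt_secret_key still has its default value")
        if not local.get("users"):
            problems.append("local_auth.users is empty")
    elif provider == "oidc":
        if not config.get("oidc_auth", {}).get("realm_url"):
            problems.append("oidc_auth.realm_url is missing")
    else:
        problems.append(f"unknown auth_provider: {provider!r}")
    return problems
```

A typical use would be to load the file with json.load and print any returned problems before starting the service.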

Authentication Endpoints

When JWT authentication is enabled, endpoints requiring authentication include:

GET /runs
POST /runs
GET /runs/{run_id}
POST /runs/{run_id}/cancel
GET /runs/{run_id}/status
GET /runs/{run_id}/data

Each run is associated with a username, ensuring that only the user who created a run can access details like GET /runs/{run_id}.

Local Authentication

For local JWT authentication, configure auth_config.json as shown:

{
  "auth_enabled": true,
  "auth_provider": "local",
  "local_auth": {
    "jwt_secret_key": "new_secret_key",
    "users": [
      {
        "username": "user1",
        "password": "password1"
      },
      {
        "username": "user2",
        "password": "password2"
      }
    ]
  },
  "oidc_auth": {
    "realm_url": "http://localhost:8080/realms/sapporo-dev",
    "username_claim": "sub"
  }
}

Starting sapporo-service with this configuration allows access to the GET /service-info endpoint, while GET /runs will require authentication:

# Start sapporo-service
$ sapporo

# GET /service-info
$ curl -X GET localhost:1122/service-info
{
  "auth_instructions_url": "https://github.com/sapporo-wes/sapporo-service",
  "contact_info_url": "https://github.com/sapporo-wes/sapporo-service",
...

# Generate JWT for authentication
$ TOKEN=$(curl -s -X POST \
    -H "Content-Type: application/json" \
    -d '{"username":"user1", "password":"password1"}' \
    localhost:1122/auth | jq -r '.access_token')

# Authenticate and access runs
$ curl -X GET -H "Authorization: Bearer $TOKEN" localhost:1122/runs
{
  "runs": []
}
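The same flow can be written in Python using only the standard library. This is a sketch mirroring the curl commands above: the endpoint paths, the JSON payload, and the access_token field come from that example, while the helper names and the BASE_URL constant are hypothetical.

```python
import json
import urllib.request

# Assumes the service runs locally on its default port.
BASE_URL = "http://localhost:1122"


def token_request(username: str, password: str,
                  base_url: str = BASE_URL) -> urllib.request.Request:
    # Build the POST /auth request that exchanges credentials for a JWT.
    payload = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/auth",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def runs_request(token: str, base_url: str = BASE_URL) -> urllib.request.Request:
    # Build the GET /runs request carrying the token as a Bearer credential.
    return urllib.request.Request(
        f"{base_url}/runs",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def fetch_json(request: urllib.request.Request) -> dict:
    # Send the request and decode the JSON response body.
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the service running, token = fetch_json(token_request("user1", "password1"))["access_token"] followed by fetch_json(runs_request(token)) reproduces the two curl calls. For OIDC, only the way the token is obtained changes; the Authorization header is attached in the same way.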

OpenID Connect (OIDC) Authentication

For OIDC authentication, ensure the auth_provider is set to oidc and appropriate configurations are specified under oidc_auth. Users must obtain a token from the OIDC provider and attach it to the Authorization header for authentication.

Development

To start the development environment, follow these steps:

$ docker compose -f compose.dev.yml up -d --build
$ docker compose -f compose.dev.yml exec app bash
# inside container
$ sapporo

We utilize flake8, isort, and mypy for linting and style checking.

bash ./tests/lint_and_style_check/flake8.sh
bash ./tests/lint_and_style_check/isort.sh
bash ./tests/lint_and_style_check/mypy.sh

bash ./tests/lint_and_style_check/run_all.sh

For testing, we use pytest.

pytest .

Adding New Workflow Engines to Sapporo Service

Take a look at the run.sh script, which is invoked from Python. This shell script receives a request with a Workflow Engine such as cwltool and triggers the run_cwltool bash function.

This function executes a Bash Shell command to start a Docker container for the Workflow Engine and monitors its exit status. For a comprehensive example, please refer to this pull request: https://github.com/sapporo-wes/sapporo-service/pull/29
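The dispatch pattern described above can be sketched in Python for illustration, although the real run.sh is a Bash script. Everything in this sketch is hypothetical: the engine name example_engine, its placeholder command, and the function names. Only the file names stdout.log, stderr.log, and exit_code.txt are taken from the run directory layout.

```python
import subprocess
import sys
from pathlib import Path
from typing import Callable, Dict, List


def example_engine_command(run_dir: Path) -> List[str]:
    # Hypothetical stand-in for a real engine invocation (for instance a
    # `docker run ...` command line); it only prints a message.
    return [sys.executable, "-c", "print('workflow finished')"]


# Maps an engine name from the request to the function that builds its command.
ENGINES: Dict[str, Callable[[Path], List[str]]] = {
    "example_engine": example_engine_command,
}


def run_workflow(engine_name: str, run_dir: Path) -> int:
    # Look up the engine, run its command, and capture logs in the run directory.
    command = ENGINES[engine_name](run_dir)
    with open(run_dir / "stdout.log", "w") as out, \
            open(run_dir / "stderr.log", "w") as err:
        completed = subprocess.run(command, stdout=out, stderr=err, cwd=run_dir)
    # Record the exit status next to the logs, mirroring exit_code.txt.
    (run_dir / "exit_code.txt").write_text(str(completed.returncode))
    return completed.returncode
```

In this picture, supporting a new engine means adding one command-building function and one entry in the lookup table, which corresponds to adding a new run_<engine> function to run.sh.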

License

This project is licensed under Apache-2.0. See the LICENSE file for details.

Notice

Please note that this repository is participating in a study into sustainability of open source projects. Data will be gathered about this repository for approximately the next 12 months, starting from 2021-06-16.

Data collected will include number of contributors, number of PRs, time taken to close/merge these PRs, and issues closed.

For more information, please visit our informational page or download our participant information sheet.
