Work with your web service, database, and streaming schemas in a single format.

What is Recap?

Recap reads and writes schemas from web services, databases, and schema registries in a standard format.

⭐️ If you like this project, please give it a star! It helps the project get more visibility.

Table of Contents

  • What is Recap?
  • Supported Formats
  • Install
  • Usage
    • CLI
    • Gateway
    • Registry
    • API
    • Docker
  • Schema
  • Documentation

Supported Formats

Format Read Write
Avro
BigQuery
Confluent Schema Registry
Hive Metastore
JSON Schema
MySQL
PostgreSQL
Protobuf
Snowflake
SQLite

Install

Install Recap and all of its optional dependencies:

pip install 'recap-core[all]'

You can also select specific dependencies:

pip install 'recap-core[avro,kafka]'

See pyproject.toml for a list of optional dependencies.

Usage

CLI

Recap comes with a command line interface that can list and read schemas from external systems.

List the children of a URL:

recap ls postgresql://user:pass@host:port/testdb
[
  "pg_toast",
  "pg_catalog",
  "public",
  "information_schema"
]

Keep drilling down:

recap ls postgresql://user:pass@host:port/testdb/public
[
  "test_types"
]

Read the schema for the test_types table as a Recap struct:

recap schema postgresql://user:pass@host:port/testdb/public/test_types
{
  "type": "struct",
  "fields": [
    {
      "type": "int64",
      "name": "test_bigint",
      "optional": true
    }
  ]
}
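
Because the command prints JSON, its output is easy to post-process. The sketch below parses the struct shown above and summarizes each field. The `describe_fields` helper is a hypothetical name, and the only assumption about the format is what the example output shows (`type`, `name`, and `optional` keys).

```python
import json

# Output of `recap schema ...`, copied from the example above.
cli_output = """
{
  "type": "struct",
  "fields": [
    {"type": "int64", "name": "test_bigint", "optional": true}
  ]
}
"""


def describe_fields(schema: dict) -> list[str]:
    """Return one 'name: type' line per struct field (hypothetical helper)."""
    lines = []
    for field in schema.get("fields", []):
        suffix = " (optional)" if field.get("optional") else ""
        lines.append(f"{field.get('name')}: {field['type']}{suffix}")
    return lines


print("\n".join(describe_fields(json.loads(cli_output))))
```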

Gateway

Recap comes with a stateless HTTP/JSON gateway that can list and read schemas from data catalogs and databases.

Start the server at http://localhost:8000:

recap serve

List the schemas in a PostgreSQL database:

curl http://localhost:8000/gateway/ls/postgresql://user:pass@host:port/testdb
["pg_toast","pg_catalog","public","information_schema"]

And read a schema:

curl http://localhost:8000/gateway/schema/postgresql://user:pass@host:port/testdb/public/test_types
{"type":"struct","fields":[{"type":"int64","name":"test_bigint","optional":true}]}

The gateway fetches schemas from external systems in real time and returns them as Recap schemas.

An OpenAPI schema is available at http://localhost:8000/docs.
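
The same gateway calls can be made from Python. This is a minimal sketch using only the standard library. It assumes the server address and the `/gateway/ls/` and `/gateway/schema/` paths shown in the curl examples above; `gateway_url` and `fetch` are hypothetical helper names, not part of Recap.

```python
import json
import urllib.request

GATEWAY = "http://localhost:8000/gateway"  # address used by `recap serve` above


def gateway_url(operation: str, system_url: str) -> str:
    """Build a gateway URL; `operation` is 'ls' or 'schema'."""
    return f"{GATEWAY}/{operation}/{system_url}"


def fetch(operation: str, system_url: str):
    """GET the gateway endpoint and decode the JSON body."""
    with urllib.request.urlopen(gateway_url(operation, system_url)) as resp:
        return json.load(resp)


# Building the URL needs no server; it mirrors the curl example.
print(gateway_url("ls", "postgresql://user:pass@host:port/testdb"))
```

With the gateway running and the database reachable, `fetch("ls", ...)` should return the same list the curl command prints, and `fetch("schema", ...)` the struct as a Python dict.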

Registry

You can store schemas in Recap's schema registry.

Start the server at http://localhost:8000:

recap serve

Put a schema in the registry:

curl -X POST \
    -H "Content-Type: application/x-recap+json" \
    -d '{"type":"struct","fields":[{"type":"int64","name":"test_bigint","optional":true}]}' \
    http://localhost:8000/registry/some_schema

Get the schema (and version) from the registry:

curl http://localhost:8000/registry/some_schema
[{"type":"struct","fields":[{"type":"int64","name":"test_bigint","optional":true}]},1]

Put a new version of the schema in the registry:

curl -X POST \
    -H "Content-Type: application/x-recap+json" \
    -d '{"type":"struct","fields":[{"type":"int32","name":"test_int","optional":true}]}' \
    http://localhost:8000/registry/some_schema

List schema versions:

curl http://localhost:8000/registry/some_schema/versions
[1,2]

Get a specific version of the schema:

curl http://localhost:8000/registry/some_schema/versions/1
[{"type":"struct","fields":[{"type":"int64","name":"test_bigint","optional":true}]},1]

The registry uses fsspec to store schemas in a variety of filesystems like S3, GCS, ABS, and the local filesystem. See the registry docs for more details.

An OpenAPI schema is available at http://localhost:8000/docs.
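
The registry calls above translate directly to Python's standard library. The sketch below builds the POST request and reads back the `[schema, version]` pair seen in the curl output. `put_request` and `get_schema` are hypothetical helper names, and the endpoint paths and content type are taken from the examples above rather than from the OpenAPI schema.

```python
import json
import urllib.request

REGISTRY = "http://localhost:8000/registry"  # address used by `recap serve` above


def put_request(name: str, schema: dict) -> urllib.request.Request:
    """Build the POST request that stores `schema` under `name`."""
    return urllib.request.Request(
        f"{REGISTRY}/{name}",
        data=json.dumps(schema).encode("utf-8"),
        headers={"Content-Type": "application/x-recap+json"},
        method="POST",
    )


def get_schema(name: str):
    """Fetch the latest schema; the examples show a [schema, version] reply."""
    with urllib.request.urlopen(f"{REGISTRY}/{name}") as resp:
        schema, version = json.load(resp)
    return schema, version


schema = {
    "type": "struct",
    "fields": [{"type": "int64", "name": "test_bigint", "optional": True}],
}
req = put_request("some_schema", schema)
print(req.get_method(), req.full_url)
```

With the server running, `urllib.request.urlopen(req)` sends the schema, and `get_schema("some_schema")` reads it back along with its version number.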

API

Recap has recap.converters and recap.clients packages.

  • Converters convert schemas to and from Recap schemas.
  • Clients read schemas from external systems (databases, schema registries, and so on) and use converters to return Recap schemas.

Read a schema from PostgreSQL:

from recap.clients import create_client

with create_client("postgresql://user:pass@host:port/testdb") as c:
    struct = c.schema("testdb", "public", "test_types")

Convert the schema to Avro, Protobuf, and JSON schemas:

from recap.converters.avro import AvroConverter
from recap.converters.protobuf import ProtobufConverter
from recap.converters.json_schema import JSONSchemaConverter

avro_schema = AvroConverter().from_recap(struct)
protobuf_schema = ProtobufConverter().from_recap(struct)
json_schema = JSONSchemaConverter().from_recap(struct)

Transpile schemas from one format to another:

from recap.converters.json_schema import JSONSchemaConverter
from recap.converters.avro import AvroConverter

json_schema = """
{
    "type": "object",
    "$id": "https://recap.build/person.schema.json",
    "properties": {
        "name": {"type": "string"}
    }
}
"""

# Use Recap as an intermediate format to convert JSON schema to Avro
struct = JSONSchemaConverter().to_recap(json_schema)
avro_schema = AvroConverter().from_recap(struct)

Store schemas in Recap's schema registry:

from recap.storage.registry import RegistryStorage
from recap.types import StructType, IntType

storage = RegistryStorage("file:///tmp/recap-registry-storage")
version = storage.put(
    "postgresql://localhost:5432/testdb/public/test_table",
    StructType(fields=[IntType(32)])
)
storage.get("postgresql://localhost:5432/testdb/public/test_table")

# Get all versions of a schema
versions = storage.versions("postgresql://localhost:5432/testdb/public/test_table")

# List all schemas in the registry
schemas = storage.ls()

Docker

Recap's gateway and registry are also available as a Docker image:

docker run \
    -p 8000:8000 \
    -e RECAP_URLS='["postgresql://user:pass@localhost:5432/testdb"]' \
    ghcr.io/recap-build/recap:latest

See Recap's Docker documentation for more details.
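
The `RECAP_URLS` value contains brackets and double quotes, so it must be quoted for the shell. One way to avoid quoting mistakes is to generate the argument, as in this sketch. It assumes the variable holds a JSON array of URLs, which is what the example above suggests; check the Docker documentation for the exact format.

```python
import json
import shlex

urls = ["postgresql://user:pass@localhost:5432/testdb"]

# Serialize the list as compact JSON, then quote it for a POSIX shell.
value = json.dumps(urls, separators=(",", ":"))
env_arg = "RECAP_URLS=" + shlex.quote(value)

print(f"docker run -p 8000:8000 -e {env_arg} ghcr.io/recap-build/recap:latest")
```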

Schema

See Recap's type spec for details on Recap's type system.

Documentation

Recap's documentation is available at recap.build.