DatAasee - A Metadata-Lake for Libraries
DatAasee (0.5)
DatAasee centralizes and interlinks distributed library/research metadata into an API‑first union catalog.

Repository: github.com/ulbmuenster/dataasee (nb sources backup)
Maintainer: Christian Himpe (at University and State Library of Münster)
Licenses: MIT (add. CC-BY for openapi.yaml)
Function: Metadata-Lake, Metadata Catalog, Metadata Aggregator, Union Catalog
Audience: University Libraries, Research Libraries, Academic Libraries, Scientific Libraries
Documentation
- Dependencies Overview
- Software Documentation
- Architecture Documentation
- Database Schema
- OpenAPI Schema (Swagger UI)
- DatAasee: A Metadata-Lake as Metadata Catalog for a Virtual Data-Lake (Companion Paper, Open Access)
Getting Started (Deployment)
Quick Start (prepare a dedicated directory and run inside it:)
$ wget https://raw.githubusercontent.com/ulbmuenster/dataasee/0.5/compose.yaml
$ mkdir -p -m 766 backup
$ DL_PASS=password1 DB_PASS=password2 docker compose up
Web: http://localhost:8000 (API: http://localhost:8343/api/v1/ )
- Depends on `docker compose` (compatible with both `docker` and `podman`)
- To deploy, there is no need to clone the repository; just use the `compose.yaml` file.
- See the Deploy Documentation for details.
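
After `docker compose up`, the containers may need a moment before the API answers. The following is a minimal sketch of a wait loop that polls the `api/v1/ready` endpoint listed in the API Cheat Sheet at the Quick Start address. The function name `wait_ready` and its arguments are hypothetical, and the sketch assumes the endpoint answers with an HTTP success status only once the service is ready.

```shell
#!/bin/sh
# Hypothetical helper (not part of DatAasee): poll a readiness URL until it
# answers with an HTTP success status or the attempts run out.
# Arguments (all optional): URL, number of attempts, delay in seconds.
wait_ready() {
    url="${1:-http://localhost:8343/api/v1/ready}"
    tries="${2:-30}"
    delay="${3:-2}"
    i=0
    while [ "$i" -lt "$tries" ]; do
        # -q: quiet, -T 5: network timeout, -O /dev/null: discard the body
        if wget -q -T 5 -O /dev/null "$url"; then
            echo "ready after $i retries"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "not ready after $tries attempts" >&2
    return 1
}
```

Calling `wait_ready` without arguments polls the default API address up to 30 times; `wait_ready http://localhost:8343/api/v1/ready 10 1` shortens the wait.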
Tech Stack Canvas
- Setting: Many distributed data and metadata sources
- Goals:
  - Centralize metadata
  - Interlinked metadata catalog
  - Super-index for bibliographic and research data
- Features:
  - Interact through HTTP-API (JSON)
  - Search by filter, full-text, source, DOI
  - Custom query via: `SQL`, `Gremlin`, `Cypher`, `MQL`, `GraphQL`
- Frontend: Lowdefy (Optional)
- Backend: Connect (fmr. Benthos)
- Data Storage: ArcadeDB (Graph Database)
- Infrastructure: Compose (via Docker or Podman)
- Deployment: via Harbor (at Uni Münster)
- Monitoring: Container Logs (local logging driver)
- Integrations:
  - Protocols: `OAI-PMH` (HTTP), `S3` (HTTP), `GET` (HTTP), `DatAasee` (HTTP)
  - Encodings: `XML` (Plain-Text)
  - Formats: `DataCite` (XML), `DC` (XML), `LIDO` (XML), `MARC` (XML), `MODS` (XML)
  - Exports: `DataCite` (JSON), `BibJSON` (JSON)
- Security: Privileged endpoints (CQRS)
- Testing: check-jsonschema
- Development: GitHub
Default Ports
- `8343` DatAasee API
- `8000` Web Frontend
- `2480` Database API (Development Container Images Only)
- `9999` Database JMX (Development Container Images Only)
API Cheat Sheet
- `GET` `api/v1/api` Returns API specification and schemas.
- `GET` `api/v1/ready` Returns service readiness.
- `GET` `api/v1/metadata` Returns queried metadata records.
- `GET` `api/v1/sources` Returns ingested metadata sources.
- `GET` `api/v1/schema` Returns database schema.
- `GET` `api/v1/enums` Returns enumerated attributes.
- `GET` `api/v1/stats` Returns metadata record statistics.
- `POST` `api/v1/backup` Triggers database backup.
- `POST` `api/v1/ingest` Triggers async ingest of metadata.
- `POST` `api/v1/insert` Inserts single metadata record.
- `POST` `api/v1/health` Probes and returns service liveness.
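
The read-only `GET` endpoints above can be tried from a shell with `wget`. The sketch below is illustrative only: the helper names `api_base` and `dataasee_get` and the `DATAASEE_API` variable are hypothetical, the base URL is the one from the Quick Start, and no query parameters are shown because they are not listed here (see the OpenAPI schema for those). The `POST` endpoints are privileged and deliberately left out.

```shell
#!/bin/sh
# Hypothetical helpers (not part of DatAasee).

# Print the API base URL; defaults to the Quick Start address and can be
# overridden through the (assumed) DATAASEE_API environment variable.
api_base() {
    printf '%s\n' "${DATAASEE_API:-http://localhost:8343/api/v1}"
}

# Fetch one of the GET endpoints from the cheat sheet and print the response body.
dataasee_get() {
    case "$1" in
        api|ready|metadata|sources|schema|enums|stats) ;;
        *) echo "not a GET endpoint: $1" >&2; return 2 ;;
    esac
    # -q: quiet, -T 10: network timeout, -O -: write the body to stdout
    wget -q -T 10 -O - "$(api_base)/$1"
}
```

For example, `dataasee_get sources` prints the ingested sources and `dataasee_get stats` prints the record statistics, while `dataasee_get backup` is rejected before any request is sent.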
Repository Contents
- `api/` API definition and message schemas
- `assets/` Logos and style definition
- `backend/` Processor pipeline and component definitions
- `container/` Dockerfiles
- `database/` Database initialization, schemas and enumerated data
- `docs/` Documentation of software, data and architecture
- `frontend/` Prototype frontend definition
- `tests/` Test definitions and data
Getting Started (Development)
- Available `make` targets:
  - `make setup` Build server images (builds development images)
  - `make start` Start servers
  - `make stop` Stop servers
  - `make reset` Stop and start servers
  - `make build` Build release images (pass `REGISTRY=` to set container image registry)
  - `make empty` Delete database backups
  - `make logs` Show logs (requires `grep`)
  - `make peak` Report peak database memory usage (requires `grep`)
  - `make test` Run tests (requires `check-jsonschema`, `busybox`, `wget`)
  - `make tidy` List violations of StrictYAML (requires `yamllint`)
  - `make todo` List inline TODOs in repo (requires `grep`)
- Custom `make` variable: `COMPOSE` (set Compose implementation)
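
The targets above can be chained into a simple build, start, test, stop cycle. This is a rough sketch under stated assumptions: `dev_cycle` and `MAKE_CMD` are hypothetical names, the function is meant to be run from the repository root, and it assumes that `make start` returns once the servers have been started rather than staying in the foreground.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): run the documented make
# targets in order. Set MAKE_CMD="echo make" for a dry run that only prints
# the commands.
dev_cycle() {
    make_cmd="${MAKE_CMD:-make}"
    # $make_cmd is left unquoted on purpose so "echo make" splits into two words
    $make_cmd setup || return 1    # build development images
    $make_cmd start || return 1    # start servers (assumed to return afterwards)
    $make_cmd test                 # run tests
    status=$?                      # remember the test result
    $make_cmd stop                 # stop servers even if the tests failed
    return "$status"
}
```

A dry run with `MAKE_CMD="echo make" dev_cycle` lists the four commands without executing them. Because the servers may not be ready immediately after `make start`, a short wait before the tests may be needed in practice.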
Contributors
- See here
tl;dr
DatAasee is a centralized metasearch for distributed metadata.