New Project Proposal - [Sandbox Proposal] AI Dataset Health for z/OS (ai-dataset-health-zos)
Project description
ai-dataset-health-zos is a small, open-source Python tool that:
- Lists repository files (future: z/OS datasets via z/OSMF Jobs & Files APIs).
- Computes a simple dataset health score (initial rule: zero-byte file detection).
- Exposes a CLI (list_files.py --health) with human-readable output.
- Ships with modern CI gates: Ruff (lint), Black (format), MyPy (types), Pytest (coverage).
Why valuable: Mainframe teams often lack lightweight, open, automatable checks for dataset health (empties, size/naming anomalies, staleness). This project provides an approachable baseline that fits CI/CD and sets a path to add AI/ONNX scoring and z/OSMF integration on Z.
Origin/history: Built as a focused MVP to demonstrate CI-first mainframe tooling with a clear evolution to z/OSMF and AI-assisted rules.
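For readers who want a concrete picture of the initial rule, here is a minimal sketch of zero-byte detection feeding a health score. It is not taken from the repository: the names FileHealth, scan, and health_score are hypothetical, and the scoring formula (percentage of non-empty files) is an assumption made for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class FileHealth:
    """Size information for one file (hypothetical structure)."""

    path: Path
    size: int

    @property
    def healthy(self) -> bool:
        # The only rule in this sketch: a zero-byte file counts as unhealthy.
        return self.size > 0


def scan(root: Path) -> list[FileHealth]:
    """Collect size information for every regular file under root."""
    return [
        FileHealth(path=p, size=p.stat().st_size)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    ]


def health_score(results: list[FileHealth]) -> float:
    """Return the percentage of non-empty files (assumed formula).

    An empty scan is treated as fully healthy, which is also an assumption.
    """
    if not results:
        return 100.0
    healthy = sum(1 for r in results if r.healthy)
    return 100.0 * healthy / len(results)


if __name__ == "__main__":
    found = scan(Path("."))
    for item in found:
        status = "ok" if item.healthy else "EMPTY"
        print(f"{status:5} {item.size:>10} {item.path}")
    print(f"health score: {health_score(found):.1f}")
```

Keeping the rule on the record type and the aggregation in a separate function would let later rules (size or naming anomalies, staleness) be added without changing how the score is reported.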
Statement on alignment with Open Mainframe Project Mission and Vision statements
- Promotes open source innovation on mainframe by offering a minimal, extensible dataset health checker.
- Lowers the barrier for new contributors (pure Python + modern CI).
- Encourages interoperability with existing Z ecosystems via planned z/OSMF adapters, without vendor lock-in.
- Creates community building blocks to apply AI techniques (e.g., ONNX) to mainframe operations data.
Are there similar/related projects out there?
- Zowe CLI provides dataset operations, but neither a focused health-scoring workflow nor an AI-ready scoring path.
- Proprietary tools exist for checks/audits, but they are not open, CI-first, or designed for community rule extensions.
Differentiators:
- Explicit health scoring model (start simple, grow rules).
- CI-first repo (ruff/black/mypy/pytest/coverage, smoke artifact).
- Clear roadmap to z/OSMF adapters and ONNX inference.
Sponsor from TAC
To be appointed
Proposed Project Stage
Sandbox
License and contribution guidelines
- Current license: MIT (OSI-approved).
- If accepted into OMP: willing to relicense to Apache-2.0.
- Contribution flow: GitHub PRs, DCO sign-off (Signed-off-by), code style via Ruff/Black, type hints via MyPy, tests via Pytest with coverage gate.
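As an illustration of the DCO requirement, the sketch below checks whether a commit message carries a Signed-off-by trailer, which is what `git commit -s` appends. The function name has_dco_signoff and the regular expression are assumptions for this example; they are not the project's actual check, and hosted DCO checks may apply stricter rules (for example, matching the trailer against the commit author).

```python
import re

# Matches a trailer line such as "Signed-off-by: Jane Doe <jane@example.com>".
# The pattern is deliberately loose; it only checks the general shape.
SIGNOFF_RE = re.compile(
    r"^Signed-off-by: .+ <[^<>\s]+@[^<>\s]+>$",
    re.MULTILINE,
)


def has_dco_signoff(commit_message: str) -> bool:
    """Return True if any line of the message looks like a DCO sign-off."""
    return SIGNOFF_RE.search(commit_message) is not None


if __name__ == "__main__":
    message = "Add zero-byte rule\n\nSigned-off-by: Jane Doe <jane@example.com>\n"
    print(has_dco_signoff(message))
```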
Current or desired source control repository
GitHub (current): https://github.com/marbatis/ai-dataset-health-zos
External dependencies (including licenses)
- Runtime: Python 3.11+ (stdlib only for MVP).
- Dev/CI: ruff (MIT), black (MIT), mypy (MIT), pytest/pytest-cov (MIT).
- Planned (optional): onnxruntime (MIT) for AI scoring.
No commercial or non-redistributable software required.
Initial committers
- Marcelo Silveira (@marbatis) — creator/maintainer; all initial commits.
- Current community size: 1 maintainer; goal is to add 2–3 co-maintainers within 3–6 months via OMP community.
Infrastructure requests
- CI: GitHub Actions (already in place: ruff, black, mypy, pytest, coverage ≥80%, “health smoke” artifact).
- Request access to an OMP open z/OS environment with z/OSMF Jobs & Files to validate dataset APIs.
- (Optional later) A project Slack channel and a simple GitHub Pages site (if recommended).
Communication channels
- Request a project Slack channel in the OMP workspace (e.g., #ai-dataset-health-zos).
- GitHub Discussions in the repo for Q&A and design notes.
- (Optional) Mailing list if OMP prefers; otherwise Discussions is sufficient at MVP stage.
Communication channels
GitHub Issues: https://github.com/marbatis/ai-dataset-health-zos/issues
Website
Use the repository README as the landing page for now. (Optionally add GitHub Pages after acceptance.)
Release methodology and mechanics
- SemVer. Start with 0.x pre-releases while MVP evolves.
- GitHub Releases + tags; changelog in RELEASE_NOTES.md.
- Every release built from passing CI (ruff/black/mypy/pytest/coverage).
Social media accounts
None yet. Will coordinate with OMP social channels after acceptance.
Community size and any existing sponsorship
- Size: single-maintainer MVP (1).
- No commercial sponsorship; seeking community contributions and TAC mentorship.
Requesting a TAC agenda slot.
Thanks @marbatis - I have some documents to share with you to get the LF onboarding process going. Can you drop me an email at jmertic at linuxfoundation dot org, and I can share them with you. Thanks!
hi @jmertic . I sent you an email. thanks.
Hey @marbatis - I didn’t see the email come over; can you resend?
@jmertic my email is marbatis at Hotmail dot com. thanks
@jmertic Housekeeping completed in the repo:
• LICENSE (Apache-2.0) + NOTICE
• CONTRIBUTING (with DCO), CODE_OF_CONDUCT (LF CoC), GOVERNANCE
• CI is green on main (ruff/black/mypy/pytest + health smoke)
Ready for the next steps. Thanks!
Hi @marbatis - would you be available to present this to the TAC during the 10/9 meeting at 1pm ET?
hi @jmertic , yes I am available. thanks
Great - you are scheduled in! You can register for the meeting at https://zoom-lfx.platform.linuxfoundation.org/meeting/96768093075?password=1b7c020a-bad8-4ba6-91f2-8156091e05fa
Sorry @marbatis - I misspoke - can you do 10/23 instead? I forgot to check the 10/9 meeting first, which right now is full.
@jmertic no worries... yes, I am available for this new date
Appreciate your flexibility @marbatis - new Zoom link for the meeting -> https://zoom-lfx.platform.linuxfoundation.org/meeting/97287182990?password=033f3235-8042-4d14-9a1a-f88992db8437
If you have slides to share, please share them ahead of time in this GH issue so TAC members can pre-read.
Thanks!
@jmertic sent you the deck and pdf via email. Let me know if you don't get them. thanks.
Much appreciated. Thanks!
Hi @marbatis - sorry for the delays. We are confirming that the OMP infrastructure can handle your needs and will get back to you once we know more. Thanks!
@jmertic Great news! thanks