New Project Proposal - [Sandbox Proposal] AI Dataset Health for z/OS (ai-dataset-health-zos)

Open marbatis opened this issue 4 months ago • 16 comments

Project description

ai-dataset-health-zos is a small, open-source Python tool that:

  • Lists repository files (future: z/OS datasets via z/OSMF Jobs & Files APIs).
  • Computes a simple dataset health score (initial rule: zero-byte file detection).
  • Exposes a CLI (list_files.py --health) with human-readable output.
  • Ships with modern CI gates: Ruff (lint), Black (format), MyPy (types), Pytest (coverage).

Why valuable: Mainframe teams often lack lightweight, open, automatable checks for dataset health (empties, size/naming anomalies, staleness). This project provides an approachable baseline that fits CI/CD and sets a path to add AI/ONNX scoring and z/OSMF integration on Z.

Origin/history: Built as a focused MVP to demonstrate CI-first mainframe tooling with a clear evolution to z/OSMF and AI-assisted rules.
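
For readers who want a concrete picture of the zero-byte rule named in the project description, here is a minimal sketch. It is not taken from the repository: the names `FileHealth`, `scan`, and `health_score` are hypothetical, and the score formula (fraction of non-empty files) is an assumption, since the proposal only states that zero-byte files are detected.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class FileHealth:
    """Size information for one file (hypothetical structure, not the project's)."""

    path: Path
    size: int

    @property
    def healthy(self) -> bool:
        # Initial rule from the proposal: a zero-byte file is flagged as unhealthy.
        return self.size > 0


def scan(root: Path) -> list[FileHealth]:
    """Collect the size of every regular file under root, in sorted order."""
    return [
        FileHealth(p, p.stat().st_size)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    ]


def health_score(results: list[FileHealth]) -> float:
    """Fraction of non-empty files. Assumption: an empty listing scores 1.0."""
    if not results:
        return 1.0
    return sum(r.healthy for r in results) / len(results)


if __name__ == "__main__":
    # Print one human-readable line per file, then the overall score.
    results = scan(Path("."))
    for r in results:
        status = "OK" if r.healthy else "EMPTY"
        print(f"{status:5} {r.size:>10} {r.path}")
    print(f"health score: {health_score(results):.2f}")
```

Keeping the rule on the record type rather than in the scanner would let later rules (size or naming anomalies, staleness) be added without changing how files are listed.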

Statement on alignment with Open Mainframe Project Mission and Vision statements

  • Promotes open source innovation on mainframe by offering a minimal, extensible dataset health checker.
  • Lowers the barrier for new contributors (pure Python + modern CI).
  • Encourages interoperability with existing Z ecosystems via planned z/OSMF adapters, without vendor lock-in.
  • Creates community building blocks to apply AI techniques (e.g., ONNX) to mainframe operations data.

Are there similar/related projects out there?

  • Zowe CLI provides dataset operations, but not a focused health scoring workflow nor an AI-ready scoring path.
  • Proprietary tools exist for checks/audits, but they are not open, CI-first, or designed for community rule extensions.

Differentiators:

  • Explicit health scoring model (start simple, grow rules).
  • CI-first repo (ruff/black/mypy/pytest/coverage, smoke artifact).
  • Clear roadmap to z/OSMF adapters and ONNX inference.

Sponsor from TAC

To be appointed

Proposed Project Stage

Sandbox

License and contribution guidelines

  • Current license: MIT (OSI-approved).
  • If accepted into OMP: willing to relicense to Apache-2.0.
  • Contribution flow: GitHub PRs, DCO sign-off (Signed-off-by), code style via Ruff/Black, type hints via MyPy, tests via Pytest with coverage gate.
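
As an illustration of the DCO sign-off requirement, the sketch below checks a commit message for a `Signed-off-by` trailer, which is what `git commit -s` appends. The function name `has_signoff` and the exact pattern are assumptions made for this example; the proposal does not say how sign-off is enforced, and a hosted DCO check may apply stricter rules.

```python
import re

# A trailer line of the form "Signed-off-by: Name <email>".
# The email pattern is deliberately loose; it is an assumption, not a DCO rule.
SIGNOFF = re.compile(r"^Signed-off-by: .+ <[^<>\s]+@[^<>\s]+>$", re.MULTILINE)


def has_signoff(message: str) -> bool:
    """Return True if any line of the commit message is a sign-off trailer."""
    return SIGNOFF.search(message) is not None
```

A check like this could run over the commit messages of a pull request and fail the build if any commit lacks the trailer.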

Current or desired source control repository

GitHub (current): https://github.com/marbatis/ai-dataset-health-zos

External dependencies (including licenses)

  • Runtime: Python 3.11+ (stdlib only for MVP).
  • Dev/CI: ruff (MIT), black (MIT), mypy (MIT), pytest/pytest-cov (MIT).
  • Planned (optional): onnxruntime (MIT) for AI scoring.
  • No commercial or non-redistributable software required.

Initial committers

  • Marcelo Silveira (@marbatis) — creator/maintainer; all initial commits.
  • Current community size: 1 maintainer; goal is to add 2–3 co-maintainers within 3–6 months via OMP community.

Infrastructure requests

  • CI: GitHub Actions (already in place: ruff, black, mypy, pytest, coverage ≥80%, “health smoke” artifact).
  • Request access to an OMP open z/OS environment with z/OSMF Jobs & Files to validate dataset APIs.
  • (Optional later) A project Slack channel and a simple GitHub Pages site (if recommended).

Communication channels

  • Request a project Slack channel in the OMP workspace (e.g., #ai-dataset-health-zos).
  • GitHub Discussions in the repo for Q&A and design notes.
  • (Optional) Mailing list if OMP prefers; otherwise Discussions is sufficient at MVP stage.

Issue tracker

GitHub Issues: https://github.com/marbatis/ai-dataset-health-zos/issues

Website

Use the repository README as the landing page for now. (Optionally add GitHub Pages after acceptance.)

Release methodology and mechanics

  • SemVer. Start with 0.x pre-releases while MVP evolves.
  • GitHub Releases + tags; changelog in RELEASE_NOTES.md.
  • Every release built from passing CI (ruff/black/mypy/pytest/coverage).
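
To make the 0.x SemVer convention concrete, here is a small sketch that parses release tags and flags versions still in initial development. The helper names are hypothetical, and accepting a leading `v` in the tag is an assumption about tag naming, not something the proposal specifies. Only the `MAJOR.MINOR.PATCH` core is handled; pre-release and build suffixes are left out for brevity.

```python
import re
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


# Core SemVer numbers without leading zeros; the optional "v" is an assumption.
_CORE = re.compile(r"v?(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def parse(tag: str) -> Version:
    """Parse a tag such as "0.3.1" or "v0.3.1" into a comparable Version."""
    match = _CORE.fullmatch(tag)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {tag!r}")
    return Version(*(int(part) for part in match.groups()))


def is_initial_development(version: Version) -> bool:
    """Under SemVer, major version 0 means the public API may still change."""
    return version.major == 0
```

Because `Version` is a tuple of integers, tags sort numerically (`sorted(tags, key=parse)`), so `0.10.0` is ordered after `0.9.1` rather than before it as a plain string sort would do.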

Social media accounts

None yet. Will coordinate with OMP social channels after acceptance.

Community size and any existing sponsorship

  • Size: single-maintainer MVP (1).
  • No commercial sponsorship; seeking community contributions and TAC mentorship.

marbatis avatar Aug 22 '25 20:08 marbatis

requesting a TAC agenda slot.

marbatis avatar Aug 22 '25 20:08 marbatis

Thanks @marbatis - I have some documents to share with you to get the LF onboarding process going. Can you drop me an email at jmertic at linuxfoundation dot org, and I can share them with you. Thanks!

jmertic avatar Aug 26 '25 13:08 jmertic

hi @jmertic . I sent you an email. thanks.

marbatis avatar Aug 28 '25 15:08 marbatis

Hey @marbatis - I didn’t see the email come over; can you resend?

jmertic avatar Aug 28 '25 18:08 jmertic

@jmertic my email is marbatis at Hotmail dot com. thanks

marbatis avatar Aug 28 '25 20:08 marbatis

@jmertic Housekeeping completed in the repo:

  • LICENSE (Apache-2.0) + NOTICE
  • CONTRIBUTING (with DCO), CODE_OF_CONDUCT (LF CoC), GOVERNANCE
  • CI is green on main (ruff/black/mypy/pytest + health smoke)

Ready for the next steps. Thanks!

marbatis avatar Aug 29 '25 13:08 marbatis

Hi @marbatis - would you be available to present this to the TAC during the 10/9 meeting at 1pm ET?

jmertic avatar Sep 18 '25 13:09 jmertic

hi @jmertic , yes I am available. thanks

marbatis avatar Sep 18 '25 13:09 marbatis

Great - you are scheduled in! You can register for the meeting at https://zoom-lfx.platform.linuxfoundation.org/meeting/96768093075?password=1b7c020a-bad8-4ba6-91f2-8156091e05fa

jmertic avatar Sep 18 '25 14:09 jmertic

Sorry @marbatis - I misspoke - can you do 10/23 instead? I forgot to check the 10/9 meeting first, which right now is full.

jmertic avatar Sep 18 '25 17:09 jmertic

@jmertic no worries... yes, I am available for this new date

marbatis avatar Sep 18 '25 22:09 marbatis

Appreciate your flexibility @marbatis - new Zoom link for the meeting -> https://zoom-lfx.platform.linuxfoundation.org/meeting/97287182990?password=033f3235-8042-4d14-9a1a-f88992db8437

If you have slides to share, please share them ahead of time in this GH issue so TAC members can pre-read.

Thanks!

jmertic avatar Sep 19 '25 11:09 jmertic

@jmertic sent you the deck and pdf via email. Let me know if you don't get them. thanks.

marbatis avatar Sep 22 '25 21:09 marbatis

Much appreciated. Thanks!

jmertic avatar Sep 23 '25 14:09 jmertic

Hi @marbatis - sorry for the delays. We are confirming that the OMP infrastructure will handle your needs and will get back to you once we know more. Thanks!

jmertic avatar Nov 18 '25 13:11 jmertic

@jmertic Great news! thanks

marbatis avatar Nov 19 '25 21:11 marbatis