Generating a documentation concept for the MIxS project
A major blocker for new contributors to the MIxS project is the lack and poor quality of documentation.
Without clear and well-structured documentation, potential contributors will not know where to start and may walk away. New contributors who do not know how to proceed will struggle to make sufficient progress and may turn away from the project. 'Institutional knowledge' will be lost as long-standing contributors move on, and will slowly
Note: this issue is intended as a primer for the MIxS working day at the GSC 2025 conference.
Current status
Currently, MIxS documentation is spread across multiple locations (the GSC website, an auto-generated MIxS website, non-rendered Markdown files within the MIxS GitHub repo, dispersed Google Docs, academic journal articles, and GitHub wiki pages). Some of the documentation is also very out of date or unfinished (particularly on the GSC website).
As far as I am aware, there are currently only two actual websites that host MIxS documentation:
- https://gensc.org
- https://genomicsstandardsconsortium.github.io/mixs
I think there are three main types of documentation that the project needs to provide:
- The general consortium information/homepage (human-written)
  - Who is the GSC?
  - What does the GSC do?
- The MIxS project information and documentation (human-written)
  - What is the MIxS project?
  - How does the project work?
  - How do you contribute to the project?
  - How do you use the output from the project?
  - What processes and procedures does the project involve?
- The MIxS reference documentation (auto-generated)
  - Human-readable representation of the documentation embedded in the code
Different scenarios
I currently envision three different scenarios:
- 3 websites: gensc, MIxS project, MIxS reference
  - Pros: clear distinction between the GSC and MIxS projects; reduced risk of 'curated' MIxS documentation breaking the automation; better 'security' (MIxS contributors can edit MIxS docs without access to the gensc website)
  - Cons: higher maintenance overhead (? although the automated reference docs should only require a single command)
- 2 websites: gensc (MIxS project within), MIxS reference
  - Pros: minimises maintenance burden
  - Cons: splits MIxS documentation across two locations, making it harder to find and cross-reference; bloats the GSC website with highly specific MIxS documentation; 'corporate' websites are best served by a different structure than 'specific project' websites; requires giving non-GSC board members access to the gensc website
- 2 websites: gensc, MIxS project (with MIxS reference within)
  - Pros: keeps all MIxS documentation in one location (only have to search one place); easy to switch and link between tutorials/how-tos/reference; reduces bloat on the GSC website and allows other projects to form under the GSC umbrella; better 'security' (MIxS contributors can edit MIxS docs without access to the gensc website)
  - Cons: mixing 'curated' documentation with the auto-generated reference could interfere with, or risk breaking, the automation
First proposal
My (opinionated) preferred options are scenarios 1 or 3 (in the case of 3, the reference is a subpage of the MIxS project website).
It could be structured as follows:
gensc.org website: pretty much as it currently is, but under Projects it links to a dedicated MIxS website rather than embedding the documentation within it.
.
├── Home
├── About
│ ├── About
│ ├── Governance
│ ├── Board
│ ├── Funding
│ ├── Collaborations
│ └── Publications
├── Projects
│ ├── MIxS
│ ├── Collaborative projects
│ └── Previous projects
├── Events
│ ├── Next meetings
│ ├── All Meetings
│ └── Calendar
└── Contact
A (new) MIxS project website: filled with 'human-curated' documentation, mainly intended for reading and consumption.
.
├── Home
├── About
│ ├── About
│ ├── Governance
│ ├── Team
│ ├── Resources (logo pack etc)
│ └── Publications
├── For users
│ ├── Guide to MIxS (e.g. how it is structured, terminology)
│ ├── How to contribute (e.g. how to run an extension project, how to )
│ ├── How to consume (e.g. how to implement in a database)
│ ├── Specifications and guidelines (e.g. what attributes does a MIxS term require)
│ ├── CIG Meeting notes
│ └── Reference (link to reference page)
├── For developers
│ ├── How to get involved (e.g. who to ask, when we meet, how to )
│ ├── Tutorials (e.g. how to do a particular routine task)
│ ├── ADRs, specifications, and guidelines (e.g. how a new class is defined, what LinkML specifications we follow etc)
│ └── TWG Meeting notes
└── Contact
MIxS reference: as is
.
├── Index
├── Full term table
├── Combinations
└── Enumerations
Let's discuss this at the GSC25 WG session.
Another thought: the audience sections of the MIxS project website could be split further into:
- For developers (primarily for the TWG)
- For contributors (primarily for the CIG)
- For users (for scientists who have to fill in metadata with MIxS checklists)
- For implementers (who implement or derive MIxS objects in their own custom schemas or databases, e.g. the INSDC)