sdrf-pipelines

Proposal for refactoring & higher maintainability

Open di-hardt opened this issue 1 year ago • 6 comments

Hey,

I noticed that the SDRF validator is a bit difficult to maintain and understand in its current state, as it is one long Python script. To make it more maintainable and extensible for future changes and integrations, I would like to propose a refactoring and, at the same time, the integration of a data validation framework like pydantic.

Pydantic basically attaches validators to class attributes via so-called Field objects. While it already supports basic validators like decimal and length constraints, there is also the option to implement custom ones, e.g. for checking the validity of ontology terms.
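To make the idea concrete, here is a minimal sketch of built-in Field constraints combined with one custom validator. It assumes pydantic v2 (`field_validator`); the class name `SampleRecord`, its attributes, and the `KNOWN_ORGANISMS` set are hypothetical stand-ins, and a real ontology check would query an actual ontology source instead of a hard-coded set.

```python
from pydantic import BaseModel, Field, ValidationError, field_validator

# Hypothetical stand-in for a real ontology lookup.
KNOWN_ORGANISMS = {"homo sapiens", "mus musculus"}


class SampleRecord(BaseModel):
    # Built-in constraints are declared through Field objects.
    source_name: str = Field(min_length=1)
    biological_replicate: int = Field(ge=1)
    organism: str

    # Custom validator: runs after the basic type check for `organism`.
    @field_validator("organism")
    @classmethod
    def organism_must_be_known(cls, value: str) -> str:
        if value.lower() not in KNOWN_ORGANISMS:
            raise ValueError(f"unknown organism term: {value!r}")
        return value


# A valid record passes all checks.
record = SampleRecord(
    source_name="sample 1", biological_replicate=1, organism="Homo sapiens"
)

# An invalid record raises a ValidationError that lists every failing attribute.
try:
    SampleRecord(source_name="", biological_replicate=0, organism="unicorn")
except ValidationError as exc:
    failed_attributes = {error["loc"][0] for error in exc.errors()}
```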

There are multiple options for implementing this. My suggestion is to create one pydantic model for each SDRF template record (e.g. cell-line, human, plant, ...), where a record is one line in the SDRF.
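One way this could look is a shared base model plus one subclass per template. This is only a sketch assuming pydantic v2; the class names and the selection of columns are illustrative, not a complete rendering of any SDRF template. Field aliases are used because SDRF column names contain spaces and brackets and cannot be Python attribute names.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class BaseRecord(BaseModel):
    # Columns assumed to be shared by all templates in this sketch.
    source_name: str = Field(alias="source name", min_length=1)
    organism: str = Field(alias="characteristics[organism]", min_length=1)


class CellLineRecord(BaseRecord):
    # Column specific to the (illustrative) cell-line record.
    cell_line: str = Field(alias="characteristics[cell line]", min_length=1)


class HumanRecord(BaseRecord):
    # Column specific to the (illustrative) human record, optional here.
    age: Optional[str] = Field(default=None, alias="characteristics[age]")


# One SDRF line, keyed by its column headers.
row = {
    "source name": "sample 1",
    "characteristics[organism]": "Homo sapiens",
    "characteristics[cell line]": "HeLa",
}

cell_line_record = CellLineRecord.model_validate(row)
human_record = HumanRecord.model_validate(row)  # the extra column is ignored

# The same line without the cell-line column is rejected by CellLineRecord only.
incomplete = {k: v for k, v in row.items() if k != "characteristics[cell line]"}
try:
    CellLineRecord.model_validate(incomplete)
    cell_line_rejected = False
except ValidationError:
    cell_line_rejected = True
```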

Pydantic already supports JSON, which would fit well with the plans for a JSON representation described in https://github.com/bigbio/proteomics-sample-metadata/issues/696
I have already looked into the possibility of automatically parsing CSV/TSV files.
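A rough sketch of how TSV parsing and JSON output could fit together, assuming pydantic v2 and the standard library `csv` module. The model, the `read_records` helper, and the inline sample data are made up for illustration.

```python
import csv
import io

from pydantic import BaseModel, Field


class CellLineRecord(BaseModel):
    source_name: str = Field(alias="source name", min_length=1)
    organism: str = Field(alias="characteristics[organism]")
    cell_line: str = Field(alias="characteristics[cell line]")


# Made-up SDRF content; a real file would be opened with open(path, newline="").
SDRF_TEXT = (
    "source name\tcharacteristics[organism]\tcharacteristics[cell line]\n"
    "sample 1\tHomo sapiens\tHeLa\n"
    "sample 2\tHomo sapiens\tHEK293\n"
)


def read_records(handle):
    """Validate every TSV line and return one model instance per line."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [CellLineRecord.model_validate(row) for row in reader]


records = read_records(io.StringIO(SDRF_TEXT))

# Serialize each record to JSON, keeping the original SDRF column names as keys.
as_json = [record.model_dump_json(by_alias=True) for record in records]
```

Note that `csv.DictReader` keeps only the last value when a header name occurs more than once, so SDRF files with repeated column names would need extra handling before they reach the model.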

A structure like this can also be easily tested, as each validator can be tested by itself, as well as in combination with others in one of the record types.
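As a sketch of what such tests might look like, the validator below is a plain function that is tested once on its own and once through a record model. It assumes pydantic v2; `validate_ontology_term` and its `PREFIX:digits` rule are hypothetical and far simpler than a real ontology check. The test functions use bare `assert` statements so they would also be picked up by a test runner such as pytest.

```python
from pydantic import BaseModel, ValidationError, field_validator


def validate_ontology_term(value: str) -> str:
    """Hypothetical check: accept only accessions shaped like 'PREFIX:digits'."""
    prefix, separator, number = value.partition(":")
    if not (separator and prefix.isalpha() and number.isdigit()):
        raise ValueError(f"not an ontology accession: {value!r}")
    return value


class CellLineRecord(BaseModel):
    cell_line_accession: str

    @field_validator("cell_line_accession")
    @classmethod
    def check_accession(cls, value: str) -> str:
        # The record only delegates to the standalone validator.
        return validate_ontology_term(value)


def test_validator_alone():
    assert validate_ontology_term("CLO:0000001") == "CLO:0000001"
    try:
        validate_ontology_term("HeLa")
    except ValueError:
        return
    raise AssertionError("expected a ValueError for a malformed accession")


def test_validator_in_record():
    record = CellLineRecord(cell_line_accession="CLO:0000001")
    assert record.cell_line_accession == "CLO:0000001"
    try:
        CellLineRecord(cell_line_accession="HeLa")
    except ValidationError:
        return
    raise AssertionError("expected a ValidationError for a malformed accession")


test_validator_alone()
test_validator_in_record()
```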

A possible structure could look like this

|
|-- records
|   |-- cell_line_record.py
|   |-- ...
|-- validators
|   |-- ontology_term_validator.py
|   |-- ...
|-- cli
|-- tests
|-- ...

One additional advantage would be more expressive error messages, as validators give an exact error message per attribute. These can then be easily addressed by the SDRF creator or, in the case of a new implementation, by the developer.
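For illustration, this is roughly how per-attribute messages could be collected from a failed validation. It assumes pydantic v2, where `ValidationError.errors()` returns one entry per failing attribute with a `loc` and a `msg`; the `Record` model and the `describe_errors` helper are hypothetical.

```python
from pydantic import BaseModel, Field, ValidationError


class Record(BaseModel):
    source_name: str = Field(min_length=1)
    biological_replicate: int = Field(ge=1)


def describe_errors(row: dict) -> list:
    """Return one 'attribute: message' string per failing attribute."""
    try:
        Record(**row)
    except ValidationError as exc:
        return [
            f"{'.'.join(str(part) for part in error['loc'])}: {error['msg']}"
            for error in exc.errors()
        ]
    return []


# Both attributes violate their constraints, so two messages are produced.
messages = describe_errors({"source_name": "", "biological_replicate": 0})
for message in messages:
    print(message)
```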

The modularity would also allow other projects to include only parts of the validation process. For example, sdrf_convert needs only the parts of the validation that cover variables used by the targeted software (e.g. Comet, DIA-NN, ...).

di-hardt, Jan 29 '24 16:01