koza
Data transformation framework for LinkML data models
Closes #132 and #134. This is a fairly large one. Several updates and changes, some needed in general, some to make it play more nicely with [cookiecutter for ingests](https://github.com/monarch-initiative/cookiecutter-monarch-ingest) -...
We could use `duckdb` for this. An example, as used in the cookiecutter-monarch-ingest:

```python
from pathlib import Path

import duckdb

nodes_file = "output/{{ cookiecutter.__ingest_name }}_nodes.tsv"
edges_file = "output/{{ cookiecutter.__ingest_name }}_edges.tsv"
...
```
Currently, the `metadata.yaml` file (properties defined in `koza/model/config/source_config.py`) is not being used in any way other than documentation, and has fields with misleading names relative to how they're being used:...
RE: https://github.com/monarch-initiative/monarch-ingest/issues/585 Koza will likely need some changes in preparation for being used in standalone ingests for the Monarch KG.
Currently, varying line lengths raise an exception and cause a hard failure. We should probably make these warnings and report which lines vary. Could also create a custom exception and...
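A minimal sketch of the warn-and-report idea, using only the standard library. It assumes "line length" means the number of delimited fields in a row, and every name in it (`RowLengthMismatch`, `check_row_lengths`, the `strict` flag) is hypothetical and not part of Koza's API:

```python
import csv
import io
import warnings


class RowLengthMismatch(Exception):
    """Hypothetical custom exception for rows whose field count differs from the header."""


def check_row_lengths(handle, delimiter="\t", strict=False):
    """Return (line_number, field_count) pairs for rows that do not match the header.

    With strict=False (the default) a warning is emitted and processing continues;
    with strict=True the hypothetical RowLengthMismatch is raised instead.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader, None)
    if header is None:
        return []  # empty input: nothing to compare against

    expected = len(header)
    mismatches = []
    for row in reader:
        if len(row) != expected:
            # reader.line_num is the physical line number of the row just read
            mismatches.append((reader.line_num, len(row)))

    if mismatches:
        message = (
            f"{len(mismatches)} row(s) do not have the expected {expected} fields "
            f"(line number, field count): {mismatches}"
        )
        if strict:
            raise RowLengthMismatch(message)
        warnings.warn(message)
    return mismatches


# Small in-memory example: line 3 is short by one field, line 4 has one extra.
sample = io.StringIO(
    "id\tname\tcategory\n"
    "A:1\tfoo\tgene\n"
    "A:2\tbar\n"
    "A:3\tbaz\tgene\textra\n"
)
bad_rows = check_row_lengths(sample)
```

Keeping the strict behaviour behind a flag would let existing ingests opt in to the current hard failure while new ones get a report of the offending lines.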
https://github.com/monarch-initiative/koza/blob/c08b69bce43b339002e19b13ee7e987a1d715faf/src/koza/model/curie_cleaner.py#L6 @kevinschaper may have more insight
https://github.com/monarch-initiative/koza/blob/c08b69bce43b339002e19b13ee7e987a1d715faf/src/koza/model/source.py#L89
Koza should be able to take a yaml or json file as a map and just make it available as a dictionary without any special configuration.
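One way this could look, as a rough sketch rather than Koza's actual implementation: a helper that picks a parser from the file extension and returns a plain dictionary. The function name `load_map` is hypothetical, and the YAML branch assumes the third-party PyYAML package is installed; it is imported lazily so the JSON path works without it.

```python
import json
import tempfile
from pathlib import Path


def load_map(path):
    """Load a .json, .yaml, or .yml file and return its top-level mapping as a dict."""
    path = Path(path)
    suffix = path.suffix.lower()
    with path.open(encoding="utf-8") as handle:
        if suffix == ".json":
            data = json.load(handle)
        elif suffix in {".yaml", ".yml"}:
            import yaml  # PyYAML (third-party); assumed to be available for YAML maps

            data = yaml.safe_load(handle)
        else:
            raise ValueError(f"Unsupported map file extension: {suffix!r}")

    # A map file is only useful as a lookup if its top level is a mapping.
    if not isinstance(data, dict):
        raise TypeError(f"Expected a mapping at the top level of {path}, got {type(data).__name__}")
    return data


# Usage example with a throwaway JSON file (the keys and values are made up).
with tempfile.TemporaryDirectory() as tmp_dir:
    map_path = Path(tmp_dir) / "example_map.json"
    map_path.write_text(json.dumps({"old_id": "NEW:1", "other_id": "NEW:2"}), encoding="utf-8")
    loaded = load_map(map_path)
```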
This may still be of use, or it could be made redundant by the currently planned work on creating the monarch-ingest cookiecutter template.