Export formats
As part of KBase's work with ontologies in semsql, it would be very handy to be able to export as DSV directly, rather than going via the sqlite database.
We could put together a file similar to build.Makefile to perform the conversions, but I am wary of the semsql sqlite build process changing in ways we would then have to track. I was therefore wondering about adapting the current build command to export either DSV or sqlite, depending on the arguments it is given.
build.Makefile line 51 onwards:
# -- MAIN TARGET --
# A db is constructed from
# (1) triples loaded using rdftab
# (2) A relation-graph TSV
%.db: %.owl %-$(RGSUFFIX).tsv $(PREFIX_CSV_PATH)
	rm -f $@.tmp && \
	cat $(THIS_DIR)/sql_schema/semsql.sql | sqlite3 $@.tmp && \
	echo .exit | sqlite3 -echo $@.tmp -cmd ".mode csv" -cmd ".import $(PREFIX_CSV_PATH) prefix" && \
	rdftab $@.tmp < $< && \
	sqlite3 $@.tmp -cmd '.separator "\t"' ".import $*-$(RGSUFFIX).tsv entailed_edge" && \
	gzip -f $*-$(RGSUFFIX).tsv && \
	cat $(THIS_DIR)/indexes/*.sql | sqlite3 $@.tmp && \
	echo "ALTER TABLE statements ADD COLUMN graph TEXT;" | sqlite3 $@.tmp && \
	(test -d views && find views -maxdepth 1 -name '$(notdir $*)*.sql' -type f -print0 | xargs -0 -I{} sh -c 'sqlite3 $@.tmp < "$$1"' sh {} || echo no views ) && \
	mv $@.tmp $@
.PRECIOUS: %.db
Alter this so that if the desired output is DSV, the rdftab command is run but the sqlite db loading is omitted. Relevant files (prefixes, relation graph, schema SQL) could be moved into a single directory together, or something along those lines.
I would recommend doing the DSV export after building sqlite. This is safer because in theory there could be post-processing logic in the build; for example, when chebi is built, we generate additional views. If you are not deriving DSVs from those views, then this is perhaps less relevant.
I'm guessing this route is less attractive because bypassing sqlite would speed up the build and reduce dependencies? That's fair.
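The export-after-build route can be sketched with only the Python standard library; this is a minimal illustration, not semsql code, and enumerating tables via `sqlite_master` is an assumption about how one would discover the semsql tables to dump:

```python
import csv
import sqlite3
from pathlib import Path


def export_tables_to_tsv(db_path: str, out_dir: str) -> list[str]:
    """Dump every table in a built sqlite db to <out_dir>/<table>.tsv."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    con = sqlite3.connect(db_path)
    # type = 'table' skips the generated views; include 'view' to export those too
    tables = [row[0] for row in con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    written = []
    for table in tables:
        cur = con.execute(f'SELECT * FROM "{table}"')
        path = out / f"{table}.tsv"
        with path.open("w", newline="") as fh:
            writer = csv.writer(fh, delimiter="\t")
            writer.writerow(col[0] for col in cur.description)  # header row
            writer.writerows(cur)
        written.append(str(path))
    con.close()
    return written
```

Because it reads from the finished db, this picks up anything post-processing added (e.g. the per-ontology views, if you widen the `sqlite_master` filter to include them).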
Of course, things would be more composable at the Python level, to separate atomic operations:
- prefix loading
- turning RDF/OWL to tables
- RG closures
- exporting to duckdb/sqlite/pg/parquet/dsv
Any thoughts on approaches here? dbt? keep it simple, pure python?
but as an interim step adding to the hairy Makefile may be easiest...
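The atomic operations listed above could be composed at the Python level along these lines; everything here is a hypothetical design sketch (names, signatures, and the in-memory stand-ins for rdftab and relation-graph are all invented for illustration, not an existing semsql API):

```python
import sqlite3

# Each stage takes a DB-API connection and loads one piece of the build,
# so a duckdb or postgres connection could be swapped in at one place.


def load_prefixes(con, prefixes):
    """Stage 1: prefix loading (rows of (prefix, base))."""
    con.execute("CREATE TABLE IF NOT EXISTS prefix (prefix TEXT, base TEXT)")
    con.executemany("INSERT INTO prefix VALUES (?, ?)", prefixes)


def load_statements(con, triples):
    """Stage 2: stand-in for rdftab, turning RDF/OWL into a statements table."""
    con.execute("CREATE TABLE IF NOT EXISTS statements "
                "(subject TEXT, predicate TEXT, object TEXT)")
    con.executemany("INSERT INTO statements VALUES (?, ?, ?)", triples)


def load_entailed_edges(con, edges):
    """Stage 3: stand-in for the relation-graph closure TSV import."""
    con.execute("CREATE TABLE IF NOT EXISTS entailed_edge "
                "(subject TEXT, predicate TEXT, object TEXT)")
    con.executemany("INSERT INTO entailed_edge VALUES (?, ?, ?)", edges)


def build(prefixes, triples, edges,
          connect=lambda: sqlite3.connect(":memory:")):
    """Compose the stages; `connect` is the single back-end seam."""
    con = connect()
    load_prefixes(con, prefixes)
    load_statements(con, triples)
    load_entailed_edges(con, edges)
    return con
```

The point is the seam: the export stage (step 4 in the list) then only needs a connection, regardless of which engine produced it.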
At the moment (and for the foreseeable future), we load the semsql schema (tables + views) into the target DB, run the ontology merge / sqlite db generation code, export from sqlite as DSV, and load the DSV into the target DB. It's true that in the grand scheme of things the sqlite load/export isn't particularly time-consuming compared with the preceding steps (roughly an hour vs 9-10 hours for the ontology build), but it would be nice to skip it and go directly to the target DB.
I think for now, the KISS / hairy Makefile approach is best, since I don't know how many others are using the outputs of semsql as something other than a sqlite db.
On that topic, we're currently using a MySQL db and will be using an ANSI SQL interface, and there are a few edits that could be made to the current base semsql schema to make it more compliant with these other SQL syntaxes; see #95 for suggested initial tweaks.
It's out of scope for semsql to provide the schema in every SQL dialect under the sun, but the suggested changes make it easy enough to fix the remaining inconsistencies with a regex. I'm investigating SQL dialect translators / transpilers, so I could add a recommendation for schema conversion if other people want to use semsql with a different DB engine.
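To illustrate the regex route: something like the following could post-process the sqlite DDL into a MySQL-friendly form. The specific rewrites shown (keyword spelling, identifier quoting) are hypothetical examples, not the actual change list from #95:

```python
import re

# Hypothetical sqlite -> MySQL tweaks; the real list of incompatibilities
# would be derived from the semsql schema itself.
REWRITES = [
    # sqlite spells it AUTOINCREMENT, MySQL AUTO_INCREMENT
    (re.compile(r"\bAUTOINCREMENT\b", re.IGNORECASE), "AUTO_INCREMENT"),
    # double-quoted identifiers -> backticks
    (re.compile(r'"([A-Za-z_][A-Za-z0-9_]*)"'), r"`\1`"),
]


def to_mysqlish(ddl: str) -> str:
    """Apply each rewrite in order to a DDL string."""
    for pattern, replacement in REWRITES:
        ddl = pattern.sub(replacement, ddl)
    return ddl
```

A dedicated transpiler would handle more than surface syntax (type mapping, view definitions), which is why per-dialect regex lists only stay manageable once the base schema is close to ANSI to begin with.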