simphony-osp
CUDS to Pandas DataFrames
In GitLab by @yoavnash on Apr 8, 2020, 20:23
Downstream applications need to export tabular data represented as CUDS objects to Pandas DataFrames and vice versa, in particular for ML tasks. How should this be done?
Related to #235 and #258.
Comments:
- Why not use SPARQL since it returns a table as a query result?
- Could this approach handle multiple large tables of floats?
- Not everyone knows or wants to use SPARQL.
- It would be very convenient to have an OSP-core function like so: `df = to_dataframe(dataset)`, where `dataset` is a CUDS object.
- To support this conversion, one way would be to map column headers to ontology concepts linked in a certain pattern; the rows would then be individuals that follow that pattern.
- For efficiency reasons (time and space), it makes sense to store the tabular data as a dataframe, and not as regular CUDS objects. However, the user should be oblivious to this.
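To make the column-mapping idea from the comments above more concrete, here is a minimal sketch of what such a conversion could look like. It does not use the OSP-core API: the dataset is stood in for by plain `(individual, relation, value)` triples, and `TRIPLES`, `COLUMNS`, the relation names, and both function bodies are assumptions made up for illustration. Only the name `to_dataframe` comes from the proposal.

```python
from collections import defaultdict

import pandas as pd

# Hypothetical stand-in for a CUDS dataset: (individual, relation, value) triples.
TRIPLES = [
    ("sample_1", "has_temperature", 300.0),
    ("sample_1", "has_pressure", 1.0),
    ("sample_2", "has_temperature", 350.0),
    ("sample_2", "has_pressure", 1.5),
]

# Hypothetical mapping from column header to ontology relation.
COLUMNS = {"temperature": "has_temperature", "pressure": "has_pressure"}


def to_dataframe(triples, columns):
    """Pivot triples into one row per individual and one column per header."""
    header_of = {relation: header for header, relation in columns.items()}
    rows = defaultdict(dict)
    for individual, relation, value in triples:
        # Relations that are not mapped to a column are ignored.
        if relation in header_of:
            rows[individual][header_of[relation]] = value
    frame = pd.DataFrame.from_dict(dict(rows), orient="index")
    # Enforce the column order of the mapping; missing cells become NaN.
    return frame.reindex(columns=list(columns))


def from_dataframe(df, columns):
    """Inverse direction: emit one triple per non-missing cell."""
    return [
        (individual, columns[header], value)
        for individual, row in df.iterrows()
        for header, value in row.items()
        if header in columns and pd.notna(value)
    ]


df = to_dataframe(TRIPLES, COLUMNS)
round_trip = from_dataframe(df, COLUMNS)
```

This sketch materializes every cell as a separate triple, so it says nothing about the efficiency concern raised above. Storing the table natively as a dataframe behind the same interface would be a separate design decision.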
In GitLab by @yoavnash on Apr 8, 2020, 20:26
changed the description
In GitLab by @yoavnash on Apr 8, 2020, 20:43
changed the description
References:
- https://github.com/RDFLib/sparqlwrapper/issues/125
- https://sparqlwrapper.readthedocs.io/en/latest/index.html