
CUDS to Pandas DataFrames

Open pablo-de-andres opened this issue 4 years ago • 3 comments

In GitLab by @yoavnash on Apr 8, 2020, 20:23

Downstream applications, ML tasks in particular, need to export tabular data represented as CUDS objects to Pandas DataFrames and back. How should this be done?

Related to #235 and #258.

Comments:

  1. Why not use SPARQL, since a query result is already a table?
    • Could this approach handle multiple large tables of floats?
    • Not everyone knows or wants to use SPARQL.
  2. It would be very convenient to have an OSP-core function such as df = to_dataframe(dataset), where dataset is a CUDS object.
  3. One way to support this conversion would be to map column headers to ontology concepts linked in a certain pattern; the rows would then be individuals that follow that pattern.
  4. For efficiency (time and space), it makes sense to store the tabular data as a dataframe rather than as regular CUDS objects. However, this should be transparent to the user.
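
A minimal sketch of the mapping idea in comment 3, assuming plain (individual, concept, value) tuples as a stand-in for CUDS objects. It is not the OSP-core API: the names COLUMN_TO_CONCEPT, rows_to_triples and triples_to_rows, and the ex: identifiers, are hypothetical. The to_dataframe helper borrows the name proposed in comment 2 but takes triples rather than a CUDS object.

```python
from collections import defaultdict

# Hypothetical mapping from column headers to ontology concepts.
# The "ex:" identifiers are placeholders, not real ontology IRIs.
COLUMN_TO_CONCEPT = {
    "temperature": "ex:hasTemperature",
    "pressure": "ex:hasPressure",
}


def rows_to_triples(rows, column_to_concept, prefix="ex:row"):
    """Describe each row (a dict) as one individual with one triple per column."""
    triples = []
    for index, row in enumerate(rows):
        individual = f"{prefix}{index}"
        for column, value in row.items():
            triples.append((individual, column_to_concept[column], value))
    return triples


def triples_to_rows(triples, column_to_concept):
    """Group triples by individual and map concepts back to column headers."""
    concept_to_column = {c: col for col, c in column_to_concept.items()}
    grouped = defaultdict(dict)  # keeps individuals in first-seen order
    for individual, concept, value in triples:
        if concept in concept_to_column:  # ignore triples outside the pattern
            grouped[individual][concept_to_column[concept]] = value
    return list(grouped.values())


def to_dataframe(triples, column_to_concept):
    """Build a DataFrame from the triples (pandas is imported only when called)."""
    import pandas as pd

    rows = triples_to_rows(triples, column_to_concept)
    return pd.DataFrame(rows, columns=list(column_to_concept))
```

Round-tripping a list of row dicts through rows_to_triples and triples_to_rows should give back the same rows, which is the "vice versa" direction asked for in the description. Holding one tuple per cell is exactly the overhead comment 4 warns about, so a real implementation would likely keep the dataframe as the backing store.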

pablo-de-andres avatar Jun 22 '20 14:06 pablo-de-andres

In GitLab by @yoavnash on Apr 8, 2020, 20:26

changed the description

pablo-de-andres avatar Jun 22 '20 14:06 pablo-de-andres

In GitLab by @yoavnash on Apr 8, 2020, 20:43

changed the description

pablo-de-andres avatar Jun 22 '20 14:06 pablo-de-andres

References:

  • https://github.com/RDFLib/sparqlwrapper/issues/125
  • https://sparqlwrapper.readthedocs.io/en/latest/index.html
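
For the SPARQL route raised in comment 1, a result in the standard SPARQL 1.1 JSON results format can be flattened into records that pandas accepts directly. The sketch below assumes that format as input. The helper names cell_value and bindings_to_records and the CASTS table are hypothetical, and only a few numeric XSD datatypes are cast.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# Illustrative subset of datatypes to cast; every other value stays a string.
CASTS = {
    XSD + "double": float,
    XSD + "float": float,
    XSD + "decimal": float,
    XSD + "integer": int,
}


def cell_value(term):
    """Convert one bound term to a Python value using its datatype, if any."""
    cast = CASTS.get(term.get("datatype"))
    return cast(term["value"]) if cast else term["value"]


def bindings_to_records(results):
    """Flatten a SPARQL JSON results dict into (columns, list of row dicts)."""
    columns = results["head"]["vars"]
    records = []
    for binding in results["results"]["bindings"]:
        # Unbound variables are absent from a binding; represent them as None.
        records.append(
            {var: cell_value(binding[var]) if var in binding else None
             for var in columns}
        )
    return columns, records
```

With pandas installed, pd.DataFrame(records, columns=columns) would then give the table. With SPARQLWrapper, a dict of this shape is what a query returns after setReturnFormat(JSON) and query().convert(). Whether this scales to multiple large tables of floats is untested here, since every cell travels as a string before being cast.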

yoavnash avatar Apr 07 '21 15:04 yoavnash