spark-dgraph-connector
Add language support for wide node mode
The node source in wide mode has a column for each predicate. With language strings, each language of each predicate requires its own column, and these columns need to be known upfront. Configuration could provide the set of languages for those predicates, but this is not very convenient. Ideally, the Zero service would tell us, for each predicate with the @lang directive, which languages exist. From this, we can derive the output DataFrame schema and configure the encoder.
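If such a per-predicate language listing were available, deriving the wide-mode columns would be straightforward. The following Scala sketch assumes the listing arrives as a `Map[String, Set[String]]` from predicate to language tags (hypothetical, since this is exactly the information the forum request below asks for). It also assumes a hypothetical naming scheme of `predicate@lang` per column, with the empty string standing for the untagged value:

```scala
// Hypothetical input: languages per @lang predicate, as Zero might report them.
// The empty string stands for the untagged (language-less) value.
object WideColumns {
  // Hypothetical column naming scheme: "predicate@lang", or just "predicate" if untagged.
  def columnName(predicate: String, lang: String): String =
    if (lang.isEmpty) predicate else s"$predicate@$lang"

  // Derives a deterministic (sorted) list of wide-mode column names.
  def wideColumns(languages: Map[String, Set[String]]): Seq[String] =
    languages.toSeq.sortBy(_._1).flatMap { case (predicate, langs) =>
      langs.toSeq.sorted.map(lang => columnName(predicate, lang))
    }

  def main(args: Array[String]): Unit = {
    val languages = Map(
      "name" -> Set("", "en", "de"),
      "description" -> Set("en")
    )
    println(wideColumns(languages).mkString(", "))
    // prints: description@en, name, name@de, name@en
  }
}
```

Every new language on any predicate adds another column, which is why the schema depends on knowing all languages before reading.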
Asked in the Dgraph forum for a feature that would list the existing languages per predicate upfront: https://discuss.dgraph.io/t/list-of-existing-languages-per-predicate/11479
Alternatively, the type of a column with @lang in wide mode could be a Map[String, T], where the key is the language tag mapping to the respective value of type T. This would not require any upfront information on existing languages, and the table would neither explode in width nor become sparse.
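A minimal Scala sketch of the `Map[String, T]` alternative, assuming a hypothetical `LangValue` record for one language-tagged value of a predicate (not the connector's actual internal representation). In Spark SQL terms, such a column would have a `MapType` with string keys:

```scala
// Hypothetical representation of one language-tagged value of a predicate for a node.
final case class LangValue[T](uid: Long, predicate: String, lang: String, value: T)

object MapColumns {
  // Groups the values of one @lang predicate per node into a Map[String, T]
  // keyed by language tag ("" for the untagged value). The schema needs only
  // a single column per predicate, regardless of which languages exist.
  def toMapColumn[T](values: Seq[LangValue[T]], predicate: String): Map[Long, Map[String, T]] =
    values
      .filter(_.predicate == predicate)
      .groupBy(_.uid)
      .map { case (uid, vs) => uid -> vs.map(v => v.lang -> v.value).toMap }

  def main(args: Array[String]): Unit = {
    val values = Seq(
      LangValue(1L, "name", "en", "Alice"),
      LangValue(1L, "name", "de", "Alicia"),
      LangValue(2L, "name", "", "Bob")
    )
    println(toMapColumn(values, "name"))
  }
}
```

Nodes without a value in some language simply lack that key in their map, instead of carrying a null in a dedicated column.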