neosemantics
Problem with literal with a language tag
For example, we import the following data with language tags:
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix show: <http://example.org/vocab/show/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
show:218 show:localName "That Seventies Show"@en . # literal with a language tag
show:218 show:localName 'Cette Série des Années Soixante-dix'@fr . # literal delimited by single quote
show:218 show:localName "Cette Série des Années Septante"@fr-be . # literal with a region subtag
https://www.w3.org/TR/turtle/#turtle-literals
The result of semantics.importRDF:
- terminationStatus: OK
- triplesLoaded: 3
But MATCH (P:Resource) RETURN P returns only one node, and the language information of the labels is lost:
{
"http://example.org/vocab/show/localName":"Cette Série des Années Septante",
"uri":"http://example.org/vocab/show/218"
}
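This result is what you would expect if each property holds a single value and every new triple for the same subject and predicate overwrites the previous one. Here is a minimal Python sketch of that behaviour, where a plain dict stands in for the node. It is an illustration only, not the plugin's actual code, and `import_single_valued` is a hypothetical name:

```python
# Illustration only: a dict stands in for a node's property map.
SHOW = "http://example.org/vocab/show/218"
LOCAL_NAME = "http://example.org/vocab/show/localName"

# (subject, predicate, literal value, language tag) taken from the Turtle above
TRIPLES = [
    (SHOW, LOCAL_NAME, "That Seventies Show", "en"),
    (SHOW, LOCAL_NAME, "Cette Série des Années Soixante-dix", "fr"),
    (SHOW, LOCAL_NAME, "Cette Série des Années Septante", "fr-be"),
]


def import_single_valued(triples):
    """Keep one atomic value per (subject, predicate); later triples overwrite earlier ones."""
    nodes = {}
    for subject, predicate, value, _lang in triples:
        node = nodes.setdefault(subject, {"uri": subject})
        node[predicate] = value  # the language tag is dropped and the old value is lost
    return nodes


nodes = import_single_valued(TRIPLES)
```

All three triples are processed, yet only the value from the last one survives on the node.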
Hey, thanks for bringing that up. It's a known issue :) I mean, dealing with multivalued properties in general. See the open points section at the end of https://jesusbarrasa.wordpress.com/2016/06/07/importing-rdf-data-into-neo4j/
I tried to stay away from it in the first version to make things simpler. The thing is, multivalued properties are either not used at all or used in anger. I mean, there are datasets where there is not a single case and others where multilinguality (as your example suggests) is central.
The way you would typically model this in a Property Graph would be by using arrays. Although in the case of language-tagged values, maybe prefixing property names could also be an option (one that I don't particularly like, by the way). Here's what I mean:
(:Resource { uri: 'ns0_218', ns0_en_localName: 'That Seventies Show', ns0_fr_localName: 'Cette Série des Années Soixante-dix', ns0_fr_be_localName: 'Cette Série des Années Septante'})
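A rough Python sketch of how such language-prefixed property names could be derived. The helper `prefixed_key` and the rule of replacing `-` with `_` in the language tag are assumptions made for illustration, inferred from the `ns0_fr_be_localName` example above:

```python
def prefixed_key(ns_prefix, local_name, lang):
    """Build a property name such as ns0_fr_be_localName (hypothetical naming rule)."""
    lang_part = lang.lower().replace("-", "_")  # 'fr-be' -> 'fr_be'
    return f"{ns_prefix}_{lang_part}_{local_name}"


LITERALS = [
    ("That Seventies Show", "en"),
    ("Cette Série des Années Soixante-dix", "fr"),
    ("Cette Série des Années Septante", "fr-be"),
]

# A dict stands in for the node's property map.
node = {"uri": "ns0_218"}
for value, lang in LITERALS:
    node[prefixed_key("ns0", "localName", lang)] = value
```

Note that this only separates values by language: two values with the same language tag would still overwrite each other.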
Back to multivalued properties in general:
Option 1: all properties have array values. This means every time you want to get the value of a property you have to deal with the fact it's an array.
Option 2: only multivalued properties are arrays, single-valued ones are atomic values. I'd discard this option straight away since there would not be a standard way to access property values. You'd need to check if it's an array or not and, depending on the case, access it in one way or another.
Option 3: keep a preferred value that you can default to and the complete list (including the default one) in an array. Something like this:
(:Resource { uri: 'ns0_218', ns0_localName: 'That Seventies Show', ns0_localName_all: [ 'That Seventies Show@en', 'Cette Série des Années Soixante-dix@fr', 'Cette Série des Années Septante@fr-be']})
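To compare the shapes, here is a small Python sketch that builds the Option 1 and Option 3 property maps from the same language-tagged literals. The function names and the `preferred_lang` parameter are hypothetical; in particular, how the preferred value gets chosen is an assumption (here: the first value in the given language, otherwise the first value loaded):

```python
LITERALS = [
    ("That Seventies Show", "en"),
    ("Cette Série des Années Soixante-dix", "fr"),
    ("Cette Série des Années Septante", "fr-be"),
]


def option_1(literals):
    """Option 1: the property is always an array of 'value@lang' strings."""
    return {
        "uri": "ns0_218",
        "ns0_localName": [f"{value}@{lang}" for value, lang in literals],
    }


def option_3(literals, preferred_lang="en"):
    """Option 3: an atomic preferred value plus the complete tagged list."""
    fallback = literals[0][0] if literals else None
    # Assumption: prefer the requested language, else fall back to the first value.
    preferred = next(
        (value for value, lang in literals if lang == preferred_lang), fallback
    )
    return {
        "uri": "ns0_218",
        "ns0_localName": preferred,
        "ns0_localName_all": [f"{value}@{lang}" for value, lang in literals],
    }
```

With Option 1 every read has to handle a list, even for single-valued properties. With Option 3 the atomic `ns0_localName` can be read directly, at the cost of storing the preferred value twice.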
I think Option 1 is probably the most reasonable, but I would love to hear your thoughts on this.
I think it would make sense to make this a data load option so if your dataset does not use multivalued properties you can stick to the current approach and benefit from the simplicity. I suspect many cases will fall in this category.
I've started to play with this but before I commit code to the project I'd like to hear opinions.
Is Option 1 the right approach?
Cheers,
JB.
@jbarrasa Hi!
I think that the first option is quite suitable.
But what do you think about moving the multilingual properties to a separate node? Something like this:
(:Resource { uri: 'ns0_218' }) - [:hasProperty] -> (:Property {
uri: 'http://example.org/vocab/show/',
name: 'localName',
`en`: 'That Seventies Show',
`fr`: 'Cette Série des Années Soixante-dix',
`fr-be`: 'Cette Série des Années Septante'
})
Thanks stdob! Yes, I guess that's an option too, but it kind of defeats the purpose of the property graph model, where you can set properties on nodes. This approach would create a node for every value of every property, which is the atomic decomposition that RDF imposes.
Definitely a possibility but not my favourite.
Also if you think of the syntax for querying this graph...
MATCH (:Resource { uri: 'ns0_218' }) - [:hasProperty] -> (p:Property { name: 'aproperty', uri: 'xxx'})
RETURN p.en AS value
As opposed to something a lot more compact like
MATCH (r:Resource { uri: 'ns0_218' })
RETURN r.nsx_aproperty_en AS value
But in some cases it may be the best option... let's think about it.
JB.
Good discussion. @jbarrasa I agree with you that wherever possible we should leverage the property graph model fully. That is why we are here, or at least why I am.
The question of multivalued properties comes down to the use case. Is there ever a case where having a separate node is beneficial? Will the node carry additional information beyond the name? In my own examples Option 3 would work most of the time, and it allows the preferred term to be easily indexed. Option 1 provides a more consistent interface but takes away the option of an index for performance.
Of course you could always make the choice configurable, defaulting to leveraging the property graph but with the option of separate nodes. This option would apply to the whole model, with no field-by-field choice, to avoid complexity.
This should also support property graph to RDF export in a consistent manner.
Hi, @jbarrasa!
I lean towards the option where all properties are arrays. It just seems to me that multi-language strings need to be wrapped in double quotes:
(:Resource {
uri: 'ns0_218',
ns0_localName: [ '"That Seventies Show"@en',
'"Cette Série des Années Soixante-dix"@fr',
'"Cette Série des Années Septante"@fr-be'
]
})
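One possible benefit of quoting the value is that the language tag can be split off unambiguously, even when the text itself contains an `@`. Here is a minimal Python sketch of such a parser. The name `split_tagged` and the regular expression are illustrative assumptions, loosely following the shape of Turtle language tags rather than any format the plugin defines:

```python
import re

# "<value>"@<lang>, where lang looks like 'en' or 'fr-be'
TAGGED = re.compile(
    r'^"(?P<value>.*)"@(?P<lang>[A-Za-z]+(?:-[A-Za-z0-9]+)*)$', re.DOTALL
)


def split_tagged(text):
    """Return (value, lang) for '"value"@lang', or (text, None) if no tag is present."""
    match = TAGGED.match(text)
    if match is None:
        return text, None
    return match.group("value"), match.group("lang")


value, lang = split_tagged('"Cette Série des Années Septante"@fr-be')
```

A quoted string without a trailing tag, such as `"contact@fr"`, is returned unchanged with `None` as its language, whereas in an unquoted `value@lang` encoding it would be indistinguishable from a French-tagged `contact`.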
Hi @jbarrasa. I tried importing data from a multi-language source (e.g. Wikidata) and ran into this issue :(.
As far as I can see, the plugin keeps only the value for the last language encountered.
Have you made any progress on this issue?
Hi Kirill, I’m working on the next release where I plan to solve this problem. It would be extremely useful though to know what your dataset and some of your multilingual queries look like. Would that be possible?
Thanks!
JB
Thanks for the good news!
I work with knowledge bases. I am researching Neo4j performance and trying to import the whole of Wikidata (or at least part of it) into it. Wikidata is available in many languages, so the queries look like: return all scientists from Germany (in English or any other language; in most cases you do not want to mix languages). Essentially we have many entities linked to each other, and each of those entities has the same property in many languages (e.g. the name of the scientist written in different languages).
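The "one language at a time, no mixing" requirement can be sketched in Python against the array-valued model (Option 1) discussed earlier in the thread. This is a hypothetical illustration, not a neosemantics API: `values_in_language` is an invented helper, dicts stand in for nodes, and values are assumed to be stored as unquoted `value@lang` strings:

```python
NODES = [
    {
        "uri": "ns0_218",
        "ns0_localName": [
            "That Seventies Show@en",
            "Cette Série des Années Soixante-dix@fr",
            "Cette Série des Année Septante@fr-be".replace("Année ", "Années "),
        ],
    },
]


def values_in_language(nodes, prop, lang):
    """Map each node uri to its first value tagged with exactly `lang`."""
    result = {}
    for node in nodes:
        for tagged in node.get(prop, []):
            # Split on the last '@' so an '@' inside the text is preserved.
            value, _, tag = tagged.rpartition("@")
            if tag == lang:
                result[node["uri"]] = value
                break
    return result


french = values_in_language(NODES, "ns0_localName", "fr")
```

Nodes without a value in the requested language are simply left out of the result. Whether to fall back from a regional tag such as `fr-be` to `fr` is a separate design choice that this sketch does not make.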
Quick announcement: Changes for the first version of neo4j with multivalued properties + language info handling have been committed to the 3.5 branch. New release to follow in the next few days.
Great news!
Hey @jbarrasa! How is your progress on the release? Are you still planning to issue it?
Just pushed it. Still working on the doc for new features though... should be up in the next 48h. In the meantime you can have a look at the unit tests :)
Great! Thank you for the work and support of this project!