ConditionalMapping is not flexible enough
ConditionalMapping checks a sequence of Conditions and the first satisfied one wins. The Conditions themselves are atomic (no connectives). But this is not flexible enough.
Consider http://mappings.dbpedia.org/index.php/Mapping_bg:Музикален_изпълнител:
- We need to handle both type and gender, based on complex rules (фон is the "background" field of the infobox), e.g.:
- type: фон=певец (singer, male) -> MusicalArtist, фон=композитор (composer) -> MusicComposer, фон=диригент (conductor) -> MusicDirector, фон=състав (band) -> Band
- gender: фон=певец -> Male, фон=певица (singer, female) -> Female, наставка (suffix) isSet -> Female, фон=състав -> none, otherwise -> Male
We cannot always sequence such clauses in a linear way.
- Many people have more than one type, eg
- Gustav Mahler: both композитор=MusicComposer and диригент=MusicDirector apply, but currently we catch only the first one
- Lindsay Lohan: actress, singer and model. bg.wikipedia used the Actor template for her and now uses MusicalArtist.
Maybe a Condition field "continue" can solve this problem: if the condition succeeds, apply its mapping, but continue to the next condition.
- Another problem is that the syntax is quite unreadable, so if we start writing more complex conditions, it will be very hard to verify that they are correct.
I don't know the best way to fix the ConditionalMapping-Condition structure. Maybe it should be redesigned based on the literature on Condition-Action rules.
Another example is Elvis Presley: "occupation = Singer, actor" in #341
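To make the proposed "continue" field concrete, here is a minimal sketch in Python, not the framework's actual code. The `Condition` class, the `cont` flag, the `contains` helper and the class names chosen for each condition are all assumptions made for illustration; only the "occupation = Singer, actor" value comes from this thread.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Row = Dict[str, str]  # template property name -> raw infobox value


@dataclass
class Condition:
    test: Callable[[Row], bool]
    types: List[str]
    cont: bool = False  # hypothetical "continue" flag; False mimics current behaviour


def contains(prop: str, needle: str) -> Callable[[Row], bool]:
    """Case-insensitive substring test on one template property."""
    return lambda row: needle.lower() in row.get(prop, "").lower()


def evaluate(conditions: Sequence[Condition], row: Row,
             otherwise: Sequence[str] = ()) -> List[str]:
    """Walk the conditions in order; stop at the first hit unless it says 'continue'."""
    fired = False
    result: List[str] = []
    for cond in conditions:
        if cond.test(row):
            fired = True
            result.extend(cond.types)
            if not cond.cont:
                break  # first satisfied condition wins
    # The "otherwise" branch applies only when no condition fired at all.
    return result if fired else list(otherwise)


def make_conditions(cont: bool) -> List[Condition]:
    # Illustrative mapping only, not taken from any real mapping page.
    return [
        Condition(contains("occupation", "singer"), ["MusicalArtist"], cont),
        Condition(contains("occupation", "actor"), ["Actor"], cont),
    ]


elvis = {"occupation": "Singer, actor"}
print(evaluate(make_conditions(cont=False), elvis))  # ['MusicalArtist']
print(evaluate(make_conditions(cont=True), elvis))   # ['MusicalArtist', 'Actor']
```

With `cont=False` everywhere this reproduces the first-match behaviour described in the issue, so existing mappings would be unaffected unless they opt in.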
Hi VladimirAlexiev, I have started working on this issue. Can you suggest a wiki dump to test it with?
You can adjust the saved mappings in the GitHub repo. It would be much easier to work with those files than with a complete wiki.
Each file is a complete mapping dump of one language.
Great @pathi108 ! Is there some design document on how to extend ConditionalMapping?
Hi, I fixed this problem, but how should I commit the changes?
I sent a new pull request after fixing this bug:
https://github.com/dbpedia/extraction-framework/pull/443
Hi, I fixed the above bug and sent a new pull request: https://github.com/dbpedia/extraction-framework/pull/443. All tests passed successfully. Does my code fix your bug?
@pathi108, @jimkont I'm not familiar with the unit testing used by the framework. But would it be possible to add some test cases as listed above; and on http://mappings.dbpedia.org/index.php/Template:ConditionalMapping#Example_of_Mapping_Gender?
Reading the PR comment, it seems the new design is: merge triples produced by each matching condition
So the logic is changed from completely sequential to completely parallel (additive).
- This would be incorrect for the "otherwise" condition, but I guess that is handled specially
- But are we sure that all/most ConditionalMappings will be happy with this additive logic? There are about 200 uses of ConditionalMapping: http://mappings.dbpedia.org/index.php?title=Special:WhatLinksHere/Template:ConditionalMapping&limit=100. I checked geobox, artist, comedian, company, mountain and they will be ok. Please check some more.
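To illustrate why the existing mappings need checking, the sketch below replays the gender rules from the issue description under both evaluation strategies. It is a hedged illustration, not the PR's implementation: the function names, the rule encoding and the sample record (a male singer whose наставка field happens to be set) are assumptions, and "otherwise" is modelled as firing only when no condition matched.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Row = Dict[str, str]
Rule = Tuple[Callable[[Row], bool], Optional[str]]  # (test, gender or None)


def first_match(rules: Sequence[Rule], row: Row, otherwise: str) -> List[str]:
    """Sequential logic: the first satisfied rule wins."""
    for test, value in rules:
        if test(row):
            return [value] if value is not None else []
    return [otherwise]


def additive(rules: Sequence[Rule], row: Row, otherwise: str) -> List[str]:
    """Additive logic: merge the results of every satisfied rule."""
    fired = [value for test, value in rules if test(row)]
    if not fired:
        return [otherwise]  # "otherwise" only when nothing matched
    merged: List[str] = []
    for value in fired:
        if value is not None and value not in merged:
            merged.append(value)
    return merged


# Gender rules as listed in the issue description.
gender_rules: List[Rule] = [
    (lambda r: r.get("фон") == "певец", "Male"),
    (lambda r: r.get("фон") == "певица", "Female"),
    (lambda r: bool(r.get("наставка")), "Female"),
    (lambda r: r.get("фон") == "състав", None),  # a band has no gender
]

# Hypothetical record: male singer, but the suffix field is also filled in.
record = {"фон": "певец", "наставка": "x"}
print(first_match(gender_rules, record, "Male"))  # ['Male']
print(additive(gender_rules, record, "Male"))     # ['Male', 'Female']
```

Whether such records exist is exactly what a review of the existing mappings (or a full diff) would have to establish; mappings whose conditions are mutually exclusive give the same result either way.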
http://mappings.dbpedia.org/index.php/Mapping_en:Infobox_artist uses casing on dbp:occupation. Here's the variety of values Wikipedia editors have put in that templateProperty:
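PREFIX dbp: <http://dbpedia.org/property/>
# all distinct literal values of dbp:occupation, across all resources
select distinct ?o {
  [] dbp:occupation ?o
  filter(isLiteral(?o))
}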
Please document the new design at http://mappings.dbpedia.org/index.php/Template:ConditionalMapping, including examples of additive firing of Conditions (e.g. one instance gets several types)
@VladimirAlexiev we have not approved the PR yet, as this is more a design decision than a coding one. So the main question is: do we want to allow this additive behavior in the mappings?
I cannot give a straight answer without thorough testing of the existing mappings (that means running a full extraction with both options and creating a diff).
This is my fault, but when @pathi108 opted for this issue I had another issue in mind (https://github.com/dbpedia/extraction-framework/issues/19), and that is why I was a little surprised by his PR (which, I admit, was in line with this issue's description).
After looking at several mappings, I think the additive behavior is right. But a complete diff sounds like a good idea; IF enough effort can be spent to analyze it
@pathi108 do you have time to give this a try?
A simple approach is to run the extraction on English once for each variation, using only the MappingsExtractor.
Then we sort every generated file with sort -u and compare it with the corresponding file from the second run using diff. IIRC there will be the following files / diffs:
- instance_types
- instance_types_transitive
- mappingbased_objects
- mappingbased_literals
- geo_mappingbased
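The same comparison can be sketched in Python, assuming both runs have been written out as uncompressed, line-oriented text files (one triple per line). The function names and the two-path interface are hypothetical. Taking the set of distinct lines plays the role of sort -u, and the two set differences correspond to the lines diff would report as removed and added.

```python
from pathlib import Path
from typing import List, Set, Tuple


def unique_lines(path: Path) -> Set[str]:
    """Distinct non-empty lines of one dump file (the effect of `sort -u`)."""
    with open(path, encoding="utf-8") as handle:
        return {line.rstrip("\n") for line in handle if line.strip()}


def diff_dumps(sequential: Path, additive: Path) -> Tuple[List[str], List[str]]:
    """Return (lines only in the sequential run, lines only in the additive run)."""
    old, new = unique_lines(sequential), unique_lines(additive)
    return sorted(old - new), sorted(new - old)
```

Calling `diff_dumps` once per dataset (instance_types, mappingbased_objects and so on) gives, for each file, the triples that disappear and the triples that are new under the additive logic. If the first list is non-empty for some dataset, the additive change removed triples there, which would be worth inspecting first.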