
ConditionalMapping is not flexible enough

Open VladimirAlexiev opened this issue 10 years ago • 13 comments

ConditionalMapping checks a sequence of Conditions and the first satisfied one wins. The Conditions themselves are atomic (no connectives). But this is not flexible enough.
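To make the limitation concrete, here is a minimal Python model of the current first-match behavior. It is a sketch, not the framework's Scala code. The `Condition` class, its field names and `first_match` are hypothetical. The operator strings are modelled loosely on the isSet/equals/contains/otherwise operators used on the mappings wiki. The property value used for Gustav Mahler is an assumption for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Condition:
    """Hypothetical model of one Condition: an atomic test on one template property."""
    operator: str                # "isSet", "equals", "contains" or "otherwise"
    prop: Optional[str] = None   # template property to test (unused for "otherwise")
    value: Optional[str] = None  # comparison value for "equals" / "contains"
    mapping: str = ""            # stand-in for the mapping to apply: just a class name here

    def matches(self, template: dict) -> bool:
        if self.operator == "otherwise":
            return True
        actual = template.get(self.prop)
        if self.operator == "isSet":
            return bool(actual)
        if actual is None:
            return False
        if self.operator == "equals":
            return actual == self.value
        if self.operator == "contains":
            return self.value in actual
        raise ValueError(f"unknown operator: {self.operator}")


def first_match(conditions: List[Condition], template: dict) -> List[str]:
    """Current semantics: the first satisfied Condition wins; later ones are never checked."""
    for cond in conditions:
        if cond.matches(template):
            return [cond.mapping]
    return []


TYPE_RULES = [
    Condition("contains", "фон", "композитор", mapping="MusicComposer"),
    Condition("contains", "фон", "диригент", mapping="MusicDirector"),
]

# Assumed property value for Gustav Mahler: both a composer and a conductor.
print(first_match(TYPE_RULES, {"фон": "композитор, диригент"}))  # ['MusicComposer']
```

Both conditions are satisfied, but only the first one is ever applied, so the second type is lost.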

Consider http://mappings.dbpedia.org/index.php/Mapping_bg:Музикален_изпълнител:

  • We need to handle both type and gender, based on complex rules on the фон (background) and наставка (suffix) properties, e.g.:
    • type: фон=певец (singer) -> MusicalArtist, фон=композитор (composer) -> MusicComposer, фон=диригент (conductor) -> MusicDirector, фон=състав (band) -> Band
    • gender: фон=певец -> Male, фон=певица (female singer) -> Female, наставка isSet -> Female, фон=състав -> none, otherwise -> Male

We cannot always sequence such clauses in a linear way.
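One way to read this is that type and gender are separate decisions. Each is naturally a first-match chain of its own, but folding both into a single chain would require roughly one condition per (type, gender) combination. The Python sketch below instead evaluates the two example rule lists independently and combines the answers. The rule encoding and all function names are hypothetical, and the rules cover only the example values quoted above.

```python
from typing import Callable, List, Optional, Tuple

Rule = Tuple[Callable[[dict], bool], Optional[str]]


def first_match(rules: List[Rule], template: dict) -> Optional[str]:
    """Return the result of the first rule whose test passes, or None if none does."""
    for test, result in rules:
        if test(template):
            return result
    return None


def fon(t: dict) -> str:
    """Value of the "фон" (background) property, or "" when it is absent."""
    return t.get("фон", "")


def is_set(t: dict, prop: str) -> bool:
    return bool(t.get(prop))


# Hypothetical encoding of the two example rule sets from the issue.
TYPE_RULES: List[Rule] = [
    (lambda t: fon(t) == "певец", "MusicalArtist"),
    (lambda t: fon(t) == "композитор", "MusicComposer"),
    (lambda t: fon(t) == "диригент", "MusicDirector"),
    (lambda t: fon(t) == "състав", "Band"),
]
GENDER_RULES: List[Rule] = [
    (lambda t: fon(t) == "певец", "Male"),
    (lambda t: fon(t) == "певица", "Female"),
    (lambda t: is_set(t, "наставка"), "Female"),
    (lambda t: fon(t) == "състав", None),  # bands get no gender
    (lambda t: True, "Male"),               # otherwise
]


def classify(template: dict) -> dict:
    """Run each aspect as its own first-match chain and combine the answers."""
    return {
        "type": first_match(TYPE_RULES, template),
        "gender": first_match(GENDER_RULES, template),
    }


print(classify({"фон": "състав"}))  # {'type': 'Band', 'gender': None}
```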

  • Many people have more than one type, e.g.:
    • Gustav Mahler: both композитор (MusicComposer) and диригент (MusicDirector), but currently we catch only the first one
    • Lindsay Lohan: actress, singer and model. bg.wikipedia used to use the Actor template for her, and now uses MusicalArtist.

Maybe a Condition field "continue" could solve this problem: if the condition succeeds, apply its mapping, but continue to the next condition.
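The proposed "continue" field can be sketched in a few lines of Python. This only illustrates the proposal and is not an existing framework feature. `Rule`, `cont` (named so because `continue` is a Python keyword), `evaluate` and `has_occupation` are all hypothetical names. A satisfied rule is applied, and evaluation stops there unless that rule has `cont=True`.

```python
from typing import Callable, List, NamedTuple


class Rule(NamedTuple):
    test: Callable[[dict], bool]
    mapping: str
    cont: bool = False  # hypothetical "continue" field: keep going after this rule fires


def evaluate(rules: List[Rule], template: dict) -> List[str]:
    """Apply satisfied rules in order; stop after the first one that lacks cont=True."""
    applied = []
    for rule in rules:
        if rule.test(template):
            applied.append(rule.mapping)
            if not rule.cont:
                break
    return applied


def has_occupation(word: str) -> Callable[[dict], bool]:
    """Build a test that checks whether "фон" contains the given word."""
    return lambda t: word in t.get("фон", "")


RULES = [
    Rule(has_occupation("композитор"), "MusicComposer", cont=True),
    Rule(has_occupation("диригент"), "MusicDirector", cont=True),
    Rule(has_occupation("певец"), "MusicalArtist"),
]

print(evaluate(RULES, {"фон": "композитор, диригент"}))  # ['MusicComposer', 'MusicDirector']
```

This design keeps first-match as the default, so existing mappings are unaffected, and lets a mapping author opt in to additive behavior one condition at a time.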

  • Another problem is that the syntax is quite unreadable, so if we start writing more complex conditions, it will be very hard to verify that they are correct.

I don't know the best way to fix the ConditionalMapping/Condition structure. Maybe it should be redesigned based on some of the literature on condition-action rules.

VladimirAlexiev, Jan 16 '15

Another example is Elvis Presley: "occupation = Singer, actor" in #341

VladimirAlexiev, Feb 18 '15

Hi @VladimirAlexiev, I have started working on this issue. Can you suggest a wiki dump to test it with?

pathi108, Mar 14 '16

You can adjust the saved mappings in the GitHub repo. It is much easier to work with these files than with a complete wiki.

Each file is a complete mapping dump for one language.

jimkont, Mar 14 '16

Great @pathi108 ! Is there some design document on how to extend ConditionalMapping?

VladimirAlexiev, Mar 15 '16

Hi, I fixed this problem, but how should I commit the changes?

pathi108, Mar 20 '16

I sent a new pull request after fixing this bug.

pathi108, Mar 21 '16

https://github.com/dbpedia/extraction-framework/pull/443

pathi108, Mar 21 '16

Hi, I fixed the above bug and sent a new pull request (https://github.com/dbpedia/extraction-framework/pull/443). All tests passed successfully. Does my code fix your bug?

pathi108, Mar 21 '16

@pathi108, @jimkont I'm not familiar with the unit testing used by the framework, but would it be possible to add some test cases like the ones listed above and on http://mappings.dbpedia.org/index.php/Template:ConditionalMapping#Example_of_Mapping_Gender ?


Reading the PR comment, it seems the new design is: merge the triples produced by each matching condition.

So the logic is changed from completely sequential to completely parallel (additive).

  • This would be wrong for the "otherwise" condition, but I guess that is handled specially.
  • But are we sure that all or most ConditionalMappings will be happy with this additive logic? There are about 200 uses of ConditionalMapping: http://mappings.dbpedia.org/index.php?title=Special:WhatLinksHere/Template:ConditionalMapping&limit=100. I checked geobox, artist, comedian, company and mountain, and they will be OK. Please check some more.
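The additive design described here, with "otherwise" special-cased as guessed above, might look like the following Python sketch. It is not taken from the PR. The tuple layout, `evaluate_additive`, `occupation_contains` and the class names are assumptions for illustration, and the test is a plain case-insensitive substring check. The Elvis Presley value is the "Singer, actor" example from #341.

```python
from typing import Callable, List, Optional, Tuple

# (operator, test, mapping); the test is ignored for "otherwise".
Condition = Tuple[str, Optional[Callable[[dict], bool]], str]


def evaluate_additive(conditions: List[Condition], template: dict) -> List[str]:
    """Fire every satisfied condition; "otherwise" fires only when nothing else did."""
    fired = [m for op, test, m in conditions if op != "otherwise" and test(template)]
    if not fired:
        fired = [m for op, _, m in conditions if op == "otherwise"]
    return fired


def occupation_contains(word: str) -> Callable[[dict], bool]:
    """Case-insensitive substring test on the occupation property (illustrative)."""
    return lambda t: word in t.get("occupation", "").lower()


CONDITIONS: List[Condition] = [
    ("contains", occupation_contains("singer"), "MusicalArtist"),
    ("contains", occupation_contains("actor"), "Actor"),
    ("otherwise", None, "Artist"),
]

print(evaluate_additive(CONDITIONS, {"occupation": "Singer, actor"}))  # ['MusicalArtist', 'Actor']
print(evaluate_additive(CONDITIONS, {"occupation": "Painter"}))        # ['Artist']
```

Under additive semantics, a loose test like this substring check (which would also match "contractor") adds an extra type instead of merely being shadowed by an earlier match. That is one reason to check existing mappings before switching.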

http://mappings.dbpedia.org/index.php/Mapping_en:Infobox_artist uses case analysis (Conditions) on dbp:occupation. This query lists the variety of values Wikipedia editors have put in that templateProperty:

PREFIX dbp: <http://dbpedia.org/property/>
SELECT DISTINCT ?o WHERE {
  [] dbp:occupation ?o
  FILTER(isLiteral(?o))
}

VladimirAlexiev, Mar 24 '16

Please document the new design at http://mappings.dbpedia.org/index.php/Template:ConditionalMapping, including examples of additive firing of Conditions (e.g. one instance gets several types)

VladimirAlexiev, Mar 24 '16

@VladimirAlexiev we have not yet approved the PR, as this is more a design decision than a coding one. So the main question is: do we want to allow this additive behavior in the mappings?

I cannot give a straight answer without thorough testing of the existing mappings (which means running a full extraction with both options and creating a diff).

This is my fault: when @pathi108 picked this issue, I had another issue in mind (https://github.com/dbpedia/extraction-framework/issues/19), which is why I was a little surprised by his PR (which, I admit, was in line with this issue's description).

jimkont, Mar 28 '16

After looking at several mappings, I think the additive behavior is right. But a complete diff sounds like a good idea, IF enough effort can be spent analyzing it.

VladimirAlexiev, Mar 28 '16

@pathi108 do you have time to give this a try? A simple approach is to run the English extraction once for each variation, using only the MappingsExtractor.

Then we sort every generated file with sort -u and diff it against the corresponding file from the other run. IIRC, these are the files / diffs:

  • instance_types
  • instance_types_transitive
  • mappingbased_objects
  • mappingbased_literals
  • geo_mappingbased
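The sort -u / diff step can also be scripted. The Python sketch below treats each output file as a set of distinct lines and reports what one run removed and what it added relative to the other. Because both files are de-duplicated and order is irrelevant, this is the same information the sorted diff gives. It assumes uncompressed dumps with one triple per line, and it skips lines starting with "#" on the assumption that those are run comments such as timestamps. The function names are hypothetical, and the file paths are left to the caller.

```python
from typing import Iterable, List, Set, Tuple


def unique_lines(lines: Iterable[str]) -> Set[str]:
    """Like `sort -u`: distinct non-blank lines, skipping comment lines that start with '#'."""
    return {ln.rstrip("\n") for ln in lines if ln.strip() and not ln.startswith("#")}


def diff_triples(before: Iterable[str], after: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (removed, added) lines between two runs, each sorted for stable output."""
    old, new = unique_lines(before), unique_lines(after)
    return sorted(old - new), sorted(new - old)


def diff_files(path_before: str, path_after: str) -> Tuple[List[str], List[str]]:
    """Diff two uncompressed dump files of the same dataset from the two runs."""
    with open(path_before, encoding="utf-8") as a, open(path_after, encoding="utf-8") as b:
        return diff_triples(a, b)


run1 = ["<s> <p> <o1> .\n", "<s> <p> <o1> .\n", "# started\n"]
run2 = ["<s> <p> <o1> .\n", "<s> <p> <o2> .\n"]
print(diff_triples(run1, run2))  # ([], ['<s> <p> <o2> .'])
```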

jimkont, Mar 29 '16