Suggester - DDI

Open romaintailhurat opened this issue 1 year ago • 7 comments

We devised a first implementation of the suggester component in DDI.

Proposal

For a single-answer question, we slightly modify d:QuestionItem/d:CodeDomain as follows:

<r:GenericOutputFormat controlledVocabularyID="INSEE-GOF-CV">suggester</r:GenericOutputFormat>
<r:CodeListReference isExternal="true">                  
    <r:URN>urn:ddi:fr.insee:communes-2023:1</r:URN>
    <r:TypeOfObject>CodeList</r:TypeOfObject>
    <r:UserAttributePair>
        <r:AttributeKey>SuggesterConfiguration</r:AttributeKey>
        <r:AttributeValue>{ "queryParser": { "type": "tokenized", "params": { "language": "French", "pattern": "[\\w]+", "min": "1" } }, "stopWords": ["de", "la", "les", "du", "et", "au", "aux", "en"], "max": 12 }
</r:AttributeValue>
    </r:UserAttributePair>
</r:CodeListReference>

List of changes

  1. We add a value to the controlled vocabulary of the r:GenericOutputFormat element: suggester.
  2. We add the attribute isExternal="true" to r:CodeListReference.
  3. The r:CodeListReference uses a single r:URN, which uniquely identifies the code list.
  4. Finally, we make use of a UserAttributePair inside the r:CodeListReference to pass the suggester configuration to Eno. The value here (inside r:AttributeValue) holds the JSON snippet used by the Suggester component in Lunatic.
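
To illustrate how a consumer might pick up this configuration, here is a minimal Python sketch using only the standard library. It assumes the `r` prefix is bound to `ddi:reusable:3_3` (the snippet above omits namespace declarations), and `read_suggester_config` is a hypothetical helper name, not something taken from Eno or Lunatic.

```python
import json
import xml.etree.ElementTree as ET

# Assumption: the "r" prefix maps to the DDI 3.3 reusable namespace.
NS = {"r": "ddi:reusable:3_3"}

# The proposal's snippet, with a namespace declaration added so it parses alone.
SNIPPET = r"""<r:CodeListReference xmlns:r="ddi:reusable:3_3" isExternal="true">
    <r:URN>urn:ddi:fr.insee:communes-2023:1</r:URN>
    <r:TypeOfObject>CodeList</r:TypeOfObject>
    <r:UserAttributePair>
        <r:AttributeKey>SuggesterConfiguration</r:AttributeKey>
        <r:AttributeValue>{ "queryParser": { "type": "tokenized", "params": { "language": "French", "pattern": "[\\w]+", "min": "1" } }, "stopWords": ["de", "la", "les", "du", "et", "au", "aux", "en"], "max": 12 }
</r:AttributeValue>
    </r:UserAttributePair>
</r:CodeListReference>"""


def read_suggester_config(reference):
    """Return the decoded SuggesterConfiguration JSON, or None if absent (hypothetical helper)."""
    for pair in reference.findall("r:UserAttributePair", NS):
        key = pair.findtext("r:AttributeKey", default="", namespaces=NS)
        if key.strip() == "SuggesterConfiguration":
            # json.loads tolerates the trailing newline before the closing tag.
            return json.loads(pair.findtext("r:AttributeValue", namespaces=NS))
    return None


reference = ET.fromstring(SNIPPET)
urn = reference.findtext("r:URN", namespaces=NS)
config = read_suggester_config(reference)
```

Storing the configuration as an opaque string keeps the DDI side generic: only the consumer needs to know that the value is JSON.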

romaintailhurat avatar May 25 '23 12:05 romaintailhurat

Addition for the "multiple" suggester: in Lunatic, a suggester may fill several variables at once.

The nomenclature has more than two columns. When the variable containing the code is collected, variables are also filled from the other columns. Strictly speaking, these variables are calculated from the collected one, but Lunatic optimizes this calculation by "collecting" them.

They would be modeled as calculated variables whose d:GenerationInstruction would be:

left_join(aaa, bbb using ccc, ddd)

where:

  • aaa is the name of the variable collected with the simple suggester
  • bbb is the name of the nomenclature used for the suggester
  • ccc is the name of the column containing the id of the nomenclature
  • ddd is the name of the column containing the value of the calculated variable

Example:

  • initial collected variable: "birth-country"
  • nomenclature: "Countries" with 3 columns: id, label, continent

Formula for the calculated variable "birth-continent": left_join(birth-country, Countries using id, continent)
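
As a rough illustration of what this left_join is meant to compute, here is a small Python sketch. The function signature mirrors the four arguments of the formula (aaa, bbb, ccc, ddd); the rows of the "Countries" nomenclature are made-up sample data, and nothing here reflects how Eno or Lunatic actually evaluate the expression.

```python
# Hypothetical sample rows for the "Countries" nomenclature (id, label, continent).
COUNTRIES = [
    {"id": "FR", "label": "France", "continent": "Europe"},
    {"id": "JP", "label": "Japan", "continent": "Asia"},
    {"id": "BR", "label": "Brazil", "continent": "South America"},
]


def left_join(collected_value, nomenclature, id_column, value_column):
    """Look up collected_value in id_column and return the matching value_column.

    Returns None when no row matches, like a left join that finds no partner.
    """
    for row in nomenclature:
        if row[id_column] == collected_value:
            return row[value_column]
    return None


# "birth-country" is the collected variable; "birth-continent" is derived from it.
birth_country = "JP"
birth_continent = left_join(birth_country, COUNTRIES, "id", "continent")
```

The same lookup with "label" as the last argument would give the country label, which is why one collected code can feed several calculated variables.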

BulotF avatar Jul 11 '23 15:07 BulotF

Alternative proposal: the only change in d:QuestionItem/d:CodeDomain is the r:GenericOutputFormat value "suggester". The CodeDomain refers to a code list defined inside the questionnaire.

This code list:

  • contains no code
  • refers to the external code list by its URN
  • contains the suggester parameters

If several responses use the same code list, they share the same suggester parameters.

<r:GenericOutputFormat controlledVocabularyID="INSEE-GOF-CV">suggester</r:GenericOutputFormat>
<r:CodeListReference>
    <r:Agency>fr.insee</r:Agency>
    <r:ID>j334iumu</r:ID>
    <r:Version>1</r:Version>
    <r:TypeOfObject>CodeList</r:TypeOfObject>
</r:CodeListReference>

and

<l:CodeList>
    <r:Agency>fr.insee</r:Agency>
    <r:ID>j334iumu</r:ID>
    <r:Version>1</r:Version>
    <r:Label>
        <r:Content xml:lang="fr-FR">communes-2023</r:Content>
    </r:Label>
    <l:HierarchyType>Regular</l:HierarchyType>
    <l:Level levelNumber="1">
        <l:CategoryRelationship>Ordinal</l:CategoryRelationship>
    </l:Level>
    <r:CodeListReference isExternal="true">                  
        <r:URN>urn:ddi:fr.insee:communes-2023:1</r:URN>
        <r:TypeOfObject>CodeList</r:TypeOfObject>
    </r:CodeListReference>
    <r:UserAttributePair>
        <r:AttributeKey>SuggesterConfiguration</r:AttributeKey>
        <r:AttributeValue>{ "queryParser": { "type": "tokenized", "params": { "language": "French", "pattern": "[\\w]+", "min": "1" } }, "stopWords": ["de", "la", "les", "du", "et", "au", "aux", "en"], "max": 12 }
        </r:AttributeValue>
    </r:UserAttributePair>
</l:CodeList>
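
The two-step lookup this implies (CodeDomain reference, then internal CodeList, then external URN) could be sketched in Python as below. This is only an illustration: the `root` wrapper element and the helper names are hypothetical, the namespaces are assumed to be the DDI 3.3 ones, and the XML is trimmed to the elements needed for the lookup.

```python
import xml.etree.ElementTree as ET

# Assumption: standard DDI 3.3 namespaces for the "r" and "l" prefixes.
NS = {"r": "ddi:reusable:3_3", "l": "ddi:logicalproduct:3_3"}

# Trimmed copy of the two snippets above, under a hypothetical <root> wrapper.
DOC = """<root xmlns:r="ddi:reusable:3_3" xmlns:l="ddi:logicalproduct:3_3">
  <r:CodeListReference>
    <r:Agency>fr.insee</r:Agency>
    <r:ID>j334iumu</r:ID>
    <r:Version>1</r:Version>
    <r:TypeOfObject>CodeList</r:TypeOfObject>
  </r:CodeListReference>
  <l:CodeList>
    <r:Agency>fr.insee</r:Agency>
    <r:ID>j334iumu</r:ID>
    <r:Version>1</r:Version>
    <r:CodeListReference isExternal="true">
      <r:URN>urn:ddi:fr.insee:communes-2023:1</r:URN>
      <r:TypeOfObject>CodeList</r:TypeOfObject>
    </r:CodeListReference>
  </l:CodeList>
</root>"""


def identity(element):
    """Return the (Agency, ID, Version) triple identifying a DDI item or reference."""
    return tuple(
        element.findtext(f"r:{tag}", namespaces=NS)
        for tag in ("Agency", "ID", "Version")
    )


def external_urn(root, reference):
    """Resolve a CodeDomain reference to the URN of the external code list."""
    wanted = identity(reference)
    for code_list in root.findall("l:CodeList", NS):
        if identity(code_list) == wanted:
            external = code_list.find("r:CodeListReference[@isExternal='true']", NS)
            if external is not None:
                return external.findtext("r:URN", namespaces=NS)
    return None


root = ET.fromstring(DOC)
reference = root.find("r:CodeListReference", NS)  # the reference in the CodeDomain
resolved_urn = external_urn(root, reference)
```

Because every response pointing at j334iumu goes through the same CodeList, the suggester parameters stored there are naturally shared.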

BulotF avatar Oct 19 '23 15:10 BulotF

@BulotF add the actual implementation of the <r:AttributeKey>SuggesterConfiguration</r:AttributeKey> value (it is an XML payload rather than JSON).

romaintailhurat avatar Nov 09 '23 13:11 romaintailhurat

We'll use r:UserID elements for identification. This will be documented.

romaintailhurat avatar Nov 09 '23 13:11 romaintailhurat

Implementation, before replacing l:CodeListName with r:UserID:

<l:CodeList>
   <r:Agency>fr.insee</r:Agency>
   <r:ID>j334iumu</r:ID>
   <r:Version>1</r:Version>
   <r:UserAttributePair>
      <r:AttributeKey>SuggesterConfiguration</r:AttributeKey>
      <r:AttributeValue><![CDATA[<fields xmlns="http://xml.insee.fr/schema/applis/lunatic-h">
<name>id</name>
<rules>soft</rules>
</fields>
<queryParser xmlns="http://xml.insee.fr/schema/applis/lunatic-h">
<type>soft</type>
</queryParser>]]></r:AttributeValue>
   </r:UserAttributePair>
   <l:CodeListName>
      <r:String xml:lang="fr-FR">in-error</r:String>
   </l:CodeListName>
   <r:Label>
      <r:Content xml:lang="fr-FR">nomenclature in-error</r:Content>
   </r:Label>
   <r:CodeListReference isExternal="true">
      <r:URN>urn:ddi:fr.insee:f7cbc001-29c7-482f-98ed-9121246db5a2:1</r:URN>
      <r:TypeOfObject>CodeList</r:TypeOfObject>
   </r:CodeListReference>
   <l:HierarchyType>Regular</l:HierarchyType>
   <l:Level levelNumber="1">
      <l:CategoryRelationship>Ordinal</l:CategoryRelationship>
   </l:Level>
</l:CodeList>
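
One thing to note when reading this payload: the CDATA content holds two sibling elements (fields and queryParser), so it is an XML fragment rather than a well-formed document. A minimal Python sketch of one way to read it, assuming the consumer is free to wrap the fragment in a throwaway element (the `wrapper` element and `parse_payload` are hypothetical names):

```python
import xml.etree.ElementTree as ET

# The CDATA content from the r:AttributeValue above, copied verbatim.
PAYLOAD = """<fields xmlns="http://xml.insee.fr/schema/applis/lunatic-h">
<name>id</name>
<rules>soft</rules>
</fields>
<queryParser xmlns="http://xml.insee.fr/schema/applis/lunatic-h">
<type>soft</type>
</queryParser>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def parse_payload(payload):
    """Parse the fragment into {element name: {child name: text}} (hypothetical helper).

    A fragment with several top-level elements cannot be parsed directly,
    so it is wrapped in a throwaway root first. If the same element name
    appears more than once, the last occurrence wins in this simple sketch.
    """
    wrapper = ET.fromstring(f"<wrapper>{payload}</wrapper>")
    return {
        local_name(child.tag): {local_name(leaf.tag): leaf.text for leaf in child}
        for child in wrapper
    }


suggester_config = parse_payload(PAYLOAD)
```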

BulotF avatar Nov 14 '23 15:11 BulotF

Below is another modeling proposal.

Changes are:

  • suggester parameters are set in the CodeListReference in the CodeDomain, not in the CodeList
  • two UserIDs are added to the CodeList (how to populate them remains to be studied). These UserIDs are used to match the code lists for collection
  • apart from the UserIDs, the CodeList becomes a classic CodeList:
    • without suggester parameters
    • without a CodeListReference inside

<?xml version="1.0" encoding="utf-8"?>
<ddi:FragmentInstance xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:ddi="ddi:instance:3_3" xmlns:r="ddi:reusable:3_3" xmlns:d="ddi:datacollection:3_3"
    xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:l="ddi:logicalproduct:3_3"
    xsi:schemaLocation="ddi:instance:3_3 https://www.ddialliance.org/Specification/DDI-Lifecycle/3.3/XMLSchema/instance.xsd">
    <ddi:TopLevelReference>
        <r:Agency>fr.insee</r:Agency>
        <r:ID>8af075bd-3a65-4ce1-80d8-18e20cca72cd</r:ID>
        <r:Version>1</r:Version>
        <r:TypeOfObject>QuestionItem</r:TypeOfObject>
    </ddi:TopLevelReference>
    <ddi:Fragment>
        <d:QuestionItem>
            <r:Agency>fr.insee</r:Agency>
            <r:ID>8af075bd-3a65-4ce1-80d8-18e20cca72cd</r:ID>
            <r:Version>1</r:Version>
            <d:QuestionItemName>
                <r:String xml:lang="fr-FR">CITY</r:String>
            </d:QuestionItemName>
            <d:QuestionText>
                <d:LiteralText>
                    <d:Text xml:lang="fr-FR">In which city do the Simpsons reside?</d:Text>
                </d:LiteralText>
            </d:QuestionText>
            <d:CodeDomain>
                <r:GenericOutputFormat controlledVocabularyID="INSEE-GOF-CV">suggester</r:GenericOutputFormat>
                <r:CodeListReference>
                    <r:URN>urn:ddi:fr.insee:8af075bd-3a65-4ce1-80d8-18e20cca72cc:1</r:URN>
                    <r:Agency>fr.insee</r:Agency>
                    <r:ID>8af075bd-3a65-4ce1-80d8-18e20cca72cc</r:ID>
                    <r:Version>1</r:Version>
                    <r:TypeOfObject>CodeList</r:TypeOfObject>
                    <r:UserAttributePair>
                        <r:AttributeKey>SuggesterConfiguration</r:AttributeKey>
                        <r:AttributeValue><![CDATA[<fields xmlns="http://xml.insee.fr/schema/applis/lunatic-h">
<name>id</name>
<rules>soft</rules>
</fields>
<queryParser xmlns="http://xml.insee.fr/schema/applis/lunatic-h">
<type>soft</type>
</queryParser>]]></r:AttributeValue>
                    </r:UserAttributePair>
                </r:CodeListReference>
                <r:ResponseCardinality maximumResponses="1"/>
            </d:CodeDomain>
        </d:QuestionItem>
    </ddi:Fragment>

    <ddi:Fragment>
        <l:CodeList>
            <r:URN>urn:ddi:fr.insee:8af075bd-3a65-4ce1-80d8-18e20cca72cc:1</r:URN>
            <r:Agency>fr.insee</r:Agency>
            <r:ID>8af075bd-3a65-4ce1-80d8-18e20cca72cc</r:ID>
            <r:Version>1</r:Version>
            <!-- Example values only; what to put here remains to be studied -->
            <r:UserID typeOfUserID="url">https://collecte-api/web/classifications/geo/communes-2023-01-01</r:UserID>
            <r:UserID typeOfUserID="url">https://collecte-api/offline/classifications/geo/communes-2023-01-01</r:UserID>
            <l:CodeListName>
                <r:String xml:lang="fr-FR">COMMUNES-2023-01-01</r:String>
            </l:CodeListName>
            <r:Label>
                <r:Content xml:lang="fr-FR">Liste des communes au 1er janvier 2023</r:Content>
            </r:Label>
            <l:HierarchyType>Regular</l:HierarchyType>
            <l:Level levelNumber="1">
                <l:CategoryRelationship>Ordinal</l:CategoryRelationship>
            </l:Level>
            <l:Code>
                <r:URN>urn:ddi:fr.insee:c6a0f7a1-c7dc-4a5e-a3df-da234057dd22:1</r:URN>
                <r:Agency>fr.insee</r:Agency>
                <r:ID>c6a0f7a1-c7dc-4a5e-a3df-da234057dd22</r:ID>
                <r:Version>1</r:Version>
                <r:CategoryReference>
                    <r:Agency>fr.insee</r:Agency>
                    <r:ID>916505d7-fe17-4e86-b32b-fb6a7783d7ef</r:ID>
                    <r:Version>1</r:Version>
                    <r:TypeOfObject>Category</r:TypeOfObject>
                </r:CategoryReference>
                <r:Value>75000</r:Value>
            </l:Code>
            <!-- etc. -->
        </l:CodeList>
    </ddi:Fragment>
</ddi:FragmentInstance>
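
To make the matching role of the UserIDs concrete, here is a short Python sketch that reads them from the CodeList. It is only an illustration: the XML is trimmed to the relevant elements, the URLs are the placeholder values from the fragment above, and `user_ids` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

NS = {"r": "ddi:reusable:3_3", "l": "ddi:logicalproduct:3_3"}

# Trimmed copy of the CodeList fragment above (placeholder UserID values).
CODE_LIST = """<l:CodeList xmlns:r="ddi:reusable:3_3" xmlns:l="ddi:logicalproduct:3_3">
    <r:URN>urn:ddi:fr.insee:8af075bd-3a65-4ce1-80d8-18e20cca72cc:1</r:URN>
    <r:UserID typeOfUserID="url">https://collecte-api/web/classifications/geo/communes-2023-01-01</r:UserID>
    <r:UserID typeOfUserID="url">https://collecte-api/offline/classifications/geo/communes-2023-01-01</r:UserID>
</l:CodeList>"""


def user_ids(code_list, type_of_user_id="url"):
    """Return the text of every r:UserID with the given typeOfUserID attribute."""
    return [
        element.text
        for element in code_list.findall("r:UserID", NS)
        if element.get("typeOfUserID") == type_of_user_id
    ]


code_list = ET.fromstring(CODE_LIST)
collection_urls = user_ids(code_list)
```

Since both UserIDs share typeOfUserID="url" in the example, a consumer would have to tell the web and offline variants apart from the value itself, which is one of the points still to be studied.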

ThomasPO avatar Nov 24 '23 14:11 ThomasPO

@BulotF is this the current implementation? https://github.com/InseeFr/Pogues/issues/682#issuecomment-1810438385

romaintailhurat avatar Mar 11 '24 14:03 romaintailhurat