SedLex

Replace every occurrence of searched words

Open • Seb35 opened this issue 7 years ago • 1 comment

This is quite a simple issue and could be a good first bug for a newcomer to the SedLex codebase.

When SedLex generates diffs (in AddDiffVisitor) and the DuraLex tree says to replace a word (or rather an expression, possibly made of several words) with another, only the first occurrence is replaced. The perimeter is currently delimited by (self.begin, self.end); this should become a list of perimeters, with the edit operation applied in each perimeter. Some care is needed because the text should not be reset between sub-edit operations, particularly for exact diffs.

This can be tested on Durafront, where it can also be checked that the DuraLex tree is correct. Amendment:

Le mot "truc" est remplacé par le mot "machin".

Text to be amended:

Le truc est ici. Le truc n'est pas ici.

Currently only the first "truc" is changed to "machin".
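
For illustration, here is a minimal Python sketch of the list-of-perimeters idea. The helper names `find_perimeters` and `replace_in_perimeters` are hypothetical and are not part of AddDiffVisitor; the sketch only assumes plain string matching inside the (begin, end) range. Applying the edits from right to left keeps the earlier offsets valid, so the text never has to be reset between sub-edits.

```python
def find_perimeters(text, words, begin=0, end=None):
    """Return the (start, stop) span of every non-overlapping occurrence
    of `words` inside text[begin:end]. Hypothetical helper, not SedLex API."""
    if not words:
        raise ValueError("the searched expression must not be empty")
    if end is None:
        end = len(text)
    perimeters = []
    pos = text.find(words, begin, end)
    while pos != -1:
        perimeters.append((pos, pos + len(words)))
        # Resume the search right after the occurrence just found.
        pos = text.find(words, pos + len(words), end)
    return perimeters


def replace_in_perimeters(text, perimeters, replacement):
    """Replace each perimeter with `replacement`, working from the last
    perimeter to the first so that earlier offsets are not shifted."""
    for start, stop in sorted(perimeters, reverse=True):
        text = text[:start] + replacement + text[stop:]
    return text


text = "Le truc est ici. Le truc n'est pas ici."
perimeters = find_perimeters(text, "truc")
print(perimeters)  # [(3, 7), (20, 24)]
print(replace_in_perimeters(text, perimeters, "machin"))
# Le machin est ici. Le machin n'est pas ici.
```

Because all perimeters are computed against the original text before any edit is applied, the same list can also be reused to build the exact diff.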

Once this is implemented, the diff for the above example will look like:

--- "unnamed article"
+++ "unnamed article"
@@ -1 +1 @@
-Le truc est ici. Le truc n'est pas ici.
+Le machin est ici. Le machin n'est pas ici.

and the exact diff will look like:

--- "unnamed article"
+++ "unnamed article"
@@ -4,4 +4,6 @@
-truc
+machin
@@ -21,4 +21,6 @@
-truc
+machin
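
A sketch of how such an exact diff could be produced from a list of perimeters. The function name `exact_diff` is hypothetical, and the hunk-header convention used here (1-based character offsets in the original text on both sides, followed by the lengths of the old and new expressions) is an assumption inferred from the example above, not taken from the SedLex code.

```python
def exact_diff(title, text, perimeters, replacement):
    """Build one hunk per perimeter.

    Assumption: offsets are 1-based character positions in the ORIGINAL
    text, which is why the text must not be reset between sub-edits.
    """
    lines = ['--- "%s"' % title, '+++ "%s"' % title]
    for start, stop in sorted(perimeters):
        old = text[start:stop]
        lines.append("@@ -%d,%d +%d,%d @@"
                     % (start + 1, len(old), start + 1, len(replacement)))
        lines.append("-" + old)
        lines.append("+" + replacement)
    return "\n".join(lines)


text = "Le truc est ici. Le truc n'est pas ici."
# 0-based (start, stop) spans of both occurrences of "truc".
print(exact_diff("unnamed article", text, [(3, 7), (20, 24)], "machin"))
```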

Seb35 Dec 22 '18 21:12

Removed the 'easy' tag because this may be linked to the way issue #9 of DuraLex is solved. That other issue is about how multiple articles are changed together, and this one is about how multiple occurrences are handled within a single article, but I think the data model in AddDiffVisitor should be rewritten to take both characteristics into account.

A complete example is the following amendment:

Au premier alinéa de l’article 3 et au cinquième alinéa de l’article 5,
les mots "truc" sont remplacés par les mots "machin".

Here the word "truc" appears multiple times at both of the specified locations. (According to the rules of the French assemblies, an amendment in the classical sense cannot change multiple articles, but here the word “amendment” is used as a synonym for “modifying text”, and this type of amendment can be found in bills (law projects/proposals) or in in-force laws modifying other laws.)

The DuraLex tree of such an amendment would be something like:

{
  "children": [
    {
      "children": [
        {
          "children": [
            {
              "children": [
                {
                  "type": "quote",
                  "words": "truc"
                }
              ],
              "type": "word-reference"
            }
          ],
          "order": 1,
          "type": "alinea-reference"
        }
      ],
      "id": "3",
      "type": "article-reference"
    },
    {
      "children": [
        {
          "children": [
            {
              "children": [
                {
                  "type": "quote",
                  "words": "truc"
                }
              ],
              "type": "word-reference"
            }
          ],
          "order": 5,
          "type": "alinea-reference"
        }
      ],
      "id": "5",
      "type": "article-reference"
    },
    {
      "children": [
        {
          "type": "quote",
          "words": "machin"
        }
      ],
      "type": "word-definition"
    }
  ],
  "editType": "replace",
  "type": "edit"
}
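
To make the link with the data model concrete, here is a hedged Python sketch that walks a tree shaped like the one above and collects one target per (article, alinea, searched words) triple, plus the replacement words. The functions `collect_targets` and `replacement_words` are hypothetical illustrations, not DuraLex or SedLex functions; the only things assumed are the node types and keys visible in the JSON above.

```python
def collect_targets(node, context=None):
    """Return one dict per word-reference, carrying the enclosing
    article id and alinea order found on the way down the tree."""
    context = dict(context or {})
    node_type = node.get("type")
    if node_type == "article-reference":
        context["article"] = node.get("id")
    elif node_type == "alinea-reference":
        context["alinea"] = node.get("order")
    elif node_type == "word-reference":
        return [dict(context, words=child["words"])
                for child in node.get("children", [])
                if child.get("type") == "quote"]
    targets = []
    for child in node.get("children", []):
        targets.extend(collect_targets(child, context))
    return targets


def replacement_words(edit):
    """Return the words quoted in the first word-definition child, if any."""
    for child in edit.get("children", []):
        if child.get("type") == "word-definition":
            for quote in child.get("children", []):
                if quote.get("type") == "quote":
                    return quote["words"]
    return None


def word_ref(words):
    # Small builder used only to keep this example tree compact.
    return {"type": "word-reference",
            "children": [{"type": "quote", "words": words}]}


edit = {
    "type": "edit",
    "editType": "replace",
    "children": [
        {"type": "article-reference", "id": "3", "children": [
            {"type": "alinea-reference", "order": 1,
             "children": [word_ref("truc")]}]},
        {"type": "article-reference", "id": "5", "children": [
            {"type": "alinea-reference", "order": 5,
             "children": [word_ref("truc")]}]},
        {"type": "word-definition",
         "children": [{"type": "quote", "words": "machin"}]},
    ],
}

for target in collect_targets(edit):
    print(target, "->", replacement_words(edit))
```

Each target would then be resolved to its own list of perimeters inside the corresponding article, which covers both the multiple-article case and the multiple-occurrence case.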

Seb35 Dec 29 '18 15:12