
Bug in STRSPLIT with the /REGEX keyword and a '{' in the pattern

Open · brandy125 opened this issue 3 years ago · 8 comments

Can anybody confirm this bug?

GDL> print,strsplit('{B}','{',/regex)
% STRTOK: Error processing regular expression: {
           Invalid preceding regular expression.
% Error occurred at: STRSPLIT            92 strsplit.pro
IDL> print,strsplit('{B}','{',/regex)
           1

brandy125 · Dec 20 '22

confirmed :cry:

alaingdl · Dec 20 '22

GDL uses regcomp() (see man regex), and '{' is an EXTENDED feature of regular expressions, where it opens a so-called 'bound'. The documentation says: "A '{' followed by a character other than a digit is an ordinary character, not the beginning of a bound." So the Linux regcomp() obviously has a bug, since it should follow its own documentation. On OSX there is no error. I would rather not change the GDL code (and change it to do what?) if this is a problem in the Linux library.

GillesDuvert · Dec 20 '22

Several points:

  • I don't agree with a wontfix flag!
  • If I change line 7319 in basic_fun.cpp to int cflags = 0;//REG_EXTENDED; the code works fine on my Linux (Ubuntu 22.04).
  • The story of IDL's STREGEX is amazing: STREGEX is based on the regex package written by Henry Spencer, modified by L3Harris Geospatial Solutions only to the extent required to integrate it into IDL. This package is freely available at: https://garyhouston.github.io/regex/. This should make it easy to patch our code.

alaingdl · Dec 20 '22

Yes, cflags = 0 works in this particular case, because without REG_EXTENDED '{' is not treated as an extended-syntax operator. But IDL uses extended regular expressions:

IDL> print,strsplit('{B}','{1}',/regex)                      
% STRTOK: Error processing regular expression: {1}
          repetition-operator operand invalid
% Execution halted at: $MAIN$          
IDL> print,strsplit('{ABBBAAB}','B{1}',/regex)
           0           5           8
IDL> print,strsplit('{ABBBAAB}','B{3}',/regex)
           0           5
GDL> print,strsplit('{B}','{1}',/regex) 
% STRTOK: Error processing regular expression: {1}
           Invalid preceding regular expression.
% Error occurred at: STRSPLIT            92 /usr/local/share/gnudatalanguage/lib/strsplit.pro
%                    $MAIN$          
% Execution halted at: $MAIN$          
GDL> print,strsplit('{ABBBAAB}','B{1}',/regex)
           0           5           8
GDL> print,strsplit('{ABBBAAB}','B{3}',/regex)
           0           5

So cflags=0 would lose an important GDL feature: extended syntax such as the bound B{3} would no longer be recognized.

We don't know what IDL changed in the library, but at least we know that GDL works under OSX, so it is not a GDL problem at all, and we should report the issue to the Linux side. Using a regex library from GitHub instead of the system one is probably a bit safer and is worth trying.

GillesDuvert · Dec 21 '22

Submitted a bug report to Mageia (my distro), in the hope that they can get it fixed by the glibc developers (or get the documentation changed!).

GillesDuvert · Jan 14 '23

@brandy125, did you (I quote the distro maintainers) "discover this very obscure fault which requires a specially crafted program to show it"? This is going upstream, to glibc...

GillesDuvert · Jan 17 '23

Forwarded to the glibc bug tracker.

GillesDuvert · Jan 18 '23

As I do not want to argue with the GNU people about the interpretation of the POSIX bible, we'll follow @alaingdl's suggestion and incorporate https://github.com/garyhouston/regex as a submodule.

GillesDuvert · Jan 19 '23