gnucobol icon indicating copy to clipboard operation
gnucobol copied to clipboard

Fix bug #948: make HIGH/LOW-VALUE sensitive to program collating sequence

Open ddeclerck opened this issue 1 year ago • 30 comments
trafficstars

This attempts to fix https://sourceforge.net/p/gnucobol/bugs/948/.

I the loading of collating tables has been moved from codegen to typeck. On loading the program collating sequence, we compute the low and high values and store them in global variables. Codegen now use these variables instead of hard-coded 0/255/0xff. I've also checked the rest of the code to ensure I wasn't breaking anything (I paid attention to the distinction between cb_low/cb_high and cb_norm_low/cb_norm_high).

I also replaced the hard-coded cob_refer_ascii and cob_refer_ebcdic tables by the loaded tables (since their content matches the one defined in default.ttbl).

Added a few tests, in particular the one that was failing for our client.

ddeclerck avatar Feb 26 '24 17:02 ddeclerck

Hi @GitMensch, Would you have a bit of time to review this PR ? This is a blocking issue for our client. Thanks

ddeclerck avatar Mar 11 '24 20:03 ddeclerck

Doing so now (I had quite a lot on my desk [don't ask how it currently looks like ;-) - but as my evening is free I'm inspecting this now, then go on to the other non-reviewed ones.

GitMensch avatar Mar 11 '24 20:03 GitMensch

I had quite a lot on my desk [don't ask how it currently looks like ;-)

Actually curious: how does it look like ? :D

ddeclerck avatar Mar 11 '24 20:03 ddeclerck

Inspecting strings.c str_cob_low -> that always uses 0x00; if we put the low- and high-value into the cob_program, then we could make str_cob_low.data point to that.

Thoughts?

GitMensch avatar Mar 11 '24 21:03 GitMensch

Inspecting strings.c str_cob_low -> that always uses 0x00; if we put the low- and high-value into the cob_program, then we could make str_cob_low.data point to that.

Thoughts?

Yeah, that's something I should do.

ddeclerck avatar Mar 12 '24 15:03 ddeclerck

Inspecting strings.c str_cob_low -> that always uses 0x00; if we put the low- and high-value into the cob_program, then we could make str_cob_low.data point to that. Thoughts?

Yeah, that's something I should do.

Actually not sure how to do that.

I've moved low_value and high_value from typeck.c to the cb_program strucure in tree.h. codegen.c uses that information to output the cob_all_low and cob_all_high fields to the local include files (before that PR, that was in the global include file).

Now I don't know what to do with str_cob_low. Right now it is a static variable. I guess it would require making it non-static, and make it depend on the current program...

Or should I just put the cob_all_low and cob_all_high fields in cob_module instead, so I can use them through COB_MODULE_PTR in strings.c ?

ddeclerck avatar Mar 12 '24 16:03 ddeclerck

Or should I just put the cob_all_low and cob_all_high fields in cob_module instead, so I can use them through COB_MODULE_PTR in strings.c ?

You could do that, using the old value if those aren't set. Note that you don't need the the fields referenced there, it would be enough to have a pointer to a string there containing its .data (which would be a string constant in the generated code. If the pointers are NULL you'd use the old values (always "\x00" / "\xFF", I guess).

Now I don't know what to do with str_cob_low. Right now it is a static variable. I guess it would require making it non-static, and make it depend on the current program...

For making it work it would be enough to set its .data according to the current program. For making it good, you'd place that locally in one of the callers, (where you then also setting the .data) and pass the reference around as/if needed.

GitMensch avatar Mar 12 '24 17:03 GitMensch

:warning: Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 91.22807% with 5 lines in your changes missing coverage. Please review.

Project coverage is 67.60%. Comparing base (16e84a5) to head (768ad62). Report is 35 commits behind head on gcos4gnucobol-3.x.

Files with missing lines Patch % Lines
cobc/codegen.c 89.47% 2 Missing :warning:
libcob/strings.c 50.00% 1 Missing and 1 partial :warning:
cobc/typeck.c 96.42% 0 Missing and 1 partial :warning:

:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@                  Coverage Diff                  @@
##           gcos4gnucobol-3.x     #136      +/-   ##
=====================================================
- Coverage              67.85%   67.60%   -0.26%     
=====================================================
  Files                     33       33              
  Lines                  60458    60795     +337     
  Branches               15821    15882      +61     
=====================================================
+ Hits                   41026    41102      +76     
- Misses                 13567    13826     +259     
- Partials                5865     5867       +2     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov-commenter avatar Mar 13 '24 09:03 codecov-commenter

I think that's it ?

ddeclerck avatar Mar 13 '24 09:03 ddeclerck

Just from reading the Changelog: we now have a low-value field in the cob_module and cb_program; shouldn't we have the collating sequence in there already (or if not: add it)? If so, couldn't we just use its first (or last) position?

GitMensch avatar Mar 13 '24 09:03 GitMensch

Just from reading the Changelog: we now have a low-value field in the module; shouldn't we have the collating sequence in there already (or in general)?If so, couldn't we just use its first position?

True indeed, I oversighted this. We could directly use module->collating_sequence[0] - when collating_sequence is not NULL. I'll make the change.

ddeclerck avatar Mar 13 '24 10:03 ddeclerck

Thanks - before doing that, what about the compiler (see the edited comment above)?

GitMensch avatar Mar 13 '24 10:03 GitMensch

Just from reading the Changelog: we now have a low-value field in the cob_module and cb_program; shouldn't we have the collating sequence in there already (or if not: add it)? If so, couldn't we just use its first (or last) position?

Yes, cob_module already contains the collating sequence (as an array of characters). I made the change to use that in strings.c.

As for cb_program, the collating sequence field is a cb_tree, and it is slightly more involved to determine the low and high values (see cb_validate_collating in typeck.c), so I believe it is a good idea to cache the result of this computation in the cb_program fields low_value and high_value. What do you think ?

ddeclerck avatar Mar 13 '24 10:03 ddeclerck

Just a note: using MF VC the tests all fail, no matter what you define or what you have in CHARSET directive or in the MFCODESET variable at runtime: low-value stays 1, high-value stays 256.

But it does always return the expected X < HIGH-VALUE.

GitMensch avatar Mar 13 '24 10:03 GitMensch

What do you think ?

I think you're right, having it one-time resolved (after the OBJECT-COMPUTER paragraph was parsed) and stored to cb_program makes sense.

GitMensch avatar Mar 13 '24 10:03 GitMensch

Just a note: using MF VC the tests all fail, no matter what you define or what you have in CHARSET directive or in the MFCODESET variable at runtime: low-value stays 1, high-value stays 256.

Ah, this is interesting. So MF always use 1 and 256 for low and high value ? Should we add a dialect option to reproduce this behavior in GnuCOBOL ?

ddeclerck avatar Mar 13 '24 10:03 ddeclerck

Something ... is going on there, I need to do more checks on MF (my latest test were confusing) but I need to do something else before going on with that.

GitMensch avatar Mar 13 '24 10:03 GitMensch

Something ... is going on there, I need to do more checks on MF (my latest test were confusing) but I need to do something else before going on with that.

Hi @GitMensch, Do you have any news about this matter ?

ddeclerck avatar Mar 15 '24 16:03 ddeclerck

Something ... is going on there, I need to do more checks on MF (my latest test were confusing) but I need to do something else before going on with that.

Hi @GitMensch , Did you have a chance to investigate this further ?

ddeclerck avatar Apr 04 '24 18:04 ddeclerck

Got my hands on a MF demo. Interrestingly, using ORD on LOW-VALUE and HIGH-VALUE always give the same result (1 and 256), no matter the encoding. However, exploring the internal representation of fields (using REDEFINES with USAGE BINARY-CHAR) after moving LOW-VALUE and HIGH-VALUE do show the expected values. Maybe it's just ORD behaving differently on MF ?

ddeclerck avatar May 14 '24 13:05 ddeclerck

Hi @GitMensch , Is there anything we can do about that ? (we keep being asked about this patch)

ddeclerck avatar Jul 23 '24 14:07 ddeclerck

Hi @GitMensch, This PR requires some love ❤️ It's getting harder to rebase each time, as my memory of these matters fades away 😅

ddeclerck avatar Aug 20 '24 14:08 ddeclerck

Note that from IBM docs: HIGH-VALUE is in EBCDIC always x'FF', in NATIONAL always nx'FFFF'

I'd really like to interpret that as "HIGH-VALUE in native encoding is always x'FF'. Which makes it EBCDIC on mainframes, but ASCII on PCs (more or less).

ddeclerck avatar Nov 03 '24 20:11 ddeclerck

Rebased - SVN r5406.

ddeclerck avatar Dec 20 '24 09:12 ddeclerck

Rebased - SVN r5408.

ddeclerck avatar Dec 30 '24 10:12 ddeclerck

Rebased - SVN r5427.

ddeclerck avatar Jan 27 '25 09:01 ddeclerck

I think I tackled all the mentionned issues ?

ddeclerck avatar Mar 06 '25 16:03 ddeclerck

Hi @GitMensch , Do you think this one can be merged ? I think I handled all the cganges requested.

ddeclerck avatar Apr 15 '25 09:04 ddeclerck

Let's go with it as it is - and if someone comes back to us after RC1 mind an additional change.

GitMensch avatar Apr 15 '25 15:04 GitMensch

Merged in SVN @ 5490.

ddeclerck avatar Apr 16 '25 08:04 ddeclerck