postscriptbarcode code128: Add minimal encodation algorithm (non-extended ASCII only)

Adapted from ZXing (props Alex Geller). It is maybe 80% slower, depending on the data, and stack-heavy, but it does improve some outcomes when FNC1s are present (GS1 or manual), although apparently not much else (the previous algorithm was pretty good).

Prompted by tests added from PR #272, props lyngklip

This is the second of the alternative PRs (PR #275). You choose!

Oct 12 '24 22:10 gitlost

I'm wondering about the first Code 128 test case. I suspect a decoder might add 128 to those '9' digits because extended ASCII is active?

Edit: it seems like a grey area. I have no idea what common practice is. This might be good.

Oct 13 '24 16:10 lyngklip

Yes, you might think that, but extended mode only applies to "the ISO/IEC 646 value", i.e. to Code Set A and B ASCII values, not to Code Set C double-digit values, which aren't ASCII. So Code Set C content can be freely intermixed with extended mode shifts and latches.
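
To make the decoder-side rule concrete, here is a minimal Python sketch (not BWIPP or ZXing code). The `(kind, value)` token tuples and the `decode` function are hypothetical, invented purely for illustration; a real decoder works on codeword values. It assumes that a single FNC4 shifts only the next Code Set A/B character and that a double FNC4 toggles the extended latch.

```python
def decode(tokens):
    """Decode a list of (kind, value) tokens into bytes.

    kind is "A" or "B" (value = ASCII value 0-127), "C" (value = digit
    pair 0-99) or "FNC4" (value ignored). Hypothetical representation.
    """
    out = bytearray()
    latched = False  # extended mode latched by a double FNC4
    shift = False    # single FNC4 pending for the next A/B character
    i = 0
    while i < len(tokens):
        kind, value = tokens[i]
        if kind == "FNC4":
            if i + 1 < len(tokens) and tokens[i + 1][0] == "FNC4":
                latched = not latched  # double FNC4 toggles the latch
                i += 2
                continue
            shift = True
        elif kind == "C":
            # Digit pairs are not ISO/IEC 646 values: never add 128.
            out += b"%02d" % value
        else:
            # Code Set A/B value: extended mode applies here only.
            extended = latched != shift
            out.append(value + 128 if extended else value)
            shift = False
        i += 1
    return bytes(out)


tokens = [("FNC4", None), ("FNC4", None), ("B", ord("a")),
          ("C", 99), ("C", 12), ("B", ord("b"))]
print(decode(tokens))
```

In this sketch the two Code Set C pairs come out as the plain digits 9912 even though extended mode is latched on either side of them.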

Oct 13 '24 17:10 gitlost

ChatGPT seems to agree with you once I helped it understand the question. That makes encoding a bit more complex, right? Interesting. I was under the impression that FNC4 insertion was a "preprocessing" step. This would mean that sequences of extended ASCII that mapped to ASCII digits might be encoded in character set C, and that does seem a bit odd, even though it would not be a problem so long as encoder and decoder agree. But that's what the encoder did before, right?

Oct 13 '24 17:10 lyngklip

Well that was a bug, which PR #275 fixes. Extended ASCII should never be encoded as Code Set C digits.

Oct 13 '24 17:10 gitlost

No time to weigh in right now, but I'll likely take this PR (over the other) once I've had time to review.

@gitlost Did the basis for the algorithm get written up anywhere? If we significantly deviate from the informative routine provided in the symbology specs then we should have some reference to signpost users to. (I've had to expend a lot of effort over the years convincing developers of pathological decoders that they need to fix decoding bugs, even if the codeword sequence is not the result of a reference encoder.)

Oct 13 '24 18:10 terryburton

There's no real write-up beyond it being a standard algorithm; see e.g. https://en.wikipedia.org/wiki/Divide-and-conquer_algorithm.
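
As a rough illustration of the general idea only (this is not the PR's PostScript nor ZXing's code), the following Python sketch finds the minimal codeword count with a memoized recursion over (position, current code set). The function name `min_codewords` is hypothetical. It assumes plain ASCII input with no FNC characters and no extended ASCII, and it counts the start code, latches, shifts and data codewords but not the check character or stop pattern.

```python
from functools import lru_cache


def min_codewords(data: bytes) -> float:
    """Minimal number of codewords (start + latches/shifts + data)."""
    n = len(data)

    def can(cs, i):
        # Can code set cs encode the character (or digit pair) at i?
        if cs == "A":
            return data[i] < 96            # ASCII 0-95
        if cs == "B":
            return 32 <= data[i] < 128     # ASCII 32-127
        return i + 1 < n and data[i:i + 2].isdigit()  # Code Set C pair

    @lru_cache(maxsize=None)
    def cost(i, cur):
        if i == n:
            return 0
        best = float("inf")  # stays infinite if nothing can encode data[i]
        for cs in "ABC":
            if not can(cs, i):
                continue
            step = 2 if cs == "C" else 1
            switch = 0 if cs == cur else 1  # start code or latch
            best = min(best, switch + 1 + cost(i + step, cs))
        # Single-character Shift between Code Sets A and B.
        if cur in ("A", "B"):
            other = "B" if cur == "A" else "A"
            if can(other, i):
                best = min(best, 2 + cost(i + 1, cur))
        return best

    return cost(0, None)


for sample in (b"1234", b"A1234", b"a\x00"):
    print(sample, min_codewords(sample))
```

The recursion depth grows with the input length, which hints at why an approach of this shape can be heavy on the stack.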

I'm concerned about performance, both speed and stack usage, so I'd hesitate to use it without trying it out first in some real-life cases if that's possible.

I wouldn't be confident in the performance checking I did, as it was just loops using usertime for timings with garbage collection turned off (-2 vmreclaim).

Oct 13 '24 19:10 gitlost

Here's the very simplistic performance test I used (mode128 is the Divide-and-Conquer one, code128 is the current):

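% Force a full garbage collection, then disable automatic GC for the timed run
2 vmreclaim
-2 vmreclaim

/tot 0 def
/startt usertime def
% Encode the same data 100 times without rendering (dontdraw)
1 1 100 {
(^031^031_^127^159^031^159^159^159^15912345``^255^000^127^255^224^224^159`) (dontdraw parse) /mode128 /uk.co.terryburton.bwipp findresource exec
} for
/endt usertime def
% Elapsed usertime in milliseconds
/tot tot endt startt sub add def
(mode128 tot ) print tot ==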

2 vmreclaim
-2 vmreclaim

/tot 0 def
/startt usertime def
1 1 100 {
(^031^031_^127^159^031^159^159^159^15912345``^255^000^127^255^224^224^159`) (dontdraw parse) /code128 /uk.co.terryburton.bwipp findresource exec
} for
/endt usertime def
/tot tot endt startt sub add def
(code128 tot ) print tot ==

2 vmreclaim
-2 vmreclaim

/tot 0 def
/startt usertime def
1 1 100 {
(^031^031_^127^159^031^159^159^159^15912345``^255^000^127^255^224^224^159`) (dontdraw parse) /mode128 /uk.co.terryburton.bwipp findresource exec
} for
/endt usertime def
/tot tot endt startt sub add def
(mode128 tot ) print tot ==

2 vmreclaim
-2 vmreclaim

/tot 0 def
/startt usertime def
1 1 100 {
(^031^031_^127^159^031^159^159^159^15912345``^255^000^127^255^224^224^159`) (dontdraw parse) /code128 /uk.co.terryburton.bwipp findresource exec
} for
/endt usertime def
/tot tot endt startt sub add def
(code128 tot ) print tot ==

Oct 13 '24 19:10 gitlost

Something that I've been thinking about: should it be made explicit what the encoder does when faced with the possibility of encoding part of the message in two different ways with the same length:

prefer ASCII > Extended or vice versa
prefer A>B>C or B>A>C or C>A>B etc.
prefer range shift to range switch or...
prefer character set shift to character set switch or...
prefer dangling digit at the front of an odd digit span or at the end
switch from ASCII A to Extended B using 100 100 100 or 101 101 100 etc.

There are possibly more alternatives than the ones I have listed. The reason I ask is that I've been playing with a somewhat rewritten encoder and I have come up with something where I can sort of control the priority of these things, but I still can't match all the test cases. The test cases in some places seem to prefer range switching over shifting and in other places the other way around. Unfortunately I have no insight into specifications.
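
One way to make such preferences explicit, sketched here in Python under invented assumptions, is to rank equal-length candidates with a lexicographic key. The `Candidate` fields, the `SET_PRIORITY` ordering (B before A before C) and the "fewer shifts first" rule are hypothetical choices for illustration, not what any existing encoder or specification prescribes.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    codewords: int   # total length of the encodation
    switches: int    # number of latches (range or character set)
    shifts: int      # number of single-character shifts
    start_set: str   # "A", "B" or "C"


# Hypothetical preference order among start sets: B > A > C.
SET_PRIORITY = {"B": 0, "A": 1, "C": 2}


def rank(c: Candidate):
    # Length always wins; the remaining fields only break ties.
    return (c.codewords, c.shifts, c.switches, SET_PRIORITY[c.start_set])


def choose(candidates):
    return min(candidates, key=rank)


same_length = [
    Candidate(codewords=4, switches=0, shifts=1, start_set="B"),
    Candidate(codewords=4, switches=1, shifts=0, start_set="B"),
]
print(choose(same_length))
```

Reordering the fields in `rank` changes which tie-break takes precedence, so each documented preference would correspond to one position in the tuple.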

Oct 20 '24 15:10 lyngklip

> Unfortunately I have no insight into specifications.

In theory, the examples from the initial ISO/IEC 15417:2000 specification were based on this code.

However, I have not verified it.

Oct 20 '24 15:10 terryburton

Closing in favour of PR #278

Oct 28 '24 16:10 gitlost