rmkaplan comments

Results: 360 comments by rmkaplan

unicode and FILEPOS

I did this last summer for the case of internal XCCS to file UTF-8/Unicode; in that particular sense this can be closed. But this really should be generalized. What is needed is a...

unicode and FILEPOS

One further issue: The case array. The current implementation seems to apply the case array to the raw bytes of the string and the file and then test the results...
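A small Python sketch of why a case array applied to raw bytes misbehaves once characters take more than one byte. It assumes a hypothetical 256-entry case array derived from Latin-1 upper-casing (not Medley's actual table) and a UTF-8 file. Folding the bytes leaves "é" untouched, so a case-insensitive search for "É" would miss it; folding decoded characters does not have that problem.

```python
from typing import List


def latin1_case_array() -> List[int]:
    """Hypothetical case array: byte -> its Latin-1 uppercase byte, if one exists."""
    table = list(range(256))
    for b in range(256):
        up = chr(b).upper()
        if len(up) == 1 and ord(up) < 256:  # keep only single-byte results
            table[b] = ord(up)
    return table


CASE_ARRAY = latin1_case_array()


def fold_bytes(data: bytes) -> bytes:
    """Apply the case array byte by byte, ignoring the file's encoding."""
    return bytes(CASE_ARRAY[b] for b in data)


def fold_chars(text: str) -> str:
    """Apply case folding to decoded characters instead of raw bytes."""
    return text.upper()
```

In UTF-8, "é" is the two bytes C3 A9 and "É" is C3 89; neither byte is a lowercase Latin-1 letter that maps to the other, so the byte-level fold never makes them equal.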

unicode and FILEPOS

The problem is whether you are looking up bytes or characters. (In reply to Larry Masinter, May 8, 2021, 12:16 PM: "the efficient way to...")
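To make the bytes-versus-characters distinction concrete, here is a minimal Python sketch (the function names are hypothetical) that finds the same match in a UTF-8 buffer both ways. The two "positions" diverge as soon as a multi-byte character precedes the match; presumably FILEPOS needs the byte offset, since that is what a file pointer is.

```python
def byte_position(data: bytes, pattern: str, encoding: str = "utf-8") -> int:
    """Byte offset of the first match, found by matching encoded bytes (-1 if none)."""
    return data.find(pattern.encode(encoding))


def char_position(data: bytes, pattern: str, encoding: str = "utf-8") -> int:
    """Character index of the first match, found by matching decoded characters."""
    return data.decode(encoding).find(pattern)
```

For `"naïve café".encode("utf-8")`, searching for "café" gives byte offset 7 but character index 6, because "ï" occupies two bytes.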

Right, and presumably that's what both the search string and the file contain. But the characters in the file are coded in a different byte representation and with different mappings into the internal character coding...
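A minimal sketch of the consequence, assuming Python's codecs stand in for the file's external format: the same character has different bytes in different encodings, so the pattern has to be converted into the file's representation before any byte comparison. The helper name is hypothetical, and the approach only holds for stateless encodings.

```python
def find_in_file_bytes(data: bytes, pattern: str, file_encoding: str) -> int:
    """Encode PATTERN the way the file is encoded, then search the bytes.

    Only valid for stateless encodings (e.g. ISO-8859-1, UTF-8), where a
    character always has the same byte sequence regardless of context.
    """
    return data.find(pattern.encode(file_encoding))
```

"é" is the single byte E9 in Latin-1 but C3 A9 in UTF-8, so searching a Latin-1 file for the UTF-8 bytes of "é" finds nothing, while encoding the pattern as Latin-1 first finds it at offset 3 in "café".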

unicode and FILEPOS

Yes, this is still screwed up; I was looking at it last night and this morning. (But then I got distracted by another glitch: (OPENSTREAM T 'OUTPUT) produces a stream...

unicode and FILEPOS

I thought I should say more about the current (= forever) issues with FILEPOS. It is described as behaving like STRPOS, except that it searches files instead of other strings....
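A rough Python sketch of those STRPOS-style semantics applied to a byte stream. The parameters (start, end, skip, tail) are modeled loosely on FILEPOS's arguments, but their exact behavior here is an assumption, and `filepos` is a hypothetical illustration rather than the Medley implementation. It reads the searched range into memory for simplicity.

```python
import io
from typing import BinaryIO, Optional


def filepos(pattern: bytes, stream: BinaryIO, start: int = 0,
            end: Optional[int] = None, skip: Optional[int] = None,
            tail: bool = False) -> Optional[int]:
    """Hypothetical STRPOS-style search over a byte stream.

    Returns the byte position where PATTERN begins (or the position just
    after the match, if TAIL is true), or None if PATTERN does not occur
    entirely within [start, end). A pattern byte equal to SKIP matches any byte.
    """
    stream.seek(start)
    # Read the searched range into memory; a real implementation would buffer.
    data = stream.read() if end is None else stream.read(max(0, end - start))
    n = len(pattern)
    for i in range(len(data) - n + 1):
        if all(p == skip or p == data[i + k] for k, p in enumerate(pattern)):
            return start + i + (n if tail else 0)
    return None


if __name__ == "__main__":
    f = io.BytesIO(b"hello world")
    print(filepos(b"w?rld", f, skip=ord("?")))  # 6
```

This version compares bytes, which is exactly the limitation discussed in this thread: it is STRPOS over bytes, not over characters.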

unicode and FILEPOS

The current code is incorrect in another way. It returns the wrong byte position if the search pattern begins with a SKIP character. I think it has a needless optimization...
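The actual FILEPOS code isn't quoted here, so the following is only a hypothetical reconstruction of how such an optimization can go wrong: scan for the first literal (non-SKIP) byte, verify the rest, and report where that byte was found, forgetting the leading SKIPs. The sketch shows a plain reference matcher and an optimized one that subtracts the number of leading SKIPs; the names are made up for illustration.

```python
def find_with_skip(data: bytes, pattern: bytes, skip: int) -> int:
    """Reference matcher: SKIP bytes in PATTERN match any byte; -1 if absent."""
    n = len(pattern)
    for i in range(len(data) - n + 1):
        if all(p == skip or p == data[i + k] for k, p in enumerate(pattern)):
            return i
    return -1


def find_with_skip_optimized(data: bytes, pattern: bytes, skip: int) -> int:
    """Scan for the first non-SKIP byte, then verify the whole pattern.

    A hit on the first literal byte at position j means the pattern starts
    at j - lead, where lead is the number of leading SKIP bytes.
    """
    lead = 0
    while lead < len(pattern) and pattern[lead] == skip:
        lead += 1
    if lead == len(pattern):  # pattern is all SKIPs (or empty)
        return 0 if len(data) >= len(pattern) else -1
    anchor = bytes([pattern[lead]])
    j = data.find(anchor, lead)  # a match needs `lead` bytes before the anchor
    while j != -1:
        i = j - lead  # candidate start of the whole pattern
        if i + len(pattern) <= len(data) and all(
                p == skip or p == data[i + k] for k, p in enumerate(pattern)):
            return i  # returning j instead would be off by `lead`
        j = data.find(anchor, j + 1)
    return -1
```

With `data = b"xxabcd"` and pattern `b"?bc"` (SKIP = `?`), both return 2; reporting the anchor position would give 3.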

unicode and FILEPOS

As noted in the comments above, FILEPOS is currently only a byte-sequence searcher, highly optimized (and even so it gives incorrect results if the search pattern begins with the SKIP byte)....

unicode and FILEPOS

What is "right" presumably involves the equivalent of a finite-state transducer in the middle of the file search, which means importing or creating a whole bunch of code to interpret Unicode features...
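One example of the Unicode features involved is canonical equivalence: "é" can be stored precomposed (U+00E9) or as "e" plus a combining acute (U+0301), and a plain character match treats them as different. The sketch below swaps in Python's `unicodedata.normalize` for the transducer, normalizing both sides before matching. It only reports whether there is a match, because positions in the normalized text no longer line up with file positions; that mapping is exactly the bookkeeping a real transducer would have to carry.

```python
import unicodedata


def contains_canonically(text: str, pattern: str, form: str = "NFC") -> bool:
    """True if PATTERN occurs in TEXT once both are in the same normal form."""
    return unicodedata.normalize(form, pattern) in unicodedata.normalize(form, text)
```

Here `"caf\u00e9" in "cafe\u0301"` is False, but `contains_canonically("cafe\u0301", "caf\u00e9")` is True.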

unicode and FILEPOS

On further thought, though, it seems that correctly searching any NS-format file will always require the (relatively) slower case: matching characters, not bytes. Unlike UTF-8, ISO8859 etc., the...
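A small sketch of why byte matching breaks down for a run-coded file. It assumes the convention that byte 0xFF followed by a character-set byte switches the current character set for the bytes that follow, which is a simplification of the NS format (its 16-bit mode is not modeled). The names `SHIFT`, `decode_runcoded` and `char_filepos` are hypothetical. The matcher compares decoded characters but still reports byte offsets.

```python
from typing import Iterator, List, Tuple

SHIFT = 0xFF  # assumed: charset-shift byte of the run-coded NS format


def decode_runcoded(data: bytes) -> Iterator[Tuple[int, int]]:
    """Yield (byte_offset, character) pairs, character = charset * 256 + code.

    Simplified: only 8-bit runs (SHIFT followed by a charset byte) are modeled.
    """
    charset = 0
    i = 0
    while i < len(data):
        if data[i] == SHIFT:
            if i + 1 >= len(data):
                break  # truncated shift sequence at end of data
            charset = data[i + 1]
            i += 2
        else:
            yield i, (charset << 8) | data[i]
            i += 1


def char_filepos(data: bytes, pattern: List[int]) -> int:
    """Byte offset where the character sequence PATTERN starts, or -1."""
    if not pattern:
        return 0
    chars = list(decode_runcoded(data))
    codes = [c for _, c in chars]
    n = len(pattern)
    for k in range(len(codes) - n + 1):
        if codes[k:k + n] == pattern:
            return chars[k][0]
    return -1
```

In `bytes([SHIFT, 0x26, 0x41, SHIFT, 0x00, 0x41])`, a byte search for `b"A"` hits offset 2, but that byte is code 0x41 in character set 0x26, not "A"; the character search correctly reports offset 5. Conversely, in `bytes([0x41, SHIFT, 0x00, 0x42])` a byte search for `b"AB"` fails because of the redundant shift, while the character search finds it at 0. Neither problem arises with UTF-8 or ISO 8859, where a character's bytes do not depend on what came before.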