
[macOS 12.3 ARM64] /usr/bin/awk: towc: multibyte conversion failure on: '��@'

Open killermoehre opened this issue 2 years ago • 12 comments

GNU bash, version 5.1.16(1)-release (aarch64-apple-darwin21.1.0)
ble.sh, version 0.4.0-devel3+d340233 (noarch)
bash-preexec (iterm2_shell_integration.sh), iterm2_shell_integration.sh (noarch) (integration: on)
locale: LANG=de_DE.UTF-8 LC_COLLATE=C LC_TERMINAL=iTerm2 LC_TERMINAL_VERSION=3.4.15
terminal: TERM=xterm-256color wcwidth=12.1-west/13.1-2+ri, xterm:95 (0;95;0)

So, I switched to a new MacBook with the Apple ARM chip. If I trigger completion on a path (or ble.sh does so by itself), the following error fragments are printed.

$ /bin/ls /usr/bin/awk: towc: multibyte conversion failure on: '��@'

 input record number 9, file
 source line number 18
/usr/bin/awk: towc: multibyte conversion failure on: '��@'

 input record number 9, file
 source line number 18

Here is the "output" of /usr/bin/awk --version:

$ /usr/bin/awk --ve/usr/bin/awk: towc: multibyte conversion failure on: '��'

 input record number 31, file
 source line number 18
--versi/usr/bin/awk: towc: multibyte conversion failure on: '��'

 input record number 31, file
 source line number 18
--version/usr/bin/awk: towc: multibyte conversion failure on: '��'

 input record number 31, file
 source line number 18

awk version 20200816

killermoehre, May 01 '22 22:05

OK, it was a problem on my side. My $PATH had not been adjusted yet when ble-attach ran, so only the macOS built-in awk was available.

killermoehre, May 01 '22 23:05

Searching for the error message finds two similar reports.

The version format and the other error lines ("input record number ..., source line number ...") suggest that this is a variant of nawk. However, I couldn't find any source code containing the error message "towc: multibyte conversion failure on" on GitHub or via Google. I guess this awk implementation is closed source.

akinomyoga, May 03 '22 09:05

Thank you for the report. I'm not sure if I can fix or work around it soon, but I would like to keep it open so that other users can find it.

akinomyoga, May 03 '22 09:05

Maybe rename awk to gawk and force darwin users to get a recent GNU awk?

SuperSandro2000, May 03 '22 13:05

@SuperSandro2000 Thank you for your comment! Yeah, I think that is a valid workaround. Currently, ble.sh selects the awk implementation with the precedence XPG4 awk (Solaris) > nawk > mawk > gawk > awk. I guess the macOS awk is only used when none of nawk, mawk, and gawk is found.
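For readers unfamiliar with this kind of fallback, the selection order can be sketched in Bash as follows. This is a hypothetical function (pick_awk), not ble.sh's actual code; only the precedence order comes from the comment above, and /usr/xpg4/bin/awk is assumed to be where Solaris provides its XPG4 awk.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not ble.sh's actual code: print the path of the first
# awk found, using the precedence
#   XPG4 awk (Solaris) > nawk > mawk > gawk > awk
pick_awk() {
  local candidate path
  # /usr/xpg4/bin/awk is checked as an absolute path (Solaris location);
  # the other names are looked up in $PATH.
  for candidate in /usr/xpg4/bin/awk nawk mawk gawk awk; do
    # type -P prints the path of an executable file, never a function/alias.
    if path=$(type -P "$candidate"); then
      printf '%s\n' "$path"
      return 0
    fi
  done
  return 1  # no awk found at all
}
```

With this order, the macOS /usr/bin/awk is only picked when none of nawk, mawk, or gawk is reachable through PATH, which is the situation in the original report where PATH had not yet been adjusted. Usage: awk_cmd=$(pick_awk) && "$awk_cmd" 'BEGIN { print "ok" }'.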

Nevertheless, if it is possible, I'd like to support the macOS awk.

akinomyoga, May 03 '22 13:05

I can provide the following data from a recent macOS 12.3.1:

$ /usr/bin/awk --version
awk version 20200816
$ what /usr/bin/awk
/usr/bin/awk
	PROGRAM:awk  PROJECT:awk-32
	PROGRAM:awk  PROJECT:awk-32

The binary identifies itself as com.apple.awk and is signed by Apple Inc. From personal experience, I know there is no way to ask an awk implementation itself which features it supports.

killermoehre, May 06 '22 13:05

Thank you for the information! I have searched for the string com.apple.awk and found the page opensource.apple.com/tarballs/awk where the source of macOS awk-27.40.1 can be downloaded. Unfortunately, the source of awk-27.40.1 doesn't seem to contain the error message "towc: multibyte conversion failure on".

akinomyoga, May 06 '22 13:05

I asked in the Apple Developer Forums.

killermoehre, May 06 '22 14:05

I found the source code of awk-32 on GitHub. [ The GitHub code search for "towc: multibyte conversion failure on" initially didn't find anything, but after I searched for a shorter string "multibyte conversion failure", it somehow started to return results. Maybe the caching of the GitHub code search is related. ]

The source code contains the exact error message. Awk-32 seems to be the release that immediately follows awk-27.40.1. The feature difference is explained here.

I built it on Linux with the following modification:

diff --git a/src/b.c b/src/b.c
index cfffcb9..fa03d40 100644
--- a/src/b.c
+++ b/src/b.c
@@ -31,7 +31,9 @@ THIS SOFTWARE.
 #include <stdio.h>
 #include <string.h>
 #include <stdlib.h>
-#include <xlocale.h>
+//#include <xlocale.h>
+#include <locale.h>
+typedef int wctype_t;
 #include <wchar.h>
 #include "awk.h"
 #include "awkgram.tab.h"
@@ -1256,6 +1258,7 @@ static int cclex(void) {
                                len = 1;
                                prestr += 2;

+#if 0
                                __collate_lookup_l(&collate_elem, &len, &prim1, &sec1, LC_GLOBAL_LOCALE);
                                DPRINTF("collate_elem: 0x%x p: %d s:%d\n", collate_elem, prim1, sec1);

@@ -1280,6 +1283,7 @@ static int cclex(void) {
                                                n++;
                                        }
                                }
+#endif
                        }
                } else if (wc == '\0') {
                        FATAL("nonterminated character class %.20s", lastre);

I hoped to reproduce the problem with this build of awk-32, but on Linux, ble.sh seems to work with awk-32 without the error message. I'll check the code again later.

akinomyoga, May 07 '22 05:05

@tessus Sorry, I think this isn't what you intended the VM for, but I used it to quickly check the behavior of the current /usr/bin/awk on macOS:

% bash --version
GNU bash, version 5.2.26(1)-release (x86_64-apple-darwin22.6.0)
% what /usr/bin/awk
/usr/bin/awk:
        PROGRAM:awk  PROJECT:awk-35
        PROGRAM:awk  PROJECT:awk-35
% head -c 32 /dev/urandom | awk '{print "hello"}'

In the past, the awk-32 shipped with macOS seemed to issue error messages for data that was not valid in the current character encoding, and it then failed without doing any actual processing. ble.sh wanted to use awk to process the output of Bash's builtin bind -p, whose key sequences contain some binary data in older versions of Bash. I couldn't investigate further for a long time because I didn't have access to macOS.

Now the version is awk-35, and it doesn't seem to produce an error for binary data, though I'm not sure whether the original problem was really related to binary data.
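The binary-data hypothesis can be checked with a small probe like the one below. It is a hypothetical helper (probe_awk_binary), not part of ble.sh, and it assumes that "the awk wrote something to stderr" is a good enough signal. It is untested on M1 hardware, and the original failure may have come from a different awk operation than the length() used here.

```shell
#!/usr/bin/env bash
# Hypothetical probe, not part of ble.sh: feed an awk one line containing
# bytes that are never valid in UTF-8 and report whether the awk writes
# anything to stderr, as macOS awk-32 did ("towc: multibyte conversion
# failure on: ..."). Run it under a UTF-8 locale (the report used
# LANG=de_DE.UTF-8); in the C locale every byte is a valid character.
probe_awk_binary() {
  local awk_cmd=${1:-awk} err
  # \377\376 are the bytes 0xFF 0xFE, followed by '@' as in the original
  # message. length() is character-based in multibyte-aware awks, so it is
  # meant to make the awk decode the record.
  # "2>&1 >/dev/null" captures only the awk's stderr and discards its stdout.
  err=$(printf 'key\377\376@\n' | "$awk_cmd" '{ print length($0) }' 2>&1 >/dev/null)
  if [[ -n $err ]]; then
    printf '%s: error on non-UTF-8 input: %s\n' "$awk_cmd" "$err"
    return 1
  fi
  printf '%s: no error on non-UTF-8 input\n' "$awk_cmd"
}
```

For example, LANG=de_DE.UTF-8 probe_awk_binary /usr/bin/awk runs the probe against the system awk with the locale from the original report.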

akinomyoga, Feb 23 '24 06:02

@akinomyoga Don't worry about it. It is a dev test VM, and you can use it for whatever tests you need. If it helps, I can leave it running for a few weeks. You can even destroy it; if you need it, I can roll back to the point before you logged in for the first time. Just drop me a quick email if you need a rollback.

Btw, when I tried to install ble.sh (via make) on the VM, it complained that it wanted gawk. I ran sudo port install gawk, and after that the install went through without a hitch. You have root access, so you can sudo port uninstall gawk for tests if you want.

tessus, Feb 23 '24 08:02

Thanks. By excluding /opt/local/bin from PATH, I can test the behavior without gawk. The error message doesn't seem to appear (though back then I added some workarounds based on guesses).
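Hiding the MacPorts gawk for such a test amounts to rebuilding PATH without /opt/local/bin (the MacPorts prefix used by sudo port install). The sketch below is a hypothetical helper (path_without), not something provided by ble.sh or MacPorts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print $PATH with every entry equal to the given
# directory removed, e.g. /opt/local/bin, so that the gawk installed there
# is no longer found.
path_without() {
  local dir=$1 entry result=
  local -a entries
  # Split $PATH on ':' into an array; unlike unquoted $PATH expansion,
  # read -a performs no pathname (glob) expansion on the pieces.
  IFS=: read -r -a entries <<< "$PATH"
  for entry in "${entries[@]}"; do
    [[ $entry == "$dir" ]] && continue
    # Join the kept entries with ':' (no leading separator).
    result=${result:+$result:}$entry
  done
  printf '%s\n' "$result"
}
```

For example, PATH=$(path_without /opt/local/bin) bash starts a shell in which ble.sh can only find awk implementations outside /opt/local/bin.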

I also compiled awk-32 from source and tried it, but simply running head -c 4096 /dev/urandom | ./awk '{print "hello"}' and similar commands does not seem to reproduce the error message. The error might be specific to M1.

akinomyoga, Feb 23 '24 16:02