otp icon indicating copy to clipboard operation
otp copied to clipboard

Initial support for ARM64 Windows

Open cocoa-xu opened this issue 1 year ago • 21 comments

Hi, this PR includes initial patches and instructions to bring Erlang/OTP to ARM64 Windows. These changes should only affect the behaviour when running in WSL and user executes one of the following command to set up the build environment

  • ./otp_build env_win32 arm64, set up build environment for compiling ARM64 Erlang/OTP natively,
  • ./otp_build env_win32 x64_arm64, set up build environment for cross-compiling ARM64 Erlang/OTP on an x86_64 machine

Although I understand that ARM64 Windows might not be anywhere close to the main tier of support yet, this PR can serve as an initial guide for people in the community who're also interested in running Erlang/OTP natively on ARM64 Windows.

cocoa-xu avatar Feb 17 '24 16:02 cocoa-xu

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Feb 17 '24 16:02 CLAassistant

CT Test Results

   10 files    248 suites   3h 28m 48s :stopwatch: 3 029 tests 2 890 :white_check_mark: 137 :zzz: 2 :x: 4 468 runs  4 082 :white_check_mark: 384 :zzz: 2 :x:

For more details on these failures, see this check.

Results for commit 88a1244a.

:recycle: This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally.

Artifacts

// Erlang/OTP Github Action Bot

github-actions[bot] avatar Feb 17 '24 16:02 github-actions[bot]

Hi, I have also managed to craft a GitHub workflow that can build it natively on an ARM64 Windows machine, and here is the workflow log, the corresponding CI yaml file and the final installer! This build was based on 6c6b409.

There're still some issues when cross-compiling on x86_64 machine, which I'll look into later.

Other than that, all libraries except wx and odbc build without any issue, ARM64 JIT also seem to work. Although I haven't got time to do any thorough tests yet.

And one more thing is to adjust nsis scripts where relevant, such as defaulting to Program Files (Arm) and adding the vcredist_arm64.exe file.

cocoa-xu avatar Feb 19 '24 01:02 cocoa-xu

Some updates

wxWidget also seems to be okay now.

ARM64-OTP-4 ARM64-OTP-5

As for the nsis script, I'm not sure what's the ARM64's equivalent of $PROGRAMFILES64 so that I can get it to install Erlang/OTP in C:\Program Files (Arm) by default.

cocoa-xu avatar Feb 19 '24 21:02 cocoa-xu

so that I can get it to install Erlang/OTP in C:\Program Files (Arm) by default.

Program Files (Arm) is for Arm32 applications; 64-bit applications should always be installed in Program Files.

ref: https://learn.microsoft.com/en-us/windows/arm/faq#what-program-files-folder-should-i-use-to-install-my-arm64-application-

reflectronic avatar Feb 19 '24 21:02 reflectronic

so that I can get it to install Erlang/OTP in C:\Program Files (Arm) by default.

Program Files (Arm) is for Arm32 applications; 64-bit applications should always be installed in Program Files.

ref: https://learn.microsoft.com/en-us/windows/arm/faq#what-program-files-folder-should-i-use-to-install-my-arm64-application-

Oh I didn't know that before. Thanks for the information! Gonna update the nsis script file!

cocoa-xu avatar Feb 19 '24 21:02 cocoa-xu

And here we have the test results. I'm not sure which of these files should I archive and upload here (and also because the test result directory is about ~1.5 GB, and 259MB after compressing), so I only did a screenshot for now. Please let me know if you need me to upload the test results.

tests results below is outdated

![IMG_6891CE8947DB-1](https://github.com/erlang/otp/assets/89497197/0859f782-a722-4059-b321-fc9fe964c5f4)

cocoa-xu avatar Feb 20 '24 09:02 cocoa-xu

It seems that the emulator and kernel tests crashed hard, can you see which tests it crashed on?

jhogberg avatar Feb 20 '24 09:02 jhogberg

It seems that the emulator and kernel tests crashed hard, can you see which tests it crashed on?

Hi, I decided to upload the test result here, https://github.com/cocoa-xu/otp-build/releases/download/v27.0-rc1/test_server-otp_win_arm64_27.0-rc1.tar.xz, and using xz reduces its size to 75MB. It will be easier to check all relevant logs, as well as for other people to look and maybe give any feedbacks.

It corresponds to this installer, https://github.com/cocoa-xu/otp-build/releases/download/v27.0-rc1/otp_win_arm64_27.0-rc1.exe.

Also please let me know if any tests need a re-run, and I'll update the test results. :)

cocoa-xu avatar Feb 20 '24 09:02 cocoa-xu

It seems that the emulator and kernel tests crashed hard, can you see which tests it crashed on?

As for the ones that cause crashes, they're bs_construct_SUITE:error_info/1 and logger_disk_log_h_SUITE:disk_log_full/1

bs_construct_SUITE:error_info/1

=== Test case: [bs_construct_SUITE:error_info/1](file:///Users/cocoa/Downloads/test_server/[email protected]_06.40.17/tests.emulator_test.logs/run.2024-02-20_06.40.46/bs_construct_suite.src.html#error_info-1) (click for source code)

=== Config value:

    [{watchdog,<0.115844.16>},
     {tc_logfile,"c:/src/release/tests/test_server/[email protected]_06.40.17/tests.emulator_test.logs/run.2024-02-20_06.40.46/bs_construct_suite.error_info.html"},
     {tc_group_properties,[]},
     {tc_group_path,[]},
     {data_dir,"c:/src/release/tests/emulator_test/bs_construct_SUITE_data/"},
     {priv_dir,"c:/src/release/tests/test_server/[email protected]_06.40.17/tests.emulator_test.logs/run.2024-02-20_06.40.46/log_private/"},
     {nodenames,[]}]

=== Current directory is "c:/src/release/tests/test_server/[email protected]_06.40.17"

=== Started at 2024-02-20 06:45:21



<< Atom / binary , Binary / binary >>: segment 1 of type 'binary': expected a binary but got: some_atom

<< Binary / binary , Atom / binary >>: segment 2 of type 'binary': expected a binary but got: some_atom

<< 1 : 32 , Binary / binary , Atom / binary >>: segment 3 of type 'binary': expected a binary but got: some_atom

<< 1 : 32 , "xyz" , Binary / binary , Atom / binary >>: segment 4 of type 'binary': expected a binary but got: some_atom

<< Atom : 32 >>: segment 1 of type 'integer': expected an integer but got: some_atom

<< Atom : ( id ( 32 ) ) >>: segment 1 of type 'integer': expected an integer but got: some_atom

<< LongList : 32 >>: segment 1 of type 'integer': expected an integer but got: [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19|...]

<< 42 : Atom >>: segment 1 of type 'integer': expected a non-negative integer as size but got: some_atom

<< Atom : 32 >>: segment 1 of type 'integer': expected an integer but got: some_atom

<< 42 : NegSize >>: segment 1 of type 'integer': expected a non-negative integer as size but got: -1

<< 42 : HugeNegSize >>: segment 1 of type 'integer': expected a non-negative integer as size but got: -18446744073709551616

<< 42 : ( 1 bsl 58 ) / unit : 255 >>: segment 1 of type 'integer': the size 288230376151711744 is too large

<< 42 : ( 1 bsl 60 ) / unit : 8 >>: segment 1 of type 'integer': the size 1152921504606846976 is too large

<< 42 : ( 1 bsl 64 ) >>: segment 1 of type 'integer': the size 18446744073709551616 is too large

<< Atom : 10 / binary >>: segment 1 of type 'binary': expected a binary but got: some_atom

<< Atom : ( id ( 10 ) ) / binary >>: segment 1 of type 'binary': expected a binary but got: some_atom

<< Binary : Atom / binary >>: segment 1 of type 'binary': expected a non-negative integer as size but got: some_atom

<< Binary : NegSize / binary >>: segment 1 of type 'binary': expected a non-negative integer as size but got: -1

<< Binary : HugeNegSize / binary >>: segment 1 of type 'binary': expected a non-negative integer as size but got: -18446744073709551616

<< Binary : BadSize / binary >>: segment 1 of type 'binary': expected a non-negative integer as size but got: bad_size

<< BadBinary : BadSize / binary >>: segment 1 of type 'binary': expected a non-negative integer as size but got: bad_size

<< Binary : 10 / binary >>: segment 1 of type 'binary': the value <<"abc">> is shorter than the size of the segment

<< Binary : ( id ( 10 ) ) / binary >>: segment 1 of type 'binary': the value <<"abc">> is shorter than the size of the segment

<< Atom / binary >>: segment 1 of type 'binary': expected a binary but got: some_atom

<< ( id ( << 1 : 1 >> ) ) / binary >>: segment 1 of type 'binary': the size of the value <<1:1>> is not a multiple of the unit for the segment

<< ( id ( << 0 : 1111 >> ) ) / binary >>: segment 1 of type 'binary': the size of the value <<0,0,0,0,0,0,0,0...,0,0:7>> is not a multiple of the unit for the segment

<< 0 , ( id ( << 1 : 1 >> ) ) / binary >>: segment 2 of type 'binary': the size of the value <<1:1>> is not a multiple of the unit for the segment

<< 0 , ( id ( << 0 : 1111 >> ) ) / binary >>: segment 2 of type 'binary': the size of the value <<0,0,0,0,0,0,0,0...,0,0:7>> is not a multiple of the unit for the segment

<< Binary : ( 1 bsl 64 ) / binary >>: segment 1 of type 'binary': the size 18446744073709551616 is too large

<< Binary : ( id ( 1 bsl 64 ) ) / binary >>: segment 1 of type 'binary': the size 18446744073709551616 is too large

<< Atom : 64 / float >>: segment 1 of type 'float': expected a float or an integer but got: some_atom

<< Atom : Atom / float >>: segment 1 of type 'float': expected a non-negative integer as size but got: some_atom

<< 42.0 : NegSize / float >>: segment 1 of type 'float': expected a non-negative integer as size but got: -1

<< 42.0 : HugeNegSize / float >>: segment 1 of type 'float': expected a non-negative integer as size but got: -18446744073709551616

<< 42.0 : ( id ( 1 ) ) / float >>: segment 1 of type 'float': expected one of the supported sizes 16, 32, or 64 but got: 1

<< HugeBig : ( id ( 64 ) ) / float >>: segment 1 of type 'float': the value 350746621104...713208549376 is outside the range expressible as a float

<< HugeBig : 64 / float >>: segment 1 of type 'float': the value 350746621104...713208549376 is outside the range expressible as a float

<< 42.0 : ( id ( 1 bsl 64 ) ) / float >>: segment 1 of type 'float': the size 18446744073709551616 is too large

<< Atom / utf8 >>: segment 1 of type 'utf8': expected a non-negative integer encodable as utf8 but got: some_atom

<< Atom / utf16 >>: segment 1 of type 'utf16': expected a non-negative integer encodable as utf16 but got: some_atom

<< Atom / utf32 >>: segment 1 of type 'utf32': expected a non-negative integer encodable as utf32 but got: some_atom

<< 0 : ( MaxSmall ) / unit : 32 , 0 : 64 >>: segment 1 of type 'integer': the size 576460752303423487 is too large

<< 0 : ( MaxSmall ) / unit : 32 , ( id ( 0 ) ) : 64 >>: segment 1 of type 'integer': the size 576460752303423487 is too large

logger_disk_log_h_SUITE:disk_log_full/1

*** System report during logger_disk_log_h_SUITE:disk_log_full/1 in tty 2024-02-20 07:35:45.575 ***[🔗](file:///Users/cocoa/Downloads/test_server/[email protected]_06.45.23/tests.kernel_test.logs/run.2024-02-20_06.45.55/logger_disk_log_h_suite.disk_log_full.html#e-1363)
=ERROR REPORT==== 20-Feb-2024::07:35:45.575000 ===
** Generic server <0.106479.0> terminating 
** Last message in was {disk_log,'test_server@COCOA-ARM64-PC',
                                 logger_disk_log_h_SUITE,full}
** When Server state == #{id => logger_disk_log_h_logger_disk_log_h_SUITE,
                          module => logger_h_common,mode => async,
                          idle => false,burst_limit_enable => true,
                          drop_mode_qlen => 200,flush_qlen => 1000,
                          sync_mode_qlen => 10,burst_msg_count => 9,
                          burst_win_ts => -24019194990094072,
                          cb_state =>
                              #{id => logger_disk_log_h_SUITE,
                                module => logger_disk_log_h,
                                filesync_repeat_interval => 5000,
                                ctrl_sync_count => 11,
                                handler_state =>
                                    #{log_opts =>
                                          #{type => halt,
                                            file =>
                                                "c:/src/release/tests/test_server/[email protected]_06.45.23/tests.kernel_test.logs/run.2024-02-20_06.45.55/log_private/disk_log_full",
                                            max_no_bytes => 50,
                                            max_no_files => 1},
                                      prev_log_result => ok,
                                      prev_sync_result => ok,
                                      prev_disk_log_info => undefined},
                                last_op => write,
                                rep_sync_tref =>
                                    #Ref<0.3961621595.2628255746.191094>},
                          last_load_ts => -24019194990093602,last_qlen => 0,
                          mode_ref =>
                              {logger_olp,
                                  logger_disk_log_h_logger_disk_log_h_SUITE},
                          overload_kill_restart_after => 5000,
                          burst_limit_max_count => 500,
                          burst_limit_window_time => 1000,
                          overload_kill_enable => false,
                          overload_kill_mem_size => 3000000,
                          overload_kill_qlen => 20000}
** Reason for termination ==
** {badarg,[{erlang,display_string,
                    ["Logger - error: "],
                    [{file,"erlang.erl"},
                     {line,2139},
                     {error_info,#{cause => 'The handle is invalid.\r\n',
                                   module => erl_erts_errors}}]},
            {logger,internal_log,2,[{file,"logger.erl"},{line,726}]},
            {logger_disk_log_h,error_notify_new,4,
                               [{file,"logger_disk_log_h.erl"},{line,365}]},
            {logger_h_common,handle_info,2,
                             [{file,"logger_h_common.erl"},{line,310}]},
            {logger_olp,try_callback_call,4,
                        [{file,"logger_olp.erl"},{line,617}]},
            {logger_olp,handle_info,2,[{file,"logger_olp.erl"},{line,271}]},
            {gen_server,try_handle_info,3,
                        [{file,"gen_server.erl"},{line,2158}]},
            {gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,2246}]}]}

cocoa-xu avatar Feb 20 '24 11:02 cocoa-xu

It seems that the emulator and kernel tests crashed hard, can you see which tests it crashed on?

new tests results are available in the reply below

outdated test results Updates for the `emulator` tests. I looked and removed the lines that cause hard-crashes, and now the test results is
2316 Ok, 12 Failed, 208 Skipped of 2536

hard-crashes

common

In

  • bs_construct_SUITE:error_info
  • bs_construct_no_opt_SUITE:error_info
  • bs_construct_r25_SUITE:error_info
  • bs_construct_stripped_types_SUITE:error_info
{system_limit, {1,integer,size,MaxSmall}, _} = ?ERROR_INFO(<<0:(id(MaxSmall))/unit:32,0:64>>),
{system_limit, {1,integer,size,MaxSmall}, _} = ?ERROR_INFO(<<0:(id(MaxSmall))/unit:32,(id(0)):64>>),

bs_construct_no_opt_SUITE:error_info

{system_limit, {1,integer,size,MaxSmall}, _} = ?ERROR_INFO(<<0:(MaxSmall)/unit:32,0:64>>),
{system_limit, {1,integer,size,MaxSmall}, _} = ?ERROR_INFO(<<0:(MaxSmall)/unit:32,(id(0)):64>>),

failed

  • bif_SUITE:display_string
  • bs_construct_SUITE:dynamic
  • bs_construct_SUITE:error_info
  • bs_construct_no_opt_SUITE:error_info
  • bs_construct_r25_SUITE:dynamic
  • bs_construct_r25_SUITE:error_info
  • bs_construct_stripped_types_SUITE:error_info
  • hello_SUITE:hello
  • port_SUITE: pipe_limit_env
  • port_SUITE: unregister_name
  • smoke_test_SUITE:jump_table
  • time_SUITE:time_unit_conversion

failed to init

~~mtx_SUITE:init_per_suite~~ ~~nif_SUITE init_per_suite~~

Re-running tests in wsl seems to make them work, will update test results in a few hours.

cocoa-xu avatar Feb 20 '24 14:02 cocoa-xu

Updates for all tests.

I did a clean compile and removed the lines that cause hard-crashes in emulator and kernel.

For the emulator tests, the lines I removed were:

{system_limit, {1,integer,size,MaxSmall}, _} = ?ERROR_INFO(<<0:(id(MaxSmall))/unit:32,0:64>>),
{system_limit, {1,integer,size,MaxSmall}, _} = ?ERROR_INFO(<<0:(id(MaxSmall))/unit:32,(id(0)):64>>),

In these test cases:

  • bs_construct_SUITE:error_info
  • bs_construct_no_opt_SUITE:error_info
  • bs_construct_r25_SUITE:error_info
  • bs_construct_stripped_types_SUITE:error_info

In addition to the lines above, I also have to remove these two lines in bs_construct_no_opt_SUITE:error_info

{system_limit, {1,integer,size,MaxSmall}, _} = ?ERROR_INFO(<<0:(MaxSmall)/unit:32,0:64>>),
{system_limit, {1,integer,size,MaxSmall}, _} = ?ERROR_INFO(<<0:(MaxSmall)/unit:32,(id(0)):64>>),

For the kernel tests, I need to change these two tests to

disk_log_full(Config) ->
    ok.
% ...
disk_log_events(_Config) ->
    ok.

After the above changes, I ran the tests in WSL (which was done in Powershell before), and the latest test results are:

test_results

Also, sorry that I'm not sure how I should properly remove/truncate successful tests to save space, so I still have to upload the full test results as a GitHub release file here, as the attachment file cannot exceed 25MBs in PR comments.

cocoa-xu avatar Feb 21 '24 07:02 cocoa-xu

And I was wondering if and which ones of these failed test cases should we address in this PR? Perhaps the ones in the emulator, kernel and system?

cocoa-xu avatar Feb 21 '24 07:02 cocoa-xu

And I was wondering if and which ones of these failed test cases should we address in this PR? Perhaps the ones in the emulator, kernel and system?

The hard crashes must be fixed before we can include this in the nightly test runs, and the rest need to be investigated and fixed before we merge it. :-)

jhogberg avatar Feb 21 '24 09:02 jhogberg

I found that a few of these hard-crashed ones were related to erlang:display_string/{1,2} errored with bad argument. However, the call was unsuccessful because the fd passed to WriteFile is an invalid handle according to GetLastError.

#ifdef __WIN32__
    printf("[debug] we're in display_string_2 win32\r\n");
    HANDLE fd;
    if (ERTS_IS_ATOM_STR("stdout", BIF_ARG_1)) {
        fd = GetStdHandle(STD_OUTPUT_HANDLE);
        printf("[debug] and the BIF_ARG_1 is stdout, fd=%d\r\n", fd);
    } else if (ERTS_IS_ATOM_STR("stderr", BIF_ARG_1)) {
        fd = GetStdHandle(STD_ERROR_HANDLE);
        printf("[debug] and the BIF_ARG_1 is stderr, fd=%d\r\n", fd);
        if (fd == INVALID_HANDLE_VALUE) {
            printf("[debug] and fd == INVALID_HANDLE_VALUE\r\n");
        }
    }
#else
// ...
#endif

#ifdef __WIN32__
        Uint32 w;
        if (!WriteFile(fd, str, len, &w, NULL)) {
            DWORD last_error = GetLastError();
            LPTSTR error_text = NULL;
            FormatMessage(FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_ALLOCATE_BUFFER | FORMAT_MESSAGE_IGNORE_INSERTS, NULL, HRESULT_FROM_WIN32(last_error), MAKELANGID(LANG_NEUTRAL, SUBLANG_DEFAULT), (LPTSTR)&error_text, 0, NULL);
            if (error_text != NULL) {
                printf("[debug] error msg: %s\r\n", error_text);
                LocalFree(error_text);
            } else {
                printf("[debug] error msg: cannot get formatted error message\r\n");
            }
            goto error;
        }
        written = (Sint)w;
#else
// ...
#endif

The coresponding test case was changed to:

display_string(Config) when is_list(Config) ->
    # using `erlang:display` for comparison and check that we can display to console
    true = erlang:display("hej"),
    true = erlang:display_string("hej"),
    ok.

And I got this output when running this test case:

$ /mnt/c/src/release/aarch64-windows-msvc/bin/erl.exe -noinput -eval "ts:run(emulator, bif_SUITE, display_string, [batch]),erlang:halt()"
Command: erl -sname test_server -rsh ssh -env PATH "c:\src\release\tests\emulator_test;c:\src\release\AARCH6~1\ERTS-1~1.2\bin;c:\Program Files (Arm)\MICROS~1\JDK-16~1.7-H\bin;c:\PROGRA~1\PARALL~1\PARALL~1\APPLIC~1;c:\Windows\System32;c:\Windows;c:\Windows\System32\wbem;c:\Windows\System32\WindowsPowerShell\v1.0;c:\Windows\System32\OpenSSH;c:\PROGRA~1\Git\cmd;c:\PROGRA~3\CHOCOL~1\bin;c:\Users\cocoa\AppData\Local\Programs\Python\PYTHON~1\Scripts;c:\Users\cocoa\AppData\Local\Programs\Python\PYTHON~1;c:\Users\cocoa\AppData\Local\MICROS~1\WINDOW~1;c:\Users\cocoa\AppData\Local\Programs\MICROS~1\bin;c:\PERL_A~1\bin;c:\Users\cocoa\AppData\Local\GITHUB~1\bin;c:\ERLANG~2\bin;c:\Users\cocoa\ELIXIR~1.1\bin;c:\PROGRA~1\MICROS~1\2022\COMMUN~1\VC\Tools\MSVC\1439~1.335\bin\HOSTAR~1\arm64;c:\PROGRA~2\WI3CF2~1\10\bin\100226~1.0\arm64;c:\Program Files (Arm)\MICROS~1\JDK-16~1.7-H\bin;c:\PROGRA~1\PARALL~1\PARALL~1\APPLIC~1;c:\Windows\System32;c:\Windows;c:\Windows\System32\wbem;c:\Windows\System32\WindowsPowerShell\v1.0;c:\Windows\System32\OpenSSH;c:\PROGRA~1\Git\cmd;c:\PROGRA~3\CHOCOL~1\bin;c:\Users\cocoa\AppData\Local\Programs\Python\PYTHON~1\Scripts;c:\Users\cocoa\AppData\Local\Programs\Python\PYTHON~1;c:\Users\cocoa\AppData\Local\MICROS~1\WINDOW~1;c:\Users\cocoa\AppData\Local\Programs\MICROS~1\bin;c:\PERL_A~1\bin;c:\Users\cocoa\AppData\Local\GITHUB~1\bin;c:\ERLANG~2\bin;c:\Users\cocoa\ELIXIR~1.1\bin" -env ERL_CRASH_DUMP c:/src/release/tests/test_server/emulator_erl_crash.dump -boot start_sasl -sasl errlog_type error -pz "c:/src/release/tests/test_server" -enable-feature maybe_expr -ct_test_vars "{net_dir,\"\"}" -eval "ts_run:ct_run_test(\"c:/src/release/tests/emulator_test\", [{abort_if_missing_suites,true},{suite,bif_SUITE},{testcase,display_string},{logdir,\"../test_server\"},{config,[\"../test_server/ts.config\",\"../test_server/ts.win32.config\"]},{vars,[{verbose,0}]},batch,{scale_timetraps,true}])"  -noinput -eval "erlang:halt(0,[{flush,false}])."

Common Test starting (cwd is c:/src/release/tests/emulator_test)


Common Test: Running make in test directories...
Recompile: bif_SUITE
Recompile: gen_tcp_dist
Recompile: ignore_cores

CWD set to: "c:/src/release/tests/test_server/[email protected]_22.31.26"

Testing tests.emulator_test.bif_SUITE.display_string: Starting test, 1 test cases
"hej"
[debug] we're in display_string_2 win32
[debug] and the BIF_ARG_1 is stderr, fd=96
[debug] WriteFile went wrong
[debug] error msg: The handle is invalid.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
erlang:display_string failed on line 2139
Reason: {badarg,[{erlang,...},{...}|...]}
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Testing tests.emulator_test.bif_SUITE.display_string: *** FAILED test case 1 of 1 ***
Testing tests.emulator_test.bif_SUITE.display_string: TEST COMPLETE, 0 ok, 1 failed of 1 test cases

Updating c:/src/release/tests/test_server/index.html ... done
Updating c:/src/release/tests/test_server/all_runs.html ... done

And I also attempted to use _fileno and _write instead of GetStdHandle and WriteFile but it ended up pretty the same:

#ifdef __WIN32__
    int fd;
    if (ERTS_IS_ATOM_STR("stdout", BIF_ARG_1)) {
        fd = _fileno(stdout);
    } else if (ERTS_IS_ATOM_STR("stderr", BIF_ARG_1)) {
        fd = _fileno(stderr);
    }
#else
// ...
#endif

#ifdef __WIN32__
        written = 0;
        do {
            res = _write(fd, str+written, len-written);
            if (res < 0 && errno != ERRNO_BLOCK && errno != EINTR) {
                printf("[debug] _write went wrong, errno=%d\r\n", errno);
                goto error;
            }
            written += res;
        } while (written < len);
#else
// ...
#endif

The test outputted:

Testing tests.emulator_test.bif_SUITE.display_string: Starting test, 1 test cases
"hej"
[debug] we're in display_string_2 win32
[debug] and the BIF_ARG_1 is stderr, fd=96
[debug] _write went wrong, errno=9

Yet this function seems to be fine when using erl either in PowerShell or WSL:

$ /mnt/c/src/release/aarch64-windows-msvc/bin/erl.exe
Erlang/OTP 27 [RELEASE CANDIDATE 1] [erts-14.2.2] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [jit]

Eshell V14.2.2 (press Ctrl+G to abort, type help(). for help)
1> erlang:display_string("hej").
hejtrue

I wonder if we should skip this test for ARM64 Windows or using some alternatives?

cocoa-xu avatar Feb 21 '24 15:02 cocoa-xu

I wonder if we should skip this test for ARM64 Windows or using some alternatives?

And for this particular function, erlang:system_info(system_architecture) only returns win32 on all windows systems. Should we consider to change it to aarch64-pc-windows by changing these lines of the echo_env_wsl function in otp_build?

    if [ X"$X64" = X"true" ]; then
        echo_setenv OVERRIDE_TARGET win32 ';'
        echo_setenv CONFIG_SUBTYPE win64 ';'
    elif [ X"$ARM64" = X"arm64" -o X"$ARM64" = X"x64_arm64" ]; then
        echo_setenv OVERRIDE_TARGET aarch64-pc-windows ';'
        echo_setenv CONFIG_SUBTYPE "$ARM64" ';'
	else
        echo_setenv OVERRIDE_TARGET win32 ';'
    fi

cocoa-xu avatar Feb 21 '24 15:02 cocoa-xu

Isn't win32 used as the name of the used API and we still use that on arm for windows?

dgud avatar Feb 21 '24 15:02 dgud

Isn't win32 used as the name of the used API and we still use that on arm for windows?

Oh it's probably just a minor thing. I was trying to find a way to detect the CPU architecture using

erlang:system_info(system_architecture).

For other systems, the result can usually be used to check the CPU architecture, for example, x86_64-pc-linux-gnu or aarch64-apple-darwin23.2.0. But for Windows we have to do another check using

os:getenv("PROCESSOR_ARCHITECTURE").

Besides that, it also doesn't follow the official document:

system_architecture
  Returns a string containing the processor and OS architecture the emulator is built for.

Maybe it's too late and would definitely break a lot of things for x86 and x86_64 Windows, but perhaps we can do it for ARM64 Windows. WDYT?

cocoa-xu avatar Feb 22 '24 10:02 cocoa-xu

I still can't figure out why Windows think the file handle we get from GetStdHandle is invalid, but I found a workaround, https://github.com/erlang/otp/pull/8142/commits/2d79abba857f885ffd0096f8f31f3ed2e3023ffb, and it fixed all failed test cases due to erlang:display_string/2, including:

emulator

  • bif_SUITE/display_string
  • hello_SUITE/hello

kernel

  • logger_disk_log_h_SUITE/mem_kill_new
  • logger_disk_log_h_SUITE/restart_after

I'll upload a screenshot for the emulator and kernel tests shortly.

cocoa-xu avatar Feb 23 '24 15:02 cocoa-xu

Some explanation for commit https://github.com/erlang/otp/pull/8142/commits/892019ab1e6fd03381d8f08c2b3642329a7d4130

During my tests, I found that high in erts_time_unit_conversion can overflow and cause the test case time_unit_conversion in time_SUITE to fail. For example, on my ARM64 Windows machine when initialising erts_time_sup__.r.o.start_offset.nsec in erts_init_time_sup,

    erts_time_sup__.r.o.start_offset.nsec = (ErtsMonotonicTime)
	erts_time_unit_conversion((Uint64) abs_native_offset,
				  (Uint32) ERTS_MONOTONIC_TIME_UNIT,
				  (Uint32) 1000*1000*1000);

The value of ERTS_MONOTONIC_TIME_UNIT on my machine is 24000000, and abs_native_offset is 576460752312000000. And high will overflow in erts_time_unit_conversion, as the minimal example shown below:

#include <stdio.h>
#include <stdint.h>

uint64_t
erts_time_unit_conversion(uint64_t value,
	uint32_t from_time_unit,
	uint32_t to_time_unit)
{
	uint64_t high, low, result;
	if (value <= ~((uint64_t) 0)/to_time_unit)
		return (value*to_time_unit)/from_time_unit;

	low = value & ((uint64_t) 0xffffffff);
	high = (value >> 32) & ((uint64_t) 0xffffffff);
	
	low *= to_time_unit;
	high *= to_time_unit;
	
	high += (low >> 32) & ((uint64_t) 0xffffffff);
	low &= ((uint64_t) 0xffffffff);
	
	result = high % from_time_unit;
	high /= from_time_unit;
	printf("[!] high=%llu\n", high);
	printf("[!] high will overflow=%s\n",  (high >> 32) != 0 ? "yes" : "no");
	high <<= 32;
	printf("[!] high=%llu, result=%llu\n", high, result);
	
	result <<= 32;
	result += low;
	result /= from_time_unit;
	result += high;
	
	return result;
}

int main(int argc, char *argv[]) {
	printf("%lld\n\n", erts_time_unit_conversion(576460752312000000, 24000000, 1000*1000*1000));
	printf("%lld\n", erts_time_unit_conversion(576460752312000000, 24000000, 1000*1000));
}

The output of the above minimal code is

[!] high=5592405333
[!] high will overflow=yes
[!] high=5572453937501437952, result=9996874
5572453939290448384

[!] high=5592405
[!] high will overflow=no
[!] high=24019196580986880, result=8001996
24019198013000000

The correct value of the final result should be 24019198012999999488 yet it's too large even for uint64_t to hold it. So practically, there's no easy way to get around with it without changing the type of ErtsMonotonicTime.

And this overflow will cause all subsequent calculations depending on ERTS_MONOTONIC_OFFSET_NSEC to give wrong values. One particular case is that when the user wants to use nano seconds as the time unit in erlang:monotonic_time/1.

As for the fix, I think one possible (and perhaps the easiest and the most striaghtforwad) way is that we can simply do the calculations in erlang code. I tried to detect if there's any overflow in the C code and if so, we handle it in erlang code. But after I've done that, I feel that it just adds some unnecessary complexity to the C code.

I hope all this would make sense, but I'm happy to make any other changes to the C code if there's a better solution for it.

cocoa-xu avatar Feb 24 '24 02:02 cocoa-xu

I was wondering if I should try to use __int128_t for ErtsMonotonicTime (if __int128_t is available) or just leave it for now?

cocoa-xu avatar Feb 25 '24 07:02 cocoa-xu

Hi all! As this is kinda stale, I wonder if it would be easier for us if I break this PR into smaller PRs? I can start with PRs that definitely won't break any existing things.

cocoa-xu avatar May 20 '24 10:05 cocoa-xu

I'm closing this one, since you have started with smaller PR's.

dgud avatar Oct 22 '24 08:10 dgud