erl.exe -noshell -eval erlang:get_cookie() -s init stop -name undefined failed with Can't set long node name!

Open inikulshin opened this issue 10 months ago • 6 comments

Describe the bug erl.exe -noshell -eval erlang:get_cookie() -s init stop -name undefined failed with

  Can't set long node name!
  Please check your configuration
  
  2025-04-14 15:58:55.230000 crash_report        
      initial_call: {net_kernel,init,['Argument__1']}
      pid: <0.64.0>
      registered_name: []
      error_info: {exit,{error,badarg},[{gen_server,init_it,6,[{file,"gen_server.erl"},{line,961}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,241}]}]}
      ancestors: [net_sup,kernel_sup,<0.47.0>]
      message_queue_len: 0
      messages: []
      links: [<0.61.0>]
      dictionary: [{longnames,true}]
      trap_exit: true
      status: running
      heap_size: 6772
      stack_size: 28
      reductions: 2516
  2025-04-14 15:58:55.230000 supervisor_report   
      supervisor: {local,net_sup}
      errorContext: start_error
      reason: {'EXIT',nodistribution}
      offender: [{pid,undefined},{id,net_kernel},{mfargs,{net_kernel,start_link,[#{name=>undefined,supervisor=>net_sup,name_domain=>longnames,clean_halt=>true}]}},{restart_type,permanent},{significant,false},{shutdown,2000},{child_type,worker}]
  2025-04-14 15:58:55.230000 supervisor_report   
      supervisor: {local,kernel_sup}
      errorContext: start_error
      reason: {shutdown,{failed_to_start_child,net_kernel,{'EXIT',nodistribution}}}
      offender: [{pid,undefined},{id,net_sup},{mfargs,{erl_distribution,start_link,[]}},{restart_type,permanent},{significant,false},{shutdown,infinity},{child_type,supervisor}]
  2025-04-14 15:58:55.230000 crash_report        
      initial_call: {application_master,init,['Argument__1','Argument__2','Argument__3','Argument__4']}
      pid: <0.46.0>
      registered_name: []
      error_info: {exit,{{shutdown,{failed_to_start_child,net_sup,{shutdown,{failed_to_start_child,net_kernel,{'EXIT',nodistribution}}}}},{kernel,start,[normal,[]]}},[{application_master,init,4,[{file,"application_master.erl"},{line,142}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,241}]}]}
      ancestors: [<0.45.0>]
      message_queue_len: 1
      messages: [{'EXIT',<0.47.0>,normal}]
      links: [<0.45.0>,<0.44.0>]
      dictionary: []
      trap_exit: true
      status: running
      heap_size: 376
      stack_size: 28
      reductions: 169
  2025-04-14 15:58:55.230000 std_info            
      application: kernel
      exited: {{shutdown,{failed_to_start_child,net_sup,{shutdown,{failed_to_start_child,net_kernel,{'EXIT',nodistribution}}}}},{kernel,start,[normal,[]]}}
      type: permanent
  Kernel pid terminated (application_controller) ("{application_start_failure,kernel,{{shutdown,{failed_to_start_child,net_sup,{shutdown,{failed_to_start_child,net_kernel,{'EXIT',nodistribution}}}}},{kernel,start,[normal,[]]}}}")

To Reproduce I can't reproduce it; the same command worked thousands of times before. But maybe, as the code authors, you can think of a reproduction scenario or environment.

Expected behavior Must succeed, according to https://www.erlang.org/docs/26/man/erl#flags

If Name is set to undefined the node will be started in a special mode optimized to be the temporary client of another node.

Affected versions 26.2.5 on Windows

inikulshin, Apr 16 '25 13:04

The system cannot figure out your longname automatically, so you need to set it.

erl.exe -noshell -eval erlang:get_cookie() -s init stop -name [email protected]

Alternatively, you need to configure your system so that the long domain can be detected.

garazdawi, Apr 16 '25 13:04

@garazdawi

Afaiu, I can just use -sname undefined, at least for my purpose (to create .erlang.cookie file).

But firstly, -name with the special value undefined worked for months, if not years, and still works most of the time. So at least sometimes erl.exe knows how to handle the special value -name undefined without a hostname.

And secondly, imo documentation should describe it better.

inikulshin, Apr 16 '25 14:04

Afaiu, I can just use -sname undefined, at least for my purpose (to create .erlang.cookie file).

Yes, short name normally always works as it does not need a working DNS server.

But firstly, -name with the special value undefined worked for months, if not years, and still works most of the time. So at least sometimes erl.exe knows how to handle the special value -name undefined without a hostname.

Something changed with the configuration of your network/machine or with the way that Erlang detects this. Do you have the same issue if you use older Erlang releases?

And secondly, imo documentation should describe it better.

Agreed, the documentation can be improved here. I don't know much about how you configure long vs short names on Windows, so maybe you can help out?

garazdawi, Apr 16 '25 15:04

Something changed with the configuration of your network/machine or with the way that Erlang detects this.

For sure something in our environment was changed, but I don't know what. We use Erlang only for RabbitMQ and we never specially configured it.

Do you have the same issue if you use older Erlang releases?

We have used -name undefined for a few months, and only with 26.2.5, but we saw this crash for the first time only a few days ago.

Agreed, the documentation can be improved here.

Imo, the code should be fixed, and not the documentation.

For the -sname option, undefined is now a well-known, reserved value, and it should be the same for the -name option. I don't know the implementation, but if the hostname part is ignored for a Dynamic Node Name after all, using undefined@localhost or [email protected] is very confusing even if documented.

Checking for undefined (for Dynamic Node Name) should be done before any other [s]name parsing and matching.

short name normally always works as it does not need a working DNS server.

-name undefined should also normally always work and should not need a working DNS server.

inikulshin, Apr 17 '25 06:04

For what it's worth, we heard similar reports in Livebook, which also used -name. For example, if you join another network, or another computer joins your network with the same name as yours, your operating system may change the name accordingly. We also saw computer names being set automatically to invalid host names, such as empty strings or number sequences, which we could not reproduce by calling anything via the command line. Next time it happens, try grabbing the machine's name and see if it is valid. In our case it all pointed to an operating system or network issue.
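A small script along these lines could be used to grab the machine's name the next time the failure shows up. It is only a sketch in Python using the standard socket module, and it is an assumption that what Python's resolver reports matches what Erlang's own detection sees, so treat the output as a hint rather than a diagnosis. The function name describe_host_name is made up for this example.

```python
import socket


def describe_host_name() -> dict:
    """Collect the host name as the OS reports it, plus the resolver's FQDN guess."""
    hostname = socket.gethostname()  # short or long, whatever the OS has configured
    fqdn = socket.getfqdn()          # best-effort fully qualified name
    return {
        "hostname": hostname,
        "fqdn": fqdn,
        # A long node name needs a domain part, i.e. at least one dot.
        "fqdn_has_domain": "." in fqdn,
    }


if __name__ == "__main__":
    for key, value in describe_host_name().items():
        print(f"{key}: {value!r}")
```

If fqdn_has_domain is False, or the hostname is empty or looks like a number sequence, that would be consistent with the kind of invalid names described above.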

josevalim, Apr 19 '25 21:04

We will try to get to the bottom of this; maybe it is an oversight of some kind when handling dynamic nodes. This might require some cross-team effort, so alas I think this will only happen after the summer vacation period.

IngelaAndin, Jun 24 '25 13:06

I have had a look at the problem and have concluded that:

  • A node must present its host name to the node it connects to.
  • A node that uses long names must only accept connections from nodes that use long names, and the same for short names.
  • The only way a node can know if a connecting node uses long names or not is by looking at the presented host name to see if it contains a dot.
  • Long names: -name Name starts a node that uses a fully qualified node name, that is, Name@FullyQualifiedHostName, so it must figure out its own fully qualified host name.
  • Name == undefined creates a node with a dynamic node name that is assigned by the first (non-dynamic) node it connects to.
  • When a node with a static name gets a connection from a node with dynamic name it generates a node name for it based on the presented host name.
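The long/short rule in the list above can be illustrated with a few lines of Python. This is a sketch of the rule as described in this comment, not the actual OTP implementation, and the function names are made up for illustration.

```python
def uses_long_names(host: str) -> bool:
    """A presented host name is treated as a long name if it contains a dot."""
    return "." in host


def may_connect(local_host: str, presented_host: str) -> bool:
    """Long-name nodes accept only long-name peers, short-name nodes only short-name peers."""
    return uses_long_names(local_host) == uses_long_names(presented_host)


if __name__ == "__main__":
    print(may_connect("a.example.com", "b.example.com"))  # both long
    print(may_connect("a.example.com", "b"))              # long vs short
```

This is why a dynamically named node started with -name still has to present a host name containing a dot, even though the name part is assigned later.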

A node's short host name is "always" known, but if a node with long name cannot figure out its fully qualified host name:

  • The domain name has to be provided by including it in the node name,
  • or the OS configuration has to be adjusted so the domain name can be figured out,
  • or the Erlang/OTP system has to be improved to work even harder to find the domain name.

This applies also to dynamic node names.

It has become increasingly hard to know the domain name since system resolvers are becoming more complex. The approach that the Erlang/OTP system has used is to mimic what platforms do, but that approach is now almost futile.

We can try to find out why your dynamically named node now suddenly cannot figure out its domain name, but the safe way for long names is to provide a fully qualified host name in the node name.
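For scripts that only need the cookie file, the choice between the two flags could be made explicit. The following Python sketch only builds the argument list and does not run anything. It uses the flags that appear in this thread (-noshell, -eval, -s init stop, -name, -sname); the function name erl_cookie_command and the fallback to -sname when no fully qualified host is supplied are assumptions made for this example.

```python
from typing import List, Optional


def erl_cookie_command(fq_host: Optional[str] = None, erl: str = "erl") -> List[str]:
    """Build an erl command line that starts a dynamically named node.

    If a fully qualified host name (one containing a dot) is supplied, it is
    put in the node name so the node does not have to detect its own domain.
    Otherwise fall back to a short name, which needs no domain.
    """
    if fq_host is not None and "." in fq_host:
        name_args = ["-name", f"undefined@{fq_host}"]
    else:
        name_args = ["-sname", "undefined"]
    return [erl, "-noshell", *name_args,
            "-eval", "erlang:get_cookie()", "-s", "init", "stop"]


if __name__ == "__main__":
    # "client.example.invalid" is a hypothetical placeholder host name.
    print(erl_cookie_command("client.example.invalid"))
    print(erl_cookie_command())
```

Either variant avoids depending on the node figuring out its own fully qualified host name at startup.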

For a node with a static node name, the host name part of the node name has to be one that all other nodes can connect to, so automatic reconnection can work.

For a node with a dynamic name, it doesn't listen for connections so the host name part of the node name is never used, except for when debugging or if some application looks at it. A workaround is to make up a fully qualified host name. It can be anything you like as long as it adheres to the name rules for fully qualified host names, such as -name [email protected].
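A made-up host name can be sanity-checked before it is used. The Python sketch below assumes the usual host name label rules (letters, digits and hyphens, no leading or trailing hyphen, at most 63 characters per label and 253 in total) plus the requirement of at least one dot. Whether Erlang/OTP applies exactly these rules is not stated here, and the names is_plausible_fqdn and client.example.invalid are hypothetical.

```python
import re

# One DNS label: 1 to 63 characters, alphanumeric at both ends, hyphens allowed inside.
_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_plausible_fqdn(host: str) -> bool:
    """Return True if host looks like a fully qualified host name."""
    if len(host) > 253 or "." not in host:
        return False
    return all(_LABEL.fullmatch(label) for label in host.split("."))


if __name__ == "__main__":
    for candidate in ("client.example.invalid", "localhost", "bad..name", "-x.example.com"):
        print(candidate, is_plausible_fqdn(candidate))
```

Note that this sketch rejects a trailing dot and a bare localhost, since neither gives a name with a non-empty domain part.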

We are reluctant to change this apparently weird behaviour since there might be future features, such as a dynamic node starting to listen after getting its node name, and then perhaps not even being -hidden. In such cases it should be possible to connect to the host name, which brings us back to the domain name problem: the connecting node has to provide its host name, which has to resolve and may have to be fully qualified.

That said, the code in inet_config that tries to figure out the fully qualified host name can be improved to cover more cases, which might hide this problem again for a while.

RaimoNiskanen, Aug 13 '25 15:08

@inikulshin: Ping

RaimoNiskanen, Aug 19 '25 12:08

We are closing this issue due to inactivity. Raimo has described the challenges. At the moment we are not convinced that it is worth putting effort into hiding the problem; rather, we think it is worth having an improved setup.

IngelaAndin, Sep 16 '25 08:09