mercury
NA OFI: hostname not working on Debian loopback interface
My objective is to make the server run on a specific port. Explicitly specifying the address to use seems like it should work. However, it does not work if I use the hostname instead of an IP address or interface name when specifying the server's address.
#include <unistd.h>
#include <iostream>
#include <sstream>
#include <string>
#include <thallium.hpp>
namespace tl = thallium;
/**
 * This is the ParaT server executable.
 */
int main(int argc, char* argv[])
{
    // Look up this machine's hostname; zero-fill so the result is always NUL-terminated.
    char buffer[256] = {0};
    gethostname(buffer, sizeof(buffer) - 1);
    // Request a fixed port by building an address of the form tcp://<hostname>:11111.
    std::ostringstream str;
    str << "tcp://" << buffer << ":11111";
    tl::engine myEngine(str.str(), THALLIUM_SERVER_MODE);
    std::cout << "requested address: " << str.str() << std::endl;
    std::cout << "server running at address: " << myEngine.self() << std::endl;
    return 0;
}
The output from this is as follows:
requested address: tcp://miron:11111
server running at address: ofi+tcp;ofi_rxm://192.168.1.73:33499
Platform:
- System description: Ubuntu 20.04.3 LTS
- Compiler version: GCC 9.3.0
- Plugin and protocol used: ofi (tcp;ofi_rxm)
- Dependency version: libfabric-1.13.0
@utkarshayachit PR #537 should fix your issue, I just merged it to master. Would you be able to try it out?
Thanks for the fix. It still fails for me, although now I get an error message rather than it silently picking a random port.
Here's the output from the same test code from the issue.
# [31877.645652] mercury->fatal: [error] /tmp/utkarsh/spack-stage/spack-stage-mercury-master-tubj37zytqb2mcdmpbrhiqe4mvbexaro/spack-src/src/na/na_ofi.c:2034
# na_ofi_domain_open(): No provider found for "tcp;ofi_rxm" provider on domain "miron"
[error] Could not initialize hg_class
terminate called after throwing an instance of 'thallium::margo_exception'
what(): [/home/utkarsh/Kitware/Mochi/spack/opt/spack/linux-ubuntu20.04-broadwell/gcc-9.3.0/mochi-thallium-develop-yufqz3w5lpooqe2rmafnxkynt7mr4kyc/include/thallium/engine.hpp:180][margo_init_ext] Could not initialize Margo
Thanks. I think we'll get to the bottom of it. Can you please export HG_LOG_LEVEL=warning and HG_LOG_SUBSYS=na and rerun your test? That will give us more details. The output of fi_info on your system would also be helpful. I can't be sure until I see the logs, but it looks like your hostname cannot be resolved for some reason.
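To see what a hostname actually resolves to, independently of Mercury, a small getaddrinfo() check like the one below can help. This is a sketch, not part of Mercury or Thallium: resolve_ipv4 and resolve_own_hostname are hypothetical helper names, and only IPv4 results are considered.

```cpp
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cassert>
#include <string>
#include <vector>

// Resolve `host` to its IPv4 addresses in dotted-decimal form, using the
// system resolver (/etc/hosts, DNS, ... as configured by nsswitch).
// Returns an empty vector if the name cannot be resolved.
std::vector<std::string> resolve_ipv4(const std::string& host) {
    addrinfo hints{};
    hints.ai_family = AF_INET;        // IPv4 only, like the tcp:// addresses above
    hints.ai_socktype = SOCK_STREAM;  // one entry per address instead of one per socket type

    std::vector<std::string> out;
    addrinfo* res = nullptr;
    if (getaddrinfo(host.c_str(), nullptr, &hints, &res) != 0)
        return out;
    for (addrinfo* p = res; p != nullptr; p = p->ai_next) {
        char buf[INET_ADDRSTRLEN];
        auto* sin = reinterpret_cast<sockaddr_in*>(p->ai_addr);
        if (inet_ntop(AF_INET, &sin->sin_addr, buf, sizeof(buf)) != nullptr)
            out.emplace_back(buf);
    }
    freeaddrinfo(res);
    return out;
}

// Convenience wrapper: resolve this machine's own hostname.
std::vector<std::string> resolve_own_hostname() {
    char name[256] = {0};
    if (gethostname(name, sizeof(name) - 1) != 0)
        return {};
    return resolve_ipv4(name);
}
```

Comparing the output of resolve_own_hostname() with the addresses listed by `ip addr` shows whether the hostname maps to the interface the server is expected to listen on.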
Thinking more about it, you might also want to double-check that your /etc/hosts does not associate your hostname with an interface that is down, or something similar. I can't think of anything else that would be significantly different between your system and the ones we use.
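One rough way to check this is to look for an interface that is up and has exactly the address the hostname resolves to. The sketch below uses getifaddrs(); it is an assumption-based illustration, not Mercury's own interface-matching code, and find_up_interface_for is a hypothetical name that only handles exact IPv4 matches.

```cpp
#include <arpa/inet.h>
#include <ifaddrs.h>
#include <net/if.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <cassert>
#include <string>

// Return the name of the interface (e.g. "eth0", "lo") that is marked up and
// has exactly the IPv4 address `ip` assigned, or "" if none does.
std::string find_up_interface_for(const std::string& ip) {
    in_addr target{};
    if (inet_pton(AF_INET, ip.c_str(), &target) != 1)
        return "";  // not a dotted-decimal IPv4 address

    ifaddrs* list = nullptr;
    if (getifaddrs(&list) != 0)
        return "";

    std::string match;
    for (ifaddrs* ifa = list; ifa != nullptr; ifa = ifa->ifa_next) {
        if (ifa->ifa_addr == nullptr || ifa->ifa_addr->sa_family != AF_INET)
            continue;
        if (!(ifa->ifa_flags & IFF_UP))
            continue;  // skip interfaces that are down
        auto* sin = reinterpret_cast<sockaddr_in*>(ifa->ifa_addr);
        if (sin->sin_addr.s_addr == target.s_addr) {
            match = ifa->ifa_name;
            break;
        }
    }
    freeifaddrs(list);
    return match;
}
```

An empty result for the hostname's address means no up interface owns that exact IP, which is the kind of mismatch the reporter's /etc/hosts entry would produce.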
Here are the results:
# [481.582259] mercury->na: [error] /tmp/utkarsh/spack-stage/spack-stage-mercury-master-tubj37zytqb2mcdmpbrhiqe4mvbexaro/spack-src/src/na/na_ip.c:212
# na_ip_check_interface(): No ifa_name match found for IP
# [481.582289] mercury->cls: [warning] /tmp/utkarsh/spack-stage/spack-stage-mercury-master-tubj37zytqb2mcdmpbrhiqe4mvbexaro/spack-src/src/na/na_ofi.c:3762
# na_ofi_initialize(): Could not find matching interface for miron, attempting to use it as domain name
# [481.583238] mercury->fatal: [error] /tmp/utkarsh/spack-stage/spack-stage-mercury-master-tubj37zytqb2mcdmpbrhiqe4mvbexaro/spack-src/src/na/na_ofi.c:2034
# na_ofi_domain_open(): No provider found for "tcp;ofi_rxm" provider on domain "miron"
# [481.583258] mercury->cls: [error] /tmp/utkarsh/spack-stage/spack-stage-mercury-master-tubj37zytqb2mcdmpbrhiqe4mvbexaro/spack-src/src/na/na_ofi.c:3834
# na_ofi_initialize(): Could not open domain for tcp;ofi_rxm, miron
# [481.583270] mercury->cls: [error] /tmp/utkarsh/spack-stage/spack-stage-mercury-master-tubj37zytqb2mcdmpbrhiqe4mvbexaro/spack-src/src/na/na.c:339
# NA_Initialize_opt(): Could not initialize plugin
[error] Could not initialize hg_class
terminate called after throwing an instance of 'thallium::margo_exception'
what(): [/home/utkarsh/Kitware/Mochi/spack/opt/spack/linux-ubuntu20.04-broadwell/gcc-9.3.0/mochi-thallium-develop-yufqz3w5lpooqe2rmafnxkynt7mr4kyc/include/thallium/engine.hpp:180][margo_init_ext] Could not initialize Margo
fish: “env HG_LOG_LEVEL=warning HG_LOG…” terminated by signal SIGABRT (Abort)
cat /etc/hosts
127.0.0.1 view-localhost
127.0.0.1 localhost
127.0.1.1 miron
Thanks, I believe the third line in your /etc/hosts file (127.0.1.1 miron) is causing the issues you're having. You should either remove it or assign your permanent IP address to the hostname instead, as documented here: https://www.debian.org/doc/manuals/debian-reference/ch05.en.html#_the_hostname_resolution
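To illustrate why that line matters: with the /etc/hosts contents shown above, a lookup of miron stops at 127.0.1.1. The sketch below is a simplified hosts-file parser (lookup_in_hosts is a hypothetical helper); real name resolution goes through nsswitch.conf and may consult other sources such as DNS, so treat it only as a model of the /etc/hosts step.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Return the address of the first line in hosts-file text
// ("<address> <canonical name> [aliases...]") that lists `name`,
// or "" if no line does. Text after '#' is treated as a comment.
std::string lookup_in_hosts(std::istream& hosts, const std::string& name) {
    std::string line;
    while (std::getline(hosts, line)) {
        std::string::size_type pos = line.find('#');
        if (pos != std::string::npos)
            line.erase(pos);  // drop the comment part
        std::istringstream fields(line);
        std::string addr, host;
        if (!(fields >> addr))
            continue;  // blank or comment-only line
        while (fields >> host)
            if (host == name)
                return addr;  // first match wins
    }
    return "";
}
```

Fed the three lines from this issue, lookup_in_hosts returns 127.0.1.1 for miron, which is a loopback address rather than the 192.168.1.73 interface address the server reported.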
Having said that, we should probably also be able to support this type of loopback IP; I'll have a look to see if we can do that. In that case I'd expect miron:11111 to resolve to 127.0.1.1:11111, which is probably not the IP you'd want anyway, but we should support it somehow.
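Supporting this case likely comes down to how an address is matched to an interface. Assuming the check compares exact addresses, 127.0.1.1 never matches the loopback interface, whose address is 127.0.0.1, but on Linux lo is typically configured as 127.0.0.1/8, so a prefix comparison would accept it. The following is a sketch of that idea, not Mercury's implementation; parse_ipv4 and in_same_subnet are hypothetical helpers.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <cassert>
#include <cstdint>
#include <string>

// Parse a dotted-decimal IPv4 address into host byte order.
// Returns false if `s` is not a valid IPv4 address.
bool parse_ipv4(const std::string& s, std::uint32_t& out) {
    in_addr a{};
    if (inet_pton(AF_INET, s.c_str(), &a) != 1)
        return false;
    out = ntohl(a.s_addr);
    return true;
}

// True if `ip` lies in the network `iface_ip`/`prefix_len`, e.g. whether
// 127.0.1.1 falls inside the 127.0.0.1/8 network configured on lo.
bool in_same_subnet(const std::string& ip, const std::string& iface_ip, int prefix_len) {
    std::uint32_t a = 0, b = 0;
    if (!parse_ipv4(ip, a) || !parse_ipv4(iface_ip, b) || prefix_len < 0 || prefix_len > 32)
        return false;
    // Shifting a 32-bit value by 32 is undefined, so /0 is handled separately.
    std::uint32_t mask = prefix_len == 0 ? 0u : ~std::uint32_t{0} << (32 - prefix_len);
    return (a & mask) == (b & mask);
}
```

With a /32 prefix this degenerates to the exact comparison, which rejects 127.0.1.1; with lo's /8 prefix it is accepted. As noted in the comment above, the matched loopback address would still not be reachable from other hosts.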
FWIW, removing it does indeed seem to solve the issue.
Great, thanks for confirming.