
Connection errors when Consul node names are not real DNS hosts

Open ghost opened this issue 5 years ago • 29 comments

Expected Behavior / New Feature

Ocelot.Provider.Consul version 13.5.2

Actual Behavior / Motivation for New Feature

I am using Ocelot.Provider.Consul with Consul, but when I make an API request, the console shows the following error:

info: Microsoft.AspNetCore.DataProtection.KeyManagement.XmlKeyManager[0]
      User profile is available. Using 'C:\Users\Aries\AppData\Local\ASP.NET\DataProtection-Keys' as key repository and Windows DPAPI to encrypt keys at rest.
Hosting environment: Development
Content root path: F:\开放平台\网关\OPEN_GATEWAY\OPEN_GATEWAY
Now listening on: http://[::]:9000
Application started. Press Ctrl+C to shut down.
info: Microsoft.AspNetCore.Hosting.Internal.WebHost[1]
      Request starting HTTP/1.1 GET http://192.168.1.109:9000/open/values
dbug: Ocelot.Errors.Middleware.ExceptionHandlerMiddleware[0]
      requestId: 0HLNVAS1E4EHV:00000001, previousRequestId: no previous request id, message: ocelot pipeline started
dbug: Ocelot.DownstreamRouteFinder.Middleware.DownstreamRouteFinderMiddleware[0]
      requestId: 0HLNVAS1E4EHV:00000001, previousRequestId: no previous request id, message: Upstream url path is /open/values
dbug: Ocelot.DownstreamRouteFinder.Middleware.DownstreamRouteFinderMiddleware[0]
      requestId: 0HLNVAS1E4EHV:00000001, previousRequestId: no previous request id, message: downstream templates are /api/{url}
info: Ocelot.RateLimit.Middleware.ClientRateLimitMiddleware[0]
      requestId: 0HLNVAS1E4EHV:00000001, previousRequestId: no previous request id, message: EndpointRateLimiting is not enabled for /api/{url}
info: Ocelot.Authentication.Middleware.AuthenticationMiddleware[0]
      requestId: 0HLNVAS1E4EHV:00000001, previousRequestId: no previous request id, message: No authentication needed for /open/values
info: Ocelot.Authorisation.Middleware.AuthorisationMiddleware[0]
      requestId: 0HLNVAS1E4EHV:00000001, previousRequestId: no previous request id, message: /api/{url} route does not require user to be authorised
dbug: Ocelot.DownstreamUrlCreator.Middleware.DownstreamUrlCreatorMiddleware[0]
      requestId: 0HLNVAS1E4EHV:00000001, previousRequestId: no previous request id, message: Downstream url is http://n1:9999/api/values
dbug: Ocelot.Requester.Middleware.HttpRequesterMiddleware[0]
      requestId: 0HLNVAS1E4EHV:00000001, previousRequestId: no previous request id, message: IHttpRequester returned an error, setting pipeline error
warn: Ocelot.Requester.Middleware.HttpRequesterMiddleware[0]
      requestId: 0HLNVAS1E4EHV:00000001, previousRequestId: no previous request id, message: Error making http request, exception: System.Net.Http.HttpRequestException: 不知道这样的主机。 ---> System.Net.Sockets.SocketException: 不知道这样的主机。
         at System.Net.Http.ConnectHelper.ConnectAsync(String host, Int32 port, CancellationToken cancellationToken)
         --- End of inner exception stack trace ---
         at System.Net.Http.ConnectHelper.ConnectAsync(String host, Int32 port, CancellationToken cancellationToken)
         at System.Threading.Tasks.ValueTask`1.get_Result()
         at System.Net.Http.HttpConnectionPool.CreateConnectionAsync(HttpRequestMessage request, CancellationToken cancellationToken)
         at System.Threading.Tasks.ValueTask`1.get_Result()
         at System.Net.Http.HttpConnectionPool.WaitForCreatedConnectionAsync(ValueTask`1 creationTask)
         at System.Threading.Tasks.ValueTask`1.get_Result()
         at System.Net.Http.HttpConnectionPool.SendWithRetryAsync(HttpRequestMessage request, Boolean doRequestAuth, CancellationToken cancellationToken)
         at System.Net.Http.DiagnosticsHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
         at System.Net.Http.HttpClient.FinishSendAsyncBuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)
         at Ocelot.Requester.HttpClientHttpRequester.GetResponse(DownstreamContext context)
warn: Ocelot.Responder.Middleware.ResponderMiddleware[0]
      requestId: 0HLNVAS1E4EHV:00000001, previousRequestId: no previous request id, message: Error Code: UnableToCompleteRequestError Message: Error making http request, exception: System.Net.Http.HttpRequestException: 不知道这样的主机。 ---> System.Net.Sockets.SocketException: 不知道这样的主机。
         at System.Net.Http.ConnectHelper.ConnectAsync(String host, Int32 port, CancellationToken cancellationToken)
         --- End of inner exception stack trace ---
         at System.Net.Http.ConnectHelper.ConnectAsync(String host, Int32 port, CancellationToken cancellationToken)
         at System.Threading.Tasks.ValueTask`1.get_Result()
         at System.Net.Http.HttpConnectionPool.CreateConnectionAsync(HttpRequestMessage request, CancellationToken cancellationToken)
         at System.Threading.Tasks.ValueTask`1.get_Result()
         at System.Net.Http.HttpConnectionPool.WaitForCreatedConnectionAsync(ValueTask`1 creationTask)
         at System.Threading.Tasks.ValueTask`1.get_Result()
         at System.Net.Http.HttpConnectionPool.SendWithRetryAsync(HttpRequestMessage request, Boolean doRequestAuth, CancellationToken cancellationToken)
         at System.Net.Http.DiagnosticsHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
         at System.Net.Http.HttpClient.FinishSendAsyncBuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)
         at Ocelot.Requester.HttpClientHttpRequester.GetResponse(DownstreamContext context) errors found in ResponderMiddleware. Setting error response for request path:/open/values, request method: GET
dbug: Ocelot.Errors.Middleware.ExceptionHandlerMiddleware[0]
      requestId: 0HLNVAS1E4EHV:00000001, previousRequestId: no previous request id, message: ocelot pipeline finished
info: Microsoft.AspNetCore.Hosting.Internal.WebHost[2]
      Request finished in 3028.154ms 500

But when I use Ocelot.Provider.Consul version 13.5.0 there is no problem

Steps to Reproduce the Problem

  1. ocelot.json configuration:
    {
      "Routes": [
        {
          "UseServiceDiscovery": true,
          "DownstreamPathTemplate": "/api/{url}",
          "DownstreamScheme": "http",
          "ServiceName": "OpenTestService",
          "UpstreamPathTemplate": "/open/{url}",
          "UpstreamHttpMethod": [ "POST", "GET" ]
        }
      ],
      "GlobalConfiguration": {
        "ServiceDiscoveryProvider": {
          "Host": "192.168.1.109",
          "Port": 8500,
          "Type": "Consul"
        }
      }
    }

consul server configuration:

    {
      "encrypt": "7TnJPB4lKtjEcCWWjN6jSA==",
      "services": [
        {
          "id": "OPEN_TEST_01",
          "name": "OpenTestService",
          "tags": ["OpenTestService"],
          "address": "192.168.1.109",
          "port": 9999,
          "checks": [
            {
              "id": "OpenTest_Check",
              "name": "OpenTest_Check",
              "http": "http://192.168.1.109:9999/api/health",
              "interval": "10s",
              "tls_skip_verify": false,
              "method": "GET",
              "timeout": "1s"
            }
          ]
        }
      ]
    }

Specifications

  • Version: Ocelot.Provider.Consul version 13.5.2
  • Platform: windows 10
  • Subsystem:

ghost avatar Jul 03 '19 01:07 ghost

My framework is .NET Core 2.2. I think it's a bug: there is no problem with 13.5.1. Or maybe I just don't know how to write the code.

dafeifei0218 avatar Jul 26 '19 06:07 dafeifei0218

OP, have you solved this problem?

winlj avatar Aug 09 '19 07:08 winlj

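OP, have you solved this problem?

Not yet. For now I have rolled back to version 13.5.0. I haven't had time to look into this because the project is on a tight schedule.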

ghost avatar Aug 09 '19 08:08 ghost

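This comes from #909, a change made in July that added handling to support SSL authentication. If a node exists, the downstream address uses node.name, but DNS will certainly not recognize that name. I have reported this too. I don't know what should be done here; Consul itself does have a DNS server.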

daisen avatar Sep 18 '19 02:09 daisen

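This comes from #909, a change made in July that added handling to support SSL authentication. If a node exists, the downstream address uses node.name, but DNS will certainly not recognize that name. I have reported this too. I don't know what should be done here; Consul itself does have a DNS server.

Have you solved this problem?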

winlj avatar Sep 24 '19 05:09 winlj

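This comes from #909, a change made in July that added handling to support SSL authentication. If a node exists, the downstream address uses node.name, but DNS will certainly not recognize that name. I have reported this too. I don't know what should be done here; Consul itself does have a DNS server.

Have you solved this problem?

I downgraded.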

daisen avatar Sep 24 '19 08:09 daisen

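I tried to work around this problem (I don't speak English). In 13.5.1 and earlier versions, Ocelot read the IP address of the Consul client node, but from 13.5.2 onward it reads the node name of the Consul client node. I am on CentOS, so when running the non-server Consul agent I set the node name to the machine's hostname, and then mapped every hostname to its IP on the CentOS machines.

After these steps the problem was solved. I hope this helps.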

winlj avatar Sep 24 '19 09:09 winlj

One more note: if you are using Docker containers, this situation generally does not occur.

winlj avatar Sep 24 '19 09:09 winlj

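I tried to work around this problem (I don't speak English). In 13.5.1 and earlier versions, Ocelot read the IP address of the Consul client node, but from 13.5.2 onward it reads the node name of the Consul client node. I am on CentOS, so when running the non-server Consul agent I set the node name to the machine's hostname, and then mapped every hostname to its IP on the CentOS machines.

After these steps the problem was solved. I hope this helps.

Changing the hostname works, but that has nothing to do with Ocelot; it only means DNS can now resolve the host's IP from the hostname.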

daisen avatar Sep 24 '19 09:09 daisen

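But I still feel that changing the hostname is not right; it shouldn't be necessary. Since Ocelot can reach the service, it should still access it by IP as it did before.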

winlj avatar Sep 24 '19 09:09 winlj

I don't know why versions after 13.5.1 handle it this way.

winlj avatar Sep 24 '19 09:09 winlj

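But I still feel that changing the hostname is not right; it shouldn't be necessary. Since Ocelot can reach the service, it should still access it by IP as it did before.

Judging by the PR, the change was meant to solve an SSL problem; we'll see how the author ends up supporting this. Forcing the node name, as it does now, certainly does not work for some scenarios. In principle Consul provides a DNS service, so that could be a starting point, but then the host should not be the node name; it should be the Consul service name plus a suffix. And if Consul's node endpoint returns no information, the IP should be used as well.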
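
The "service name plus a suffix" idea corresponds to Consul's DNS interface, where a service is addressed as `<service>.service.<domain>` and the default domain is `consul`. The helper below is only a sketch of that naming rule; `ConsulDnsName` and `ForService` are hypothetical names, not part of Ocelot or the Consul client, and the name only resolves if the machine's DNS is set up to forward that domain to a Consul agent.

```csharp
using System;

// Hypothetical helper (not an Ocelot or Consul client API).
public static class ConsulDnsName
{
    // Builds "<service>.service.<domain>"; "consul" is Consul's default DNS domain.
    public static string ForService(string serviceName, string domain = "consul")
    {
        if (string.IsNullOrWhiteSpace(serviceName))
        {
            throw new ArgumentException("Service name is required.", nameof(serviceName));
        }

        return $"{serviceName}.service.{domain}";
    }
}
```

For the service registered in this issue, `ConsulDnsName.ForService("OpenTestService")` yields `OpenTestService.service.consul`.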

daisen avatar Sep 24 '19 09:09 daisen

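But I still feel that changing the hostname is not right; it shouldn't be necessary. Since Ocelot can reach the service, it should still access it by IP as it did before.

Judging by the PR, the change was meant to solve an SSL problem; we'll see how the author ends up supporting this. Forcing the node name, as it does now, certainly does not work for some scenarios. In principle Consul provides a DNS service, so that could be a starting point, but then the host should not be the node name; it should be the Consul service name plus a suffix. And if Consul's node endpoint returns no information, the IP should be used as well.

Right, let's see how the author handles it. I'll keep using my workaround for now.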

winlj avatar Sep 24 '19 09:09 winlj

We can't understand you, so we can't help you guys.

thiagoloureiro avatar Sep 24 '19 09:09 thiagoloureiro

We can't understand you, so we can't help you guys.

After PR (#909), Ocelot with Consul dynamic re-routing uses the node name instead of the service host IP. But the node name is not always a resolvable host name, so our requests fail with a DNS "unknown host" error.
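
To make the difference concrete, here is a minimal, self-contained sketch of the two host-selection rules discussed in this thread. `ServiceInfo`, `NodeInfo` and `HostSelection` are hypothetical stand-ins, not Ocelot or Consul client types; only the rule itself (service address before the change, node name afterwards when a node is found) comes from the discussion.

```csharp
#nullable enable
using System;

// Hypothetical stand-ins for the data returned by Consul (not real Ocelot types).
public sealed record ServiceInfo(string Address, int Port);
public sealed record NodeInfo(string Name, string Address);

public static class HostSelection
{
    // Behaviour reported for 13.5.1 and earlier: always use the registered service address.
    public static string UseServiceAddress(ServiceInfo service, NodeInfo? node)
        => service.Address;

    // Behaviour reported for 13.5.2 and later: prefer the node name when a node is known.
    public static string PreferNodeName(ServiceInfo service, NodeInfo? node)
        => node is null ? service.Address : node.Name;
}
```

With the registration from this issue (service address `192.168.1.109`) and a node named `n1`, the first rule picks `192.168.1.109` while the second picks `n1`, which matches the `http://n1:9999/api/values` downstream URL in the log.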

daisen avatar Sep 24 '19 09:09 daisen

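The current version still has this problem. How did everyone solve it? Editing the hosts file didn't work for me, so as a hack I set the node to 127.0.0.1, and that runs.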

yiyecao avatar Jul 17 '20 06:07 yiyecao

@daisen commented on Sep 24, 2019

Could you upgrade to the latest v22.0.1 please? Could you recheck once again that the bug still persists?


After PR (https://github.com/ThreeMammals/Ocelot/pull/909), Ocelot with Consul dynamic re-routing uses the node name instead of the service host IP. But the node name is not always a resolvable host name, so our requests fail with a DNS "unknown host" error.

:ok: What's your vision for fixing it? Do you have a plan for a fix? Perhaps we should explain in the docs that developers have to use real DNS hostnames for Consul nodes?

raman-m avatar Jan 27 '24 13:01 raman-m

It seems the author has deleted their GitHub account, so we cannot communicate with them...

@ggnaegi Could you consult us here plz?

raman-m avatar Jan 27 '24 13:01 raman-m

@raman-m it's very old...

ggnaegi avatar Jan 27 '24 21:01 ggnaegi

"2 days ago", great! :D I've encountered this bug back in 2022 and again, just 3 days ago, still need to migrate one service and it will help a lot. My issue is that before, Ocelot used configured IP address to call the service, now it uses consul node name? or machine name? and it breaks work on local stations. I can probably somehow configure it on server but locally I really liked the way of resolving addresses from version 13.0.0 (before this commit I guess: https://github.com/ThreeMammals/Ocelot/commit/b707cd6175de25e4ac225f81ca663c89d7f10654)

What's your vision of fixing it? Do you have a plan to fix?

Maybe some backwards compatibility?

ignacy130 avatar Jan 29 '24 09:01 ignacy130

@ignacy130 Are you the author (@ghost)? 😺 It will be difficult to ensure backward compatibility for all versions of Ocelot. Software products evolve at their own pace and have breaking changes recorded in Release Notes. But I agree this commit https://github.com/ThreeMammals/Ocelot/commit/b707cd6175de25e4ac225f81ca663c89d7f10654 is strange: it solves the problem of a rare user scenario with a particular Consul setup. So, the host name resolving logic could be better...

Will you personally contribute to help us to fix this design issue?

raman-m avatar Jan 29 '24 09:01 raman-m

@ggnaegi Agree, it is very old but the problem still persists in commit https://github.com/ThreeMammals/Ocelot/commit/b707cd6175de25e4ac225f81ca663c89d7f10654 Bad design. https://github.com/ThreeMammals/Ocelot/blob/f4803c24bf9e9ca3929c78ca8eb23401e3c31c23/src/Ocelot.Provider.Consul/Consul.cs#L59-L61

We probably have to decouple how hosts/addresses are obtained and introduce some DNS probes... So, it could be a separate service class injected into the Ocelot.Provider.Consul.Consul class constructor.
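
One possible shape for such a probe, purely as a sketch: the names `IHostProbe`, `DnsHostProbe` and `DownstreamHostResolver` are hypothetical and do not exist in Ocelot. The idea is to use the node name only when it actually resolves and otherwise fall back to the registered service address. `Dns.GetHostAddresses` and `IPAddress.TryParse` are standard .NET calls.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

// Hypothetical abstraction: answers whether a host name can be resolved.
public interface IHostProbe
{
    bool CanResolve(string host);
}

// Probe backed by the system resolver.
public sealed class DnsHostProbe : IHostProbe
{
    public bool CanResolve(string host)
    {
        if (string.IsNullOrWhiteSpace(host)) return false;
        if (IPAddress.TryParse(host, out _)) return true; // literal IPs need no lookup

        try
        {
            return Dns.GetHostAddresses(host).Length > 0;
        }
        catch (SocketException)
        {
            return false; // resolution failed, e.g. "No such host is known"
        }
        catch (ArgumentException)
        {
            return false; // host string is not a valid name
        }
    }
}

// Hypothetical service class that could be injected into a discovery provider.
public sealed class DownstreamHostResolver
{
    private readonly IHostProbe _probe;

    public DownstreamHostResolver(IHostProbe probe)
        => _probe = probe ?? throw new ArgumentNullException(nameof(probe));

    // Use the node name only if it resolves; otherwise fall back to the service address.
    public string Resolve(string nodeName, string serviceAddress)
        => _probe.CanResolve(nodeName) ? nodeName : serviceAddress;
}
```

Keeping the probe behind an interface means a test can substitute a fake that never touches the network, and a real lookup per request could later be replaced by a cached one.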

Second,

Do you write service.Address or serviceNode.Name in your Prod Consul configuration?

raman-m avatar Jan 29 '24 09:01 raman-m

I register my services with newest Consul (https://consuldot.net/) nuget. I can't get app to work without specifying AgentServiceRegistration.Address (backend can't resolve address - I get "Not Found" from Ocelot) and can't register in consul without AgentServiceRegistration.Name.

It's with temporary workaround where I use 127.0.0.1 as a node name for my Consul. I get 502 Bad Gateway, without workaround, registering with both Address and Name. I get 404 Not Found, without workaround, registering without Address, just with Name.

ignacy130 avatar Jan 29 '24 11:01 ignacy130

@ignacy130 I agree, it is a significant problem for different Consul configurations and Consul setup scenarios. We need to work on this to make the logic more flexible. I've added the Feb'24 label to work on this more in February.

Could you fork the Ocelot repo into your account please? This link should work... And you can make a quick fix on your own by rewriting the BuildService static method. https://github.com/ThreeMammals/Ocelot/blob/f4803c24bf9e9ca3929c78ca8eb23401e3c31c23/src/Ocelot.Provider.Consul/Consul.cs#L54-L65 And let me know your ideas on how we could refactor the class plz

raman-m avatar Jan 30 '24 08:01 raman-m

@ignacy130 Do you use Dynamic Routing or classic Consul setup (Consul routes + static routes)?

And I've just realized that your problem can be solved by Custom Providers... Just copy Consul class code, rename the class, and develop GetAsync method for your needs... Attach this custom provider type in your Consul routes. Bingo! Sounds good?

raman-m avatar Jan 30 '24 08:01 raman-m

I don't use dynamic routing, just classic setting. Once I've started debugging I thought about overriding this Consul.cs somehow and Custom Providers seem like a good solution - I'll have a look at this!

ignacy130 avatar Jan 30 '24 08:01 ignacy130

@ignacy130 Good luck in development! Please let us know the code and test results once the solution of custom provider is ready. We'll discuss your contribution plans, when you'll be free...

raman-m avatar Jan 30 '24 09:01 raman-m

Got it working, thank you! The more complicated way of configuring custom provider worked better for me.

Name of the class is stupid maybe but works for now XD

ocelot.json:

"GlobalConfiguration": {
    "ServiceDiscoveryProvider": {
      "Type": "ConsulProviderResolvingAddressNotConsulNodeName"
    }
  }

Startup.cs:

services.RemoveAll<IServiceDiscoveryProviderFactory>();
services.AddSingleton<IServiceDiscoveryProviderFactory, CustomConsulProviderFactory>();
services.AddSingleton<ServiceDiscoveryFinderDelegate>((serviceProvider, config, downstreamRoute) => null);

services.AddSingleton<IConsulClientFactory>(new ConsulClientFactory());

services.AddOcelot(_cfg).AddConfigStoredInConsul(); //Note: no .AddConsul() call!

The rest is basically a copy-paste of the old way (13.0.0) of resolving services via Consul, plus the provider factory:

public class ConsulProviderResolvingAddressNotConsulNodeName : IServiceDiscoveryProvider
{
    private const string VersionPrefix = "version-";
    private readonly ConsulRegistryConfiguration _config;
    private readonly IConsulClient _consul;
    private readonly IOcelotLogger _logger;

    public ConsulProviderResolvingAddressNotConsulNodeName(ConsulRegistryConfiguration config, IOcelotLoggerFactory factory, IConsulClientFactory clientFactory)
    {
        _config = config;
        _consul = clientFactory.Get(_config);
        _logger = factory.CreateLogger<ConsulProviderResolvingAddressNotConsulNodeName>();
    }

    public async Task<List<Service>> GetAsync()
    {
        var queryResult = await _consul.Health.Service(_config.KeyOfServiceInConsul, string.Empty, false);

        var services = new List<Service>();

        foreach (var serviceEntry in queryResult.Response)
        {
            if (IsValid(serviceEntry))
            {
                services.Add(BuildService(serviceEntry));
            }
            else
            {
                _logger.LogWarning($"Unable to use service Address: {serviceEntry.Service.Address} and Port: {serviceEntry.Service.Port} as it is invalid. Address must contain host only e.g. localhost and port must be greater than 0");
            }
        }

        return services.ToList();
    }

    private Service BuildService(ServiceEntry serviceEntry)
    {
        // Use the address the service registered with, not the Consul node name.
        return new Service(
            serviceEntry.Service.Service,
            new ServiceHostAndPort(serviceEntry.Service.Address, serviceEntry.Service.Port),
            serviceEntry.Service.ID,
            GetVersionFromStrings(serviceEntry.Service.Tags),
            serviceEntry.Service.Tags ?? Enumerable.Empty<string>());
    }

    private bool IsValid(ServiceEntry serviceEntry)
    {
        if (string.IsNullOrEmpty(serviceEntry.Service.Address) || serviceEntry.Service.Address.Contains("http://") || serviceEntry.Service.Address.Contains("https://") || serviceEntry.Service.Port <= 0)
        {
            return false;
        }

        return true;
    }

    private string GetVersionFromStrings(IEnumerable<string> strings)
    {
        return strings
            ?.FirstOrDefault(x => x.StartsWith(VersionPrefix, StringComparison.Ordinal))
            .TrimStart(VersionPrefix);
    }
}

public class CustomConsulProviderFactory : IServiceDiscoveryProviderFactory
{
    /// <summary>
    /// String constant used for provider type definition.
    /// </summary>
    public const string PollConsul = nameof(ConsulProviderResolvingAddressNotConsulNodeName);

    private static readonly List<PollConsul> ServiceDiscoveryProviders = new();
    private static readonly object LockObject = new();
    private IOcelotLoggerFactory _factory;
    private IServiceProvider _provider;

    public CustomConsulProviderFactory(IOcelotLoggerFactory factory, IServiceProvider provider)
    {
        _factory = factory;
        _provider = provider;
    }

    private IServiceDiscoveryProvider CreateProvider(
        ServiceProviderConfiguration config, DownstreamRoute route)
    {
        var factory = _provider.GetService<IOcelotLoggerFactory>();
        var consulFactory = _provider.GetService<IConsulClientFactory>();

        var consulRegistryConfiguration = new ConsulRegistryConfiguration(
            config.Scheme, config.Host, config.Port, route.ServiceName, config.Token);

        var consulProvider = new ConsulProviderResolvingAddressNotConsulNodeName(consulRegistryConfiguration, factory, consulFactory);

        if (PollConsul.Equals(config.Type, StringComparison.OrdinalIgnoreCase))
        {
            lock (LockObject)
            {
                var discoveryProvider = ServiceDiscoveryProviders.FirstOrDefault(x => x.ServiceName == route.ServiceName);
                if (discoveryProvider != null)
                {
                    return discoveryProvider;
                }

                discoveryProvider = new PollConsul(config.PollingInterval, route.ServiceName, factory, consulProvider);
                ServiceDiscoveryProviders.Add(discoveryProvider);
                return discoveryProvider;
            }
        }

        return consulProvider;
    }

    Response<IServiceDiscoveryProvider> IServiceDiscoveryProviderFactory.Get(ServiceProviderConfiguration serviceConfig, DownstreamRoute route)
    {
        return new OkResponse<IServiceDiscoveryProvider>(CreateProvider(serviceConfig, route));
    }
}

ignacy130 avatar Jan 30 '24 11:01 ignacy130

@ignacy130 Hi Ignacy! How are you doing? Have you worked on the issue, and do you have a draft solution? Will you open a PR soon?

@ggnaegi FYI The problem is closely related to #1967, #2052 but for the Consul SD provider...

raman-m avatar Apr 24 '24 10:04 raman-m