spring-cloud-netflix

EIP publicip association not correctly updated on fresh instance

Open nick-pww opened this issue 8 years ago • 23 comments

I've been directed over here from the eureka folks, as they believe this should just 'work'. Have the following issue running off spring-cloud-netflix:1.1.4.RELEASE. The issue I opened over there is: https://github.com/Netflix/eureka/issues/840

There seems to be a problem with public EIP address association not being correctly updated when a new AWS server starts and has a new Eureka server starting with it. When the server starts up, it correctly registers itself:

2016-09-06 15:55:29.040  WARN 3399 --- [Thread-10] com.netflix.eureka.aws.EIPManager        : The selected EIP 54.67.102.122 is associated with another instance i-0666b391 according to AWS, hence skipping this
2016-09-06 15:55:29.628  INFO 3399 --- [Thread-10] com.netflix.eureka.aws.EIPManager        : Associated i-25f11391 running in zone: us-west-1c to elastic IP: X.X.X.X

But, every minute after that we get the following log entry:

2016-09-06 16:24:55.568  INFO 3399 --- [Eureka-EIPBinder] c.n.e.r.PeerAwareInstanceRegistryImpl    : Got 1 instances from neighboring DS node
2016-09-06 16:24:55.568  INFO 3399 --- [Eureka-EIPBinder] c.n.e.r.PeerAwareInstanceRegistryImpl    : Renew threshold is: 1
2016-09-06 16:24:55.568  INFO 3399 --- [Eureka-EIPBinder] c.n.e.r.PeerAwareInstanceRegistryImpl    : Priming AWS connections for all replicas..
2016-09-06 16:24:55.568  INFO 3399 --- [Eureka-EIPBinder] c.n.e.r.PeerAwareInstanceRegistryImpl    : No peers needed to prime.
2016-09-06 16:24:55.568  INFO 3399 --- [Eureka-EIPBinder] c.n.e.r.PeerAwareInstanceRegistryImpl    : Changing status to UP
2016-09-06 16:24:55.713  WARN 3399 --- [Eureka-EIPBinder] com.netflix.eureka.aws.EIPManager        : The selected EIP X.X.X.X is associated with another instance i-0666b391 according to AWS, hence skipping this
2016-09-06 16:24:55.804  INFO 3399 --- [Eureka-EIPBinder] com.netflix.eureka.aws.EIPManager        : My instance i-25f11391 seems to be already associated with the EIP X.X.X.X

Debugging this, the call to isEIPBound() is always failing, and this is because the following is always null:

String myPublicIP = ((AmazonInfo) myInfo.getDataCenterInfo()).get(MetaDataKey.publicIpv4);

It looks like the datacenter info is stale and never gets refreshed (from what I can tell), and there are no settings available to have it refreshed automatically.
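To illustrate why a null cached public IP makes the bound check fail every time, here is a minimal sketch. It is a simplified model and not the Eureka source: the class name, the `isBound` method, and the example addresses are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Simplified, hypothetical model of the bound check described above.
// It is not the Eureka source; all names here are illustrative only.
class EipBoundCheckSketch {

    // Returns true only when the instance's cached public IP equals one of
    // the candidate EIPs. A null cached IP can never match any candidate.
    static boolean isBound(String cachedPublicIp, List<String> candidateEips) {
        for (String eip : candidateEips) {
            if (eip.equals(cachedPublicIp)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Documentation-range addresses, used purely as placeholders.
        List<String> candidates = Arrays.asList("203.0.113.10", "203.0.113.11");

        // Metadata was read before the EIP was attached, so the cached IP is null.
        System.out.println(isBound(null, candidates));           // prints "false"

        // Once the cached metadata carries the attached EIP, the check passes.
        System.out.println(isBound("203.0.113.10", candidates)); // prints "true"
    }
}
```

Under this model the check keeps returning false until the cached metadata is re-read, which would match the log entry repeating every minute.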

The odd side effect of this, which we noticed, is that the registry continually gets wiped and reset, causing obvious potential issues downstream for our clients.

I have been trying to find where this datacenter info might be refreshed, but am unable to find anything that might actually do that.

The deployed app only has a single main class in it:

@SpringBootApplication
@EnableEurekaServer
@EnableAutoConfiguration
public class EurekaServer {

    @Value("${server.port}")
    private Integer nonSecurePort;
    @Autowired
    private InetUtils utils;

    public static void main(String[] args) {
        new SpringApplicationBuilder(EurekaServer.class).web(true).run(args);
    }

    @Bean
    @Profile("aws")
    public EurekaInstanceConfigBean awsEurekaConfig() {
        EurekaInstanceConfigBean b = new EurekaInstanceConfigBean(utils);
        b.setNonSecurePort(nonSecurePort);
        b.setSecurePortEnabled(false);
        AmazonInfo info = AmazonInfo.Builder.newBuilder().autoBuild("eureka");
        b.setDataCenterInfo(info);
        return b;
    }

}

nick-pww avatar Sep 06 '16 18:09 nick-pww

Interesting. I assume this is running on AWS? What is the configuration?

spencergibb avatar Sep 06 '16 18:09 spencergibb

Yes, running on AWS. Here are the relevant configs (coming from spring-cloud config server): Global config for all apps:

eureka.instance.leaseRenewalIntervalInSeconds=30
eureka.client.healthcheck.enabled=true
eureka.datacenter=cloud

Config for just the server apps:

eureka:
    client:
        registerWithEureka: false
        fetchRegistry: false

And servers have:

eureka.client.serviceUrl.defaultZone=....

setup as well with the relevant EIPs assigned.

nick-pww avatar Sep 06 '16 18:09 nick-pww

@nick-pww I just noticed your config. The thread that DiscoveryClient uses to refresh local instanceInfo (and hence datacenterInfo) is only started if registerWithEureka is true (it tries to save the extra cpu resource if registration is not configured). Is there a reason you are configured with register = false?
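As a rough illustration of the behaviour described in this comment, the sketch below starts a refresh task only when a registration flag is set. It is a simplified model and not the DiscoveryClient code: the class name, the `maybeStart` method, and its parameters are hypothetical.

```java
import java.util.Optional;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical model of "the refresh thread is only started when
// registration is enabled". Names are illustrative, not Eureka APIs.
class ConditionalRefresherSketch {

    // Starts a periodic refresh task only if registration is enabled.
    // With the flag off, nothing ever re-reads the instance info.
    static Optional<ScheduledExecutorService> maybeStart(
            boolean registerWithEureka, Runnable refreshTask, long periodSeconds) {
        if (!registerWithEureka) {
            return Optional.empty();
        }
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(refreshTask, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return Optional.of(scheduler);
    }

    public static void main(String[] args) {
        Runnable refresh = () -> System.out.println("refreshing instance info");

        // registerWithEureka=false: no refresher exists, so cached info stays stale.
        System.out.println(maybeStart(false, refresh, 30).isPresent()); // prints "false"

        // registerWithEureka=true: a refresher is scheduled.
        Optional<ScheduledExecutorService> started = maybeStart(true, refresh, 30);
        System.out.println(started.isPresent());                        // prints "true"
        started.ifPresent(ScheduledExecutorService::shutdownNow);
    }
}
```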

qiangdavidliu avatar Sep 06 '16 18:09 qiangdavidliu

@qiangdavidliu Going off several examples and docs. One of which is here: https://spring.io/guides/gs/service-registration-and-discovery/

I can change that, but one problem I had before, with registration and 'fetchRegistry' both on, was that the servers were essentially always 'registering' applications even if they were no longer up, because they were getting info from the other eureka servers. Basically, applications would never unregister, and if they did, they had a good chance of coming back when the servers synced again.

Also, I've read in other places that having the server register with itself can make the 'renew' threshold act oddly in some cases.

Will try to re-enable just that option and see what happens.

nick-pww avatar Sep 06 '16 19:09 nick-pww

Also from https://github.com/Netflix/eureka/issues/840#issuecomment-245052062 (typo fixed)

Note that the Amazon-based datacenter info refresh in ApplicationInfoManager only occurs if the config is a CloudInstanceConfig.

Our config isn't a CloudInstanceConfig

spencergibb avatar Sep 06 '16 19:09 spencergibb

@nick-pww those guides are for single-instance Eureka setups. Production should be a peered cluster, see #1251.

spencergibb avatar Sep 06 '16 19:09 spencergibb

@spencergibb It's not really clear that those are 'development' only options that should be set. Would recommend that a large note or something goes in there stating such.

@qiangdavidliu + @spencergibb I've changed the config but still have the same issue with new instances. I'm still getting the:

2016-09-06 19:44:15.541  INFO 25015 --- [Eureka-EIPBinder] c.n.e.r.PeerAwareInstanceRegistryImpl    : Renew threshold is: 1
2016-09-06 19:44:15.541  INFO 25015 --- [Eureka-EIPBinder] c.n.e.r.PeerAwareInstanceRegistryImpl    : Priming AWS connections for all replicas..

messages, and it's still resetting every minute. Both servers are registering with each other and show up in the list of applications, but the one where I cleared the EIP and restarted is exhibiting this still, while the one that I didn't seems to be working as expected.

(new config edit)

eureka:
    client:
        registerWithEureka: true
        fetchRegistry: false

nick-pww avatar Sep 06 '16 19:09 nick-pww

I am actually struggling with the exact same issue. Explicitly setting hostname and IP address in the EurekaInstanceConfigBean @Bean is also not working:

        eurekaInstanceConfig.setIpAddress(info.get(AmazonInfo.MetaDataKey.publicIpv4));
        eurekaInstanceConfig.setHostname(info.get(AmazonInfo.MetaDataKey.publicHostname));

as this bean seems to be initialized before EIPManager binds an EIP address and so both values are null. The lame hack so far is that I listen to EurekaRegistryAvailableEvent and restart the application if EurekaInstanceConfigBean.getHostname() is null as the second time around the EIP is already bound to the aws instance and it all works...

florind avatar Sep 07 '16 11:09 florind

@spencergibb at Netflix we use the CloudInstanceConfig, which has the ability to refresh the underlying AmazonInfo. Do the Spring Cloud configs do something similar?

qiangdavidliu avatar Sep 07 '16 17:09 qiangdavidliu

@qiangdavidliu no it doesn't :-(

spencergibb avatar Sep 07 '16 17:09 spencergibb

It extends PropertiesInstanceConfig, and we use Boot @ConfigurationProperties to load properties, so we needed a different class; since it implemented an interface, EurekaInstanceConfig, that was OK when we started. I wonder if we could break the business logic out into a separate class that gets injected so we could reuse it? We can always copy/paste.

spencergibb avatar Sep 07 '16 17:09 spencergibb

Let me see what I can do on that.

qiangdavidliu avatar Sep 07 '16 17:09 qiangdavidliu

thanks!

spencergibb avatar Sep 07 '16 17:09 spencergibb

This works for us:

@Configuration
@Slf4j
@ConditionalOnAwsCloudEnvironment
@EnableContextInstanceData
@Import(UtilAutoConfiguration.class)
@AutoConfigureAfter(UtilAutoConfiguration.class)
public class AwsInstanceConfig {

    @Value("${server.port:${SERVER_PORT:${PORT:8080}}}")
    int nonSecurePort;

    @Value("${management.port:${MANAGEMENT_PORT:${server.port:${SERVER_PORT:${PORT:8080}}}}}")
    int managementPort;

    @Value("${eureka.instance.hostname:${EUREKA_INSTANCE_HOSTNAME:}}")
    String hostname;

    @Autowired
    ConfigurableEnvironment env;


    @Bean
    public EurekaInstanceConfigBean eurekaInstanceConfigBean(InetUtils utils) {
        log.info("Setting AmazonInfo on EurekaInstanceConfigBean");
        final EurekaInstanceConfigBean instance = new EurekaInstanceConfigBean(utils) {

            @Scheduled(initialDelay = 30000L, fixedRate = 30000L)
            public void refreshInfo() {
                log.debug("Checking datacenter info changes");
                AmazonInfo newInfo = AmazonInfo.Builder.newBuilder().autoBuild("eureka");
                if (!this.getDataCenterInfo().equals(newInfo)) {
                    log.info("Updating datacenterInfo to {}", newInfo);
                    ((AmazonInfo) this.getDataCenterInfo()).setMetadata(newInfo.getMetadata());
                }
            }

            private AmazonInfo getAmazonInfo() {
                return (AmazonInfo) getDataCenterInfo();
            }

            @Override
            public String getHostname() {
                AmazonInfo info = getAmazonInfo();
                final String publicHostname = info.get(AmazonInfo.MetaDataKey.publicHostname);
                return this.isPreferIpAddress() ?
                    info.get(AmazonInfo.MetaDataKey.localIpv4) :
                    publicHostname == null ?
                        info.get(AmazonInfo.MetaDataKey.localHostname) : publicHostname;
            }

            @Override
            public String getHostName(final boolean refresh) {
                return getHostname();
            }

            @Override
            public String getHomePageUrl() {
                return super.getHomePageUrl();
            }

            @Override
            public String getStatusPageUrl() {
                String scheme = getSecurePortEnabled() ? "https" : "http";
                return scheme + "://" + getHostname() + ":"
                    + managementPort + getStatusPageUrlPath();
            }

            @Override
            public String getHealthCheckUrl() {
                String scheme = getSecurePortEnabled() ? "https" : "http";
                return scheme + "://" + getHostname() + ":"
                    + managementPort + getHealthCheckUrlPath();
            }
        };
        AmazonInfo info = AmazonInfo.Builder.newBuilder().autoBuild("eureka");
        log.info("Info: {}", info);
        instance.setDataCenterInfo(info);
        instance.setNonSecurePort(this.nonSecurePort);
        instance.setInstanceId(getDefaultInstanceId(this.env));
        if (this.managementPort != this.nonSecurePort && this.managementPort != 0) {
            if (StringUtils.hasText(this.hostname)) {
                instance.setHostname(this.hostname);
            }
        }

        return instance;
    }

}

I.e. we do a scheduled check on whether the datacenterinfo has been updated, and reset it in that case. I'm sure there's room for cleanup here, but maybe it's a start?
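The scheduled-check idea can also be sketched without Spring or Eureka on the classpath. The following is a minimal model using only the JDK: the class name, the `Supplier` standing in for re-reading the instance metadata, and the `public-ipv4` key are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// JDK-only sketch of "periodically re-read metadata and swap it in if it changed".
class MetadataRefresherSketch {

    // Hypothetical stand-in for whatever re-reads the instance metadata.
    private final Supplier<Map<String, String>> source;
    private volatile Map<String, String> current;

    MetadataRefresherSketch(Supplier<Map<String, String>> source) {
        this.source = source;
        this.current = new HashMap<>(source.get());
    }

    // Re-reads the metadata and replaces the cached copy only when it differs.
    // Returns true if the cached copy was replaced.
    boolean refreshOnce() {
        Map<String, String> latest = new HashMap<>(source.get());
        if (latest.equals(current)) {
            return false;
        }
        current = latest;
        return true;
    }

    Map<String, String> current() {
        return current;
    }

    // Runs refreshOnce() at a fixed rate; the caller owns the returned scheduler
    // and should shut it down when done.
    ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::refreshOnce, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }

    public static void main(String[] args) {
        Map<String, String> fakeMetadata = new HashMap<>();
        MetadataRefresherSketch refresher = new MetadataRefresherSketch(() -> fakeMetadata);

        // Simulate the EIP being attached after the first read.
        fakeMetadata.put("public-ipv4", "203.0.113.10");

        System.out.println(refresher.refreshOnce());                    // prints "true"
        System.out.println(refresher.current().get("public-ipv4"));     // prints "203.0.113.10"
    }
}
```

One caveat with `scheduleAtFixedRate`: if the task throws, later runs are suppressed, so a real refresh task would probably want to catch and log exceptions.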

herder avatar Sep 09 '16 09:09 herder

@herder Netflix devs have moved the functionality to a shared class that we will be able to leverage. https://github.com/Netflix/eureka/pull/843

spencergibb avatar Sep 09 '16 17:09 spencergibb

This depends on #1345

spencergibb avatar Oct 05 '16 21:10 spencergibb

Can't wait to get this released.

elnur avatar Oct 23 '16 12:10 elnur

Many thanks to @herder for the suggested auto-refresh hack; working great for me.

I can't quite work out when the Eureka 1.6 upgrade will appear. Will it be in the Dalston release train?

It's far too long to read but I've documented my experiments here - let me know if I've made any blunders

Edit to add that the OP noticed that not doing this refresh causes the registry to be wiped; I had the opposite experience that instances never get expired (it's not self preservation!). I can't think how that could be the case, so I'd be interested if anyone has any insight.

DickChesterwood avatar Feb 09 '17 18:02 DickChesterwood

thanks @DickChesterwood. 1.6 is part of Dalston. See spring-cloud-release/milestones

spencergibb avatar Feb 09 '17 18:02 spencergibb

Lovely thanks Spencer!

DickChesterwood avatar Feb 09 '17 18:02 DickChesterwood

@spencergibb Is this still an issue? I'm experiencing the same issue using Edgware.RELEASE. Is the scheduled task workaround still necessary?

gadamsciv avatar Apr 10 '18 17:04 gadamsciv

@gadamsciv it is still open, so yes.

spencergibb avatar Apr 10 '18 18:04 spencergibb

FYI, I came across this issue as well and tried adding the scheduled task to refresh the instance info, but the task didn't start. In the end I found out that if the scheduled task is in a configuration class, you need to add the @EnableScheduling annotation for the task to run.

harmoney-ryanli avatar Mar 07 '22 22:03 harmoney-ryanli