spring-cloud-netflix

eureka server replication Exception: Read timed out

Open lc0138 opened this issue 4 years ago • 2 comments

Given: 6 instances of org.springframework.cloud:spring-cloud-netflix-eureka-server:2.2.4.RELEASE (each 8 GB RAM, 4 CPU)

eureka.server.peer-node-connect-timeout-ms=20000
eureka.server.peer-node-read-timeout-ms=20000

When: 7000+ instances

Then: Eureka gets stuck while syncing with the other Eureka nodes. The busy-threads graph reaches its peak, Eureka's CPU usage exceeds 90%, and clients get timeout exceptions on connect. It stays stuck indefinitely.

Exception: eureka.cluster.ReplicationTaskProcessor

	It seems to be a socket read timeout exception, it will retry later. if it continues to happen and some eureka node occupied all the cpu time, you should set property 'eureka.server.peer-node-read-timeout-ms' to a bigger value
	com.sun.jersey.api.client.ClientHandlerException: java.net.SocketTimeoutException: Read timed out

Analysis: https://github.com/spring-cloud/spring-cloud-netflix/blob/v2.2.4.RELEASE/spring-cloud-netflix-eureka-server/src/main/java/org/springframework/cloud/netflix/eureka/server/InstanceRegistry.java

	@Override
	public boolean renew(final String appName, final String serverId,
			boolean isReplication) {
		log("renew " + appName + " serverId " + serverId + ", isReplication {}"
				+ isReplication);
		List<Application> applications = getSortedApplications();
		for (Application input : applications) {
			if (input.getName().equals(appName)) {
				InstanceInfo instance = null;
				for (InstanceInfo info : input.getInstances()) {
					if (info.getId().equals(serverId)) {
						instance = info;
						break;
					}
				}
				publishEvent(new EurekaInstanceRenewedEvent(this, appName, serverId,
						instance, isReplication));
				break;
			}
		}
		return super.renew(appName, serverId, isReplication);
	}

The getSortedApplications() method is called on every renew and takes a very long time to execute. Our temporary solution is to skip the getSortedApplications() call entirely, which also means the EurekaInstanceRenewedEvent is no longer published.

	@Override
	public boolean renew(final String appName, final String serverId,
			boolean isReplication) {
		log("renew " + appName + " serverId " + serverId + ", isReplication {}"
				+ isReplication);
		return super.renew(appName, serverId, isReplication);
	}
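
A possible middle ground is to keep publishing the event but find the instance through a keyed lookup instead of copying, sorting, and scanning every application on each renew. The block below is only a minimal, self-contained sketch of that idea, not Eureka code: `RenewLookupSketch`, `Instance`, `findByScan`, and `findDirect` are hypothetical names, and the nested map is an assumed stand-in for the real registry. Whether the actual registry offers an equally cheap lookup at this version would need to be verified.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a service registry; not Eureka's actual classes. */
public class RenewLookupSketch {

    /** Simplified instance record (hypothetical). */
    public static final class Instance {
        private final String appName;
        private final String id;

        Instance(String appName, String id) {
            this.appName = appName;
            this.id = id;
        }

        public String getAppName() { return appName; }

        public String getId() { return id; }
    }

    // appName -> (instanceId -> instance); an assumed layout for illustration only.
    private final Map<String, Map<String, Instance>> registry = new ConcurrentHashMap<>();

    public void register(String appName, String id) {
        registry.computeIfAbsent(appName, k -> new ConcurrentHashMap<>())
                .put(id, new Instance(appName, id));
    }

    /** Same shape as the original renew(): copy and sort all apps, then scan, on every call. */
    public Instance findByScan(String appName, String id) {
        List<String> sortedApps = new ArrayList<>(registry.keySet());
        sortedApps.sort(Comparator.naturalOrder());
        for (String app : sortedApps) {
            if (app.equals(appName)) {
                for (Instance info : registry.get(app).values()) {
                    if (info.getId().equals(id)) {
                        return info;
                    }
                }
                break;
            }
        }
        return null;
    }

    /** Keyed lookup: no copy, no sort, no scan over unrelated applications. */
    public Instance findDirect(String appName, String id) {
        Map<String, Instance> instances = registry.get(appName);
        return instances == null ? null : instances.get(id);
    }

    public static void main(String[] args) {
        RenewLookupSketch sketch = new RenewLookupSketch();
        sketch.register("ORDERS", "orders-1");
        sketch.register("BILLING", "billing-1");
        // Both strategies resolve to the same object; only the cost per call differs.
        Instance byScan = sketch.findByScan("ORDERS", "orders-1");
        Instance direct = sketch.findDirect("ORDERS", "orders-1");
        System.out.println(byScan == direct);
    }
}
```

In this model both methods return the same instance, so an event built from the result would carry the same payload; the difference is that the keyed lookup does no work proportional to the total number of registered applications.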

Do you have a better way? Thank you!

#3608

lc0138, Apr 09 '21 14:04

Any progress?

kworkbee, Jan 31 '23 00:01

@OlgaMaciaszek Hi, are you still working on this issue?

ashitikov-bld, Aug 02 '23 15:08