sofa-registry icon indicating copy to clipboard operation
sofa-registry copied to clipboard

SOFA-Registry should be tolerant of clock adjustments

Open ashlee618 opened this issue 2 years ago • 7 comments

Your question

At present, SOFA-Registry relies more on time, but there is no good classification for the use of time. All time dependencies in the project are on the wall-clock time System.currentTimeMillis. This may not be a good idea for distributed architectures, especially in the financial field.

Your scenes

Scenarios for calculating timeout, for example:

DataChangeEventCenter#geExpires

  List<ChangeNotifier> getExpires() {
    final List<ChangeNotifier> expires = Lists.newLinkedList();
    final long now = System.currentTimeMillis();
    synchronized (retryNotifiers) {
      final Iterator<ChangeNotifierRetry> it = retryNotifiers.iterator();
      while (it.hasNext()) {
        ChangeNotifierRetry retry = it.next();
        // Comparing with before configured timeout
        if (retry.expireTimestamp <= now) {
          expires.add(retry.notifier);
          it.remove();
        }
      }
    }
    return expires;
  }

DataChangeEventCenter#commitRetry

  boolean commitRetry(ChangeNotifier retry) {
    final int maxSize = dataServerConfig.getNotifyRetryQueueSize();
    // here to set expireTime
    final long expireTimestamp =
        System.currentTimeMillis() + dataServerConfig.getNotifyRetryBackoffMillis();
    synchronized (retryNotifiers) {
      if (retryNotifiers.size() >= maxSize) {
        // remove first
        retryNotifiers.removeFirst();
      }
      retryNotifiers.add(new ChangeNotifierRetry(retry, expireTimestamp));
    }
    return true;
  }

This kind of time-out detection using a monotone clock is better in distributed systems.

Your advice

maybe we can take a look at zookeeper and separate the time-dependent functions in the project by time point and time interval, and then choose a wall clock or a monotonic clock to solve the problem.

https://svn.apache.org/viewvc?view=revision&revision=1657745

git version :2789d1df02e697ea511867546dc569ff6b405ece

Environment

  • SOFARegistry version:
  • JVM version (e.g. java -version):
  • OS version (e.g. uname -a):
  • Maven version:
  • IDE version:

ashlee618 avatar Jun 20 '23 05:06 ashlee618

Thanks for advice, but what I see from ZK's perspective, is that, ZK do the same stuff through wall clock:

package org.apache.zookeeper.common;

import java.util.Date;

public class Time {
    /**
     * Returns time in milliseconds as does System.currentTimeMillis(),
     * but uses elapsed time from an arbitrary epoch more like System.nanoTime().
     * The difference is that if somebody changes the system clock,
     * Time.currentElapsedTime will change but nanoTime won't. On the other hand,
     * all of ZK assumes that time is measured in milliseconds.
     * @return  The time in milliseconds from some arbitrary point in time.
     */
    public static long currentElapsedTime() {
        return System.nanoTime() / 1000000;
    }

    /**
     * Explicitly returns system dependent current wall time.
     * @return Current time in msec.
     */
    public static long currentWallTime() {
        return System.currentTimeMillis();
    }

    /**
     * This is to convert the elapsedTime to a Date.
     * @return A date object indicated by the elapsedTime.
     */
    public static Date elapsedTimeToDate(long elapsedTime) {
        long wallTime = currentWallTime() + elapsedTime - currentElapsedTime();
        return new Date(wallTime);
    }
}

So, I'm gonna ask you about what do you mean by saying monotone clock? Logic clock is not capable of calculating time expiration, only wall clock does....., as well as I known, event Google Spanner could not guarantee an increasing order of clock with a atomic clock inside each machine running Spanner.

NickNYU avatar Jun 20 '23 05:06 NickNYU

Thank you for your reply. I agree with your statement, but I think you misunderstood my meaning.

I guess you're talking about sequences, a stricter sort relationship.

What I mean is that we can replace some System.currentTimeMillis with a monotonic clock like Sytem.nanoTime, so that the application can tolerate clock induced issues such as drift and clock jumping.

ashlee618 avatar Jun 20 '23 07:06 ashlee618

Thank you for your advice. I think your advice is reasonable. In some use cases of sofa registry, such as obtaining timeout tasks within a certain time interval, it is indeed difficult to avoid the impact of system clock time jumps when using System.currentTimeMillis() in the code. In this scenario, using System.nanoTime() is indeed more appropriate.

nocvalight avatar Jun 20 '23 08:06 nocvalight

please assign me.

ashlee618 avatar Jun 22 '23 05:06 ashlee618

I will try to organize their dependence on time, and do you have any good suggestions?

Is the scope of our refactoring the entire project or just some features.

ashlee618 avatar Jun 22 '23 05:06 ashlee618

please assign me.

assigned

nocvalight avatar Jun 23 '23 03:06 nocvalight

I will try to organize their dependence on time, and do you have any good suggestions?

Is the scope of our refactoring the entire project or just some features.

The scenario we discussed above is for comparing a time interval on the same machine, such as whether a push task has expired. In this scenario, I think using System.nanoTime() is appropriate. At the same time, there are also some other scenarios in SFARegistry, such as generating a DatumVersion after the datum is changed, such as the following code:

public static long confregNextId(long lastVersion) {
    long timestamp = timeGen();
    if (lastVersion < timestamp) {
      return timestamp;
    }
    if (lastVersion > timestamp + 10) {
      LOG.warn(
          "[DatumVersion] last version {} is higher than currentTimestamp {}",
          lastVersion,
          timestamp);
    }
    return lastVersion + 1;
  }

The DatumVersion may generate from different machine and needs to be compared, and in this scenario, what needs to be compared is the global time of different machines. In this scenario, using System.nanoTime() is not suitable. So, I think that our refactoring scope should be focused on scenarios where time comparison is made on the same machine.

nocvalight avatar Jun 23 '23 03:06 nocvalight