icu-dotnet icon indicating copy to clipboard operation
icu-dotnet copied to clipboard

Making a thread safe Collator

Open NightOwl888 opened this issue 4 years ago • 3 comments

Is your feature request related to a problem? Please describe.

This is sort of a question/feature request.

First of all, it seems there is a difference in design between icu4c and icu4j. The collator documentation clearly states that

ICU Collator instances cannot be shared among threads. You should open them instead, and use a different collator for each separate thread. The safe clone function is supported for cloning collators in a thread-safe fashion.

However, the component I am porting that uses the icu4j RuleBasedCollator is able to store an instance as a private variable in a class and have that variable be hit by multiple threads at once.

I tried using lock (thisObj), but somehow the Collator knows that the wrong thread is calling it even if the call itself is synchronized. The only thing that works is creating a clone per thread, which is no help if you are not the one instantiating the threads.

Describe the solution you'd like

What I would like to see is a thread safe Collator and RuleBasedCollator implementation out of the box. I was able to make this happen on my first attempt, but I am not sure how this will perform.

internal class ThreadSafeCollator : Collator
{
    private readonly Collator collator;
    private readonly object syncLock = new object();

    public ThreadSafeCollator(Collator collator)
    {
        this.collator = (Collator)collator.Clone();
    }

    public override CollationStrength Strength
    {
        get
        {
            using (var helper = new CollatorThreadHelper<CollationStrength>(
                this.collator, (clone) => clone.Strength))
            {
                helper.Invoke();
                return helper.Result;
            }
        }
        set
        {
            lock (syncLock)
            {
                collator.Strength = value;
            }
        }
    }

    public override NormalizationMode NormalizationMode
    {
        get
        {
            using (var helper = new CollatorThreadHelper<NormalizationMode>(
                this.collator, (clone) => clone.NormalizationMode))
            {
                helper.Invoke();
                return helper.Result;
            }
        }
        set
        {
            lock (syncLock)
            {
                collator.NormalizationMode = value;
            }
        }
    }

    public override FrenchCollation FrenchCollation
    {
        get
        {
            using (var helper = new CollatorThreadHelper<FrenchCollation>(
                this.collator, (clone) => clone.FrenchCollation))
            {
                helper.Invoke();
                return helper.Result;
            }
        }
        set
        {
            lock (syncLock)
            {
                collator.FrenchCollation = value;
            }
        }
    }

    public override CaseLevel CaseLevel
    {
        get
        {
            using (var helper = new CollatorThreadHelper<CaseLevel>(
                this.collator, (clone) => clone.CaseLevel))
            {
                helper.Invoke();
                return helper.Result;
            }
        }
        set
        {
            lock (syncLock)
            {
                collator.CaseLevel = value;
            }
        }
    }

    [Obsolete]
    public override HiraganaQuaternary HiraganaQuaternary
    {
        get
        {
            using (var helper = new CollatorThreadHelper<HiraganaQuaternary>(
                this.collator, (clone) => clone.HiraganaQuaternary))
            {
                helper.Invoke();
                return helper.Result;
            }
        }
        set
        {
            lock (syncLock)
            {
                collator.HiraganaQuaternary = value;
            }
        }
    }

    public override NumericCollation NumericCollation
    {
        get
        {
            using (var helper = new CollatorThreadHelper<NumericCollation>(
                this.collator, (clone) => clone.NumericCollation))
            {
                helper.Invoke();
                return helper.Result;
            }
        }
        set
        {
            lock (syncLock)
            {
                collator.NumericCollation = value;
            }
        }
    }

    public override CaseFirst CaseFirst
    {
        get
        {
            using (var helper = new CollatorThreadHelper<CaseFirst>(
                this.collator, (clone) => clone.CaseFirst))
            {
                helper.Invoke();
                return helper.Result;
            }
        }
        set
        {
            lock (syncLock)
            {
                collator.CaseFirst = value;
            }
        }
    }

    public override AlternateHandling AlternateHandling
    {
        get
        {
            using (var helper = new CollatorThreadHelper<AlternateHandling>(
                this.collator, (col) => col.AlternateHandling))
            {
                helper.Invoke();
                return helper.Result;
            }
        }
        set
        {
            lock (syncLock)
            {
                collator.AlternateHandling = value;
            }
        }
    }

    public override object Clone()
    {
        // No need to use our helper here...
        return this.collator.Clone();
    }

    public override int Compare(string source, string target)
    {
        using (var helper = new CollatorThreadHelper<int>(
                this.collator, (col) => col.Compare(source, target)))
        {
            helper.Invoke();
            return helper.Result;
        }
    }

    public override SortKey GetSortKey(string source)
    {
        using (var helper = new CollatorThreadHelper<SortKey>(
                this.collator, (col) => col.GetSortKey(source)))
        {
            helper.Invoke();
            return helper.Result;
        }
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            this.collator?.Dispose();
        }
        base.Dispose(disposing);
    }

    internal class CollatorThreadHelper<TResult> : IDisposable
    {
        private readonly Collator clonedCollator;
        private readonly Thread thread;
        private Exception exception;

        public CollatorThreadHelper(Collator collator, Func<Collator, TResult> action)
        {
            this.clonedCollator = (Collator)collator.Clone();
            this.thread = new Thread(() =>
            {
                try
                {
                    this.Result = action(this.clonedCollator);
                }
                catch (Exception ex)
                {
                    this.exception = ex;
                }
            });
        }
        public TResult Result { get; private set; }

        public void Invoke()
        {
            thread.Start();
            thread.Join();
            if (exception != null) throw exception;
        }

        public void Dispose()
        {
            this.clonedCollator.Dispose();
        }
    }
}

It works great, but I haven't yet benchmarked it to see how it performs without all of the additional cloning. I am also not sure about the property setters, but I only needed them in one class that was already deprecated in the code I am porting.

Describe alternatives you've considered

My second attempt was to try not to call Clone() so much in order to try to make it more efficient. At first I was thinking to try to store the collator in the Thread itself using Thread.SetData, but without a local reference it would leave no way to dispose the collator.

So, I tried making a local cache based on the thread id. Unfortunately, managed and unmanaged thread ids are 2 different things, and the latter requires a lower level call. However, this didn't seem to work (maybe I missed something, though).

internal class ThreadSafeCollator : Collator
{
    private const int cleanupIntervalInSeconds = 5;

    private readonly Collator collator;
    private readonly IDictionary<int, ThreadReference> cache = new Dictionary<int, ThreadReference>();
    private readonly ReaderWriterLockSlim cacheLock = new ReaderWriterLockSlim(LockRecursionPolicy.NoRecursion);
    private readonly CancellationTokenSource cleanReferencesTask;


    public ThreadSafeCollator(Collator collator)
    {
        if (collator == null)
            throw new ArgumentNullException(nameof(collator));

        this.collator = (Collator)collator.Clone();
        this.cleanReferencesTask = new CancellationTokenSource();
        Repeat.Interval(TimeSpan.FromSeconds(cleanupIntervalInSeconds), () => CleanCollatorReferences(), cleanReferencesTask.Token);
    }

    private void CleanCollatorReferences()
    {
        cacheLock.EnterUpgradeableReadLock();
        try
        {
            var deadReferences = cache.Where(r => !r.Value.IsAlive).ToArray();
            if (deadReferences.Any())
            {
                cacheLock.EnterWriteLock();
                try
                {
                    foreach (var deadReference in deadReferences)
                        cache.Remove(deadReference);
                }
                finally
                {
                    cacheLock.ExitWriteLock();
                }
            }
        }
        finally
        {
            cacheLock.ExitUpgradeableReadLock();
        }
    }

    [DllImport("kernel32.dll")]
    static extern int GetCurrentThreadId();

    private Collator GetCurrentCollator()
    {
        cacheLock.EnterUpgradeableReadLock();
        try
        {
            //int id = Thread.CurrentThread.ManagedThreadId;
            int id = GetCurrentThreadId();
            if (cache.ContainsKey(id))
            {
                return cache[id].Collator;
            }
            else
            {
                cacheLock.EnterWriteLock();
                try
                {
                    return (cache[id] = new ThreadReference(
                        collator, Thread.CurrentThread)).Collator;
                }
                finally
                {
                    cacheLock.ExitWriteLock();
                }
            }
        }
        finally
        {
            cacheLock.ExitUpgradeableReadLock();
        }
    }

    public override CollationStrength Strength
    {
        get => GetCurrentCollator().Strength;
        set => GetCurrentCollator().Strength = value;
    }

    public override NormalizationMode NormalizationMode
    {
        get => GetCurrentCollator().NormalizationMode;
        set => GetCurrentCollator().NormalizationMode = value;
    }

    public override FrenchCollation FrenchCollation
    {
        get => GetCurrentCollator().FrenchCollation;
        set => GetCurrentCollator().FrenchCollation = value;
    }

    public override CaseLevel CaseLevel
    {
        get => GetCurrentCollator().CaseLevel;
        set => GetCurrentCollator().CaseLevel = value;
    }

    [Obsolete]
    public override HiraganaQuaternary HiraganaQuaternary
    {
        get => GetCurrentCollator().HiraganaQuaternary;
        set => GetCurrentCollator().HiraganaQuaternary = value;
    }

    public override NumericCollation NumericCollation
    {
        get => GetCurrentCollator().NumericCollation;
        set => GetCurrentCollator().NumericCollation = value;
    }

    public override CaseFirst CaseFirst
    {
        get => GetCurrentCollator().CaseFirst;
        set => GetCurrentCollator().CaseFirst = value;
    }

    public override AlternateHandling AlternateHandling
    {
        get => GetCurrentCollator().AlternateHandling;
        set => GetCurrentCollator().AlternateHandling = value;
    }

    public override object Clone()
    {
        // No need to use our helper here...
        return this.collator.Clone();
    }

    public override int Compare(string source, string target)
    {
        return GetCurrentCollator().Compare(source, target);
    }

    public override SortKey GetSortKey(string source)
    {
        return GetCurrentCollator().GetSortKey(source);
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            cacheLock.EnterWriteLock();
            try
            {
                this.cleanReferencesTask.Cancel();
                this.collator.Dispose();
                foreach (var threadReference in cache.Values)
                {
                    threadReference.Collator?.Dispose();
                }
                this.cleanReferencesTask.Dispose();
                this.cacheLock.Dispose();
            }
            finally
            {
                cacheLock.ExitWriteLock();
            }
        }
        base.Dispose(disposing);
    }

    internal class ThreadReference
    {
        private readonly WeakReference<Thread> thread;
        public ThreadReference(Collator collator, Thread thread)
        {
            this.Collator = (Collator)collator.Clone();
            this.thread = new WeakReference<Thread>(thread);
        }

        public Collator Collator { get; private set; }
        public bool IsAlive => this.thread.TryGetTarget(out Thread target);
    }

    internal static class Repeat
    {
        public static Task Interval(
            TimeSpan pollInterval,
            Action action,
            CancellationToken token)
        {
            // We don't use Observable.Interval:
            // If we block, the values start bunching up behind each other.
            return Task.Factory.StartNew(
                () =>
                {
                    while (true)
                    {
                        if (token.WaitHandle.WaitOne(pollInterval))
                            break;

                        action();
                    }
                }, token, TaskCreationOptions.LongRunning, TaskScheduler.Default);
        }
    }
}

Additional context

First of all, I am wondering if you have any other suggestion that might work.

Secondly, would you consider providing something like this? My thought is to rename the existing collator types, mark them internal, and provide facade classes with exactly the same interfaces and documentation comments as the originals. Thread safe could just be an option that is enabled with a constructor or creation method parameter. The constructor would load either the thread safe or non safe instance internally.

private readonly Collator wrappedCollator;

public RuleBasedCollator(string rules,
	NormalizationMode normalizationMode,
	CollationStrength collationStrength,
        bool threadSafe)
{
    if (threadSafe)
        this.wrappedCollator = ThreadSafeCollator(new RuleBasedCollatorImpl(rules, normalizationMode, collationStrength));
    else
        this.wrappedCollator = new RuleBasedCollatorImpl(rules, normalizationMode, collationStrength);
}

public override SortKey GetSortKey(string source)
{
      return this.wrappedCollator.GetSortKey(source);
}

NightOwl888 avatar Jul 13 '19 01:07 NightOwl888

I assume you're familiar with http://userguide.icu-project.org/design. That mentions a Freezable interface on Collator - unfortunately it seems this is only implemented in icu4j.

So - no, I don't have any other ideas.

Your suggestion of creating facade classes sounds good. I'm happy to accept a PR. Can you please share the results you get from running benchmarks?

With the facade classes I guess the question with the property setters remains the same. Maybe we should implement something like Freezable? Then we could throw an exception in the setter if it's frozen. I haven't thought this through - just an idea that came to my mind.

ermshiperete avatar Jul 18 '19 09:07 ermshiperete

See also https://unicode-org.atlassian.net/browse/ICU-20719

ermshiperete avatar Sep 23 '19 09:09 ermshiperete

Thanks. I ended up porting the portions of ICU4J that are required by Lucene.NET: https://github.com/NightOwl888/ICU4N

Unfortunately, this icu-dotnet has too many limitations for our needs, not just thread safety. For example, we need to use UnicodeSet as a filter. I attempted to implement the functionality, but it doesn't work.

That said, there are some unfinished issues to deal with on ICU4N before it can be called stable, most notably:

  1. A way to override data to provide custom data injection
  2. An subclass of CultureInfo that provides similar functionality to ULocale (and allows a platform for building culture sensitive APIs)
  3. APIs to match naming conventions/parameters of .NET types, i.e. Char > UChar

What are your thoughts on combining our efforts on ICU4N?

NightOwl888 avatar Sep 23 '19 17:09 NightOwl888