
ZLinq is much slower than normal Linq (string-join, count)

Open alex-jitbit opened this issue 9 months ago • 5 comments

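| Method          | Mean     | Error     | StdDev   | Gen0   | Allocated |
|-----------------|----------|-----------|----------|--------|-----------|
| StringJoinLinq  | 60.74 ns | 12.062 ns | 0.661 ns | 0.0267 | 224 B     |
| StringJoinZLinq | 75.53 ns | 7.338 ns  | 0.402 ns | 0.0086 | 72 B      |
| CountLinq       | 17.62 ns | 1.492 ns  | 0.082 ns | 0.0086 | 72 B      |
| CountZLinq      | 32.65 ns | 0.645 ns  | 0.035 ns | -      | -         |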

Maybe it's because I'm on .NET 8, but ZLinq is much slower here: roughly 1.2x slower for the string join and 1.9x slower for the count. Am I doing something wrong? This is the benchmark I wrote:

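using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;
using ZLinq;

[ShortRunJob, MemoryDiagnoser]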
public class Benchmark
{
	private static readonly List<User> _users = new List<User>() {
		new User() { Name = "John", Email = "[email protected]", IsActive = true },
		new User() { Name = "Jane", Email = "[email protected]", IsActive = false },
		new User() { Name = "Jim", Email = "[email protected]", IsActive = true },
		new User() { Name = "Jill", Email = "[email protected]", IsActive = false },
		new User() { Name = "Jack", Email = "[email protected]", IsActive = true },
		new User() { Name = "Jill", Email = "[email protected]", IsActive = false },
		new User() { Name = "Jack", Email = "[email protected]", IsActive = true },
		new User() { Name = "Jill", Email = "[email protected]", IsActive = false },
		new User() { Name = "Jack", Email = "[email protected]", IsActive = true },
	};


	[Benchmark]
	public string StringJoinLinq() => string.Join(",", _users.Where(x => x.IsActive).Select(x => x.Name));

	[Benchmark]
	public string StringJoinZLinq() => _users.AsValueEnumerable().Where(x => x.IsActive).Select(x => x.Name).JoinToString(',');

	[Benchmark]
	public int CountLinq() => _users.Where(x => x.IsActive).Count();

	[Benchmark]
	public int CountZLinq() => _users.AsValueEnumerable().Where(x => x.IsActive).Count();
}

public class User
{
	public string Name { get; set; }
	public string Email { get; set; }
	public bool IsActive { get; set; }
}

I only tested string join and count, and I stopped there, unsure whether it was worth testing further.

alex-jitbit, May 20 '25 07:05

ValueEnumerable<T> is a struct, and its size increases slightly with each method chain. With many chained methods, copy costs can become significant. When iterating over small collections, these copy costs can outweigh the benefits, causing performance to be worse than standard LINQ. However, this is only an issue with extremely long method chains and small iteration counts, so it's rarely a practical concern.

Maybe because your dataset is small? Try with 1000 and 10000
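The size growth described above can be illustrated with a minimal sketch. The `ArraySource`, `WhereOp`, and `SelectOp` structs below are hypothetical stand-ins invented for this example, not ZLinq's actual types; they only show how a struct that embeds the previous stage by value gets larger with each chained operator, which is what makes copies more expensive.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical stand-ins for struct-based operators; NOT ZLinq's real types.
struct ArraySource
{
    public int[] Items;
    public int Index;
}

struct WhereOp<TSource> where TSource : struct
{
    public TSource Source;            // previous stage is embedded by value
    public Func<int, bool> Predicate;
}

struct SelectOp<TSource> where TSource : struct
{
    public TSource Source;            // embeds the whole chain built so far
    public Func<int, int> Selector;
}

static class StructSizeDemo
{
    public static void Main()
    {
        // Each added stage increases the size of the struct that gets copied.
        // Exact byte counts depend on the runtime and platform.
        Console.WriteLine(Unsafe.SizeOf<ArraySource>());
        Console.WriteLine(Unsafe.SizeOf<WhereOp<ArraySource>>());
        Console.WriteLine(Unsafe.SizeOf<SelectOp<WhereOp<ArraySource>>>());
    }
}
```

With only a handful of elements to iterate, the fixed cost of building and copying such a chain is a larger share of the total time, which fits the suggestion to retry with 1000 and 10000 items.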

sn4k3, May 20 '25 15:05

This may be a bit crude, but on .NET Framework 4.8.1 ZLinq is always much slower for me. I tried many different tests before this one; this is just one example:

        private static readonly Random Random = new();
        public static String RandomString(Int32 length)
        {
            const String chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
            return new String(Enumerable.Repeat(chars, length).Select(s => s[Random.Next(s.Length)]).ToArray());
        }
        public static String RandomStringZLinq(Int32 length)
        {
            const String chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
            return new String(Enumerable.Repeat(chars, length).AsValueEnumerable().Select(s => s[Random.Next(s.Length)]).ToArray());
        }
            Stopwatch stopwatch = new();
            Dictionary<String, Int64> elapsed = [];

            stopwatch.Start();

            for (Int32 i = 0; i < 100000; i++)
            {
                elapsed.Add(RandomString(50), i);
            }

            stopwatch.Stop();
            Console.WriteLine(@"Linq Random: " + stopwatch.ElapsedMilliseconds);
            elapsed = [];
            stopwatch.Restart();

            for (Int32 i = 0; i < 100000; i++)
            {
                elapsed.Add(RandomStringZLinq(50), i);
            }

            stopwatch.Stop();
            Console.WriteLine(@"ZLinq Random: " + stopwatch.ElapsedMilliseconds);
            stopwatch.Restart();

            for (Int32 i = 0; i < 10000000; i++)
            {
                var result = elapsed.OrderBy(x => x.Value);
            }

            stopwatch.Stop();
            Console.WriteLine(@"Linq order: " + stopwatch.ElapsedMilliseconds);
            stopwatch.Restart();

            for (Int32 i = 0; i < 10000000; i++)
            {
                var result = elapsed.AsValueEnumerable().OrderBy(x => x.Value);
            }

            stopwatch.Stop();
            Console.WriteLine(@"ZLinq order: " + stopwatch.ElapsedMilliseconds);

Result (ms):
Linq Random: 256
ZLinq Random: 311
Linq order: 161
ZLinq order: 1574

FYN-Michiel, May 20 '25 15:05

@alex-jitbit

String.Join was optimized for IEnumerable<string> and string separator, but JoinToString was lacking these optimizations (it only handled the ValueEnumerable<T> + ReadOnlySpan<char> case), which created a performance gap.

Regarding Count, Count(predicate) is inherently faster than Where(predicate).Count(). However, it's possible to convert Where(predicate).Count() to the equivalent of Count(predicate), so I implemented such processing.
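As a minimal sketch of the equivalence mentioned above, the example below uses standard System.Linq only. It shows that the two forms return the same result; how ZLinq performs the rewrite internally is not shown in this thread, so nothing here should be read as its implementation. The `CountEquivalenceDemo` class and its method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class CountEquivalenceDemo
{
    // Two-stage form: filter first, then count the filtered sequence.
    public static int ViaWhere(List<bool> flags) => flags.Where(x => x).Count();

    // Single-stage form: count matching elements directly.
    public static int ViaCount(List<bool> flags) => flags.Count(x => x);

    public static void Main()
    {
        var flags = new List<bool> { true, false, true, false, true };
        Console.WriteLine(ViaWhere(flags) == ViaCount(flags)); // True
    }
}
```

Because both forms always produce the same count, a library can treat the first as the second without changing behavior.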

Both optimizations show improvements in my local benchmarks. I'd like to consider if further optimizations are possible before releasing.

Thank you very much for providing valuable hints for additional optimizations.

neuecc, May 21 '25 08:05

@FYN-Michiel Please use BenchmarkDotNet for benchmarking. Regarding OrderBy, I think the measurement is incorrect because the query is never materialized, so the loop only times building the query, not sorting. RandomString combines several components (conversion from IEnumerable, Select, ToArray). Wrapping an IEnumerable<T> itself can add overhead, and the ToArray implementation for .NET Framework differs slightly from the one for other targets, so that influence should be considered too.
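The materialization point can be checked with a small sketch using standard System.Linq. The `DeferredDemo` class and its `KeyCalls` counter are hypothetical names introduced for this example; the counter records whether the key selector ever runs.

```csharp
using System;
using System.Linq;

static class DeferredDemo
{
    // Counts how many times the key selector has been invoked.
    public static int KeyCalls;

    public static void Main()
    {
        int[] data = { 3, 1, 2 };

        // Building the query does not sort anything yet.
        var query = data.OrderBy(x => { KeyCalls++; return x; });
        Console.WriteLine(KeyCalls); // 0

        // Enumerating (here via ToArray) is what triggers the sort.
        int[] sorted = query.ToArray();
        Console.WriteLine(KeyCalls > 0); // True
        Console.WriteLine(string.Join(",", sorted)); // 1,2,3
    }
}
```

A benchmark that wants to measure sorting would need to consume the result inside the loop, for example with ToArray or a foreach.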

neuecc, May 21 '25 08:05

The average (Mean) is calculated over the dataset, and when you compare it with the other columns, the differences become quite notable.

Just look at Error, StdDev, Gen0, and Allocated, where lower is better.

I work with large datasets and I see the difference.

alphanumericaracters, May 21 '25 19:05

I've released v1.4.7, which includes the Where(pred).Count and JoinToString improvements.

neuecc, May 22 '25 10:05