
Performance is low when reading a large JSON feed

Open 304NotModified opened this issue 6 years ago • 12 comments

Loading is slow when reading a large JSON feed.

Steps:

image

  1. load from https://dotnetfeed.blob.core.windows.net/dotnet-core/index.json

  2. Loaded:

image

Current result

Load takes ~16 seconds

Details

The expensive calls are:

var json = await _rawPackageSearchResouce.Search(searchText, _searchContext.Filter, CurrentPage * _pageSize, _pageSize, NullLogger.Instance, token);
json.Select(s => s.FromJToken<PackageSearchMetadata>()).ToList();

inside ShowLatestVersionQueryContext<T>

Notes:

  • rawPackageSearchResouce.Search parses the response into a JObject
  • FromJToken is just a JToken.ToObject<T>(JsonSerializer jsonSerializer) call from JSON.NET
  • FromJToken is called separately for each package; deserializing the whole content in one step may be more efficient.
  • Full URL of the JSON feed: https://dotnetfeed.blob.core.windows.net/dotnet-core/search/query?q=&skip=0&take=15&prerelease=false&semVerLevel=2.0.0 (141 MB raw, without gzip etc.)

304NotModified avatar Aug 11 '19 11:08 304NotModified

Just want to note that the current logic is similar to the one in NuGet.Client, but without the total count per version "fix": https://github.com/NuGet/NuGet.Client/blob/e9c22b1c5783edefcd9c5175dc76f99206fa14c8/src/NuGet.Core/NuGet.Protocol/Resources/PackageSearchResourceV3.cs#L28-L32

campersau avatar Aug 11 '19 12:08 campersau

Thanks!

I think neither of them is built for (large) static feeds, right?

304NotModified avatar Aug 11 '19 12:08 304NotModified

This may be a poor man's test, but it looks good in terms of performance:

image

CurrentApproachTest is a stripped-down version of the code from NuGet.Client:

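    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Reflection;
    using System.Threading;
    using System.Threading.Tasks;
    using Newtonsoft.Json;
    using Newtonsoft.Json.Linq;
    using NuGet.Protocol;
    using Xunit;

    public class PerformanceJsonTests
    {
        // Embedded copy of the search response from the dotnetfeed static feed
        private readonly string _resourceName = "UnitTestProject1.dotnetfeed.blob.core.windows.net.json";

        [Fact]
        public async Task CurrentApproachTest() // async Task so xUnit awaits the test and observes failures
        {
            var token = CancellationToken.None;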
            using var stream = GetEmbeddedSource(_resourceName);

            // Act
            var results = await stream.AsJObjectAsync(token);
            var data = results[JsonProperties.Data] as JArray ?? Enumerable.Empty<JToken>();
            var json = data.OfType<JObject>();
            var packages = json.Select(s => s.FromJToken<PackageSearchMetadata>()).ToList();

            // Assert
            AssertPackages(packages);
        }

        [Fact]
        public void JsonNetTest()
        {
            using var stream = GetEmbeddedSource(_resourceName);

            // Act
            var packages = DeserializeFromStream<FullPackageSearchMetadata>(stream);

            // Assert
            AssertPackages(packages.Data);
        }

        private class FullPackageSearchMetadata
        {
            public List<PackageSearchMetadata> Data { get; set; }
        }

        private static T DeserializeFromStream<T>(Stream s)
        {
            using (StreamReader reader = new StreamReader(s))
            using (JsonTextReader jsonReader = new JsonTextReader(reader))
            {
                JsonSerializer ser = JsonExtensions.JsonObjectSerializer;
                return ser.Deserialize<T>(jsonReader);
            }
        }

        private static void AssertPackages(List<PackageSearchMetadata> packages)
        {
            Assert.Equal(1904, packages.Count);
            var package = packages.First();
            Assert.Equal("3.0.0-alpha-26807-18", package.ParsedVersions.First().Version.OriginalVersion);
            Assert.Equal("Accessibility", package.Identity.Id);
        }

        private static Stream GetEmbeddedSource(string resourceName)
        {
            var assembly = Assembly.GetExecutingAssembly();

            var stream = assembly.GetManifestResourceStream(resourceName);
            if (stream == null)
            {
                throw new Exception($"resource {resourceName} not found");
            }
            return stream;
        }
    }

Full test code here: https://github.com/304NotModified/NuGetPackageExplorer/tree/static-feed-json-parse-performance/UnitTestProject1

304NotModified avatar Aug 12 '19 22:08 304NotModified

Results from BenchmarkDotNet:


BenchmarkDotNet=v0.11.5, OS=Windows 10.0.17134.885 (1803/April2018Update/Redstone4)
Intel Core i7-8750H CPU 2.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
Frequency=2156252 Hz, Resolution=463.7677 ns, Timer=TSC
.NET Core SDK=3.0.100-preview7-012821
  [Host]     : .NET Core 3.0.0-preview7-27912-14 (CoreCLR 4.700.19.32702, CoreFX 4.700.19.36209), 64bit RyuJIT
  Job-JKYBCG : .NET Core 3.0.0-preview7-27912-14 (CoreCLR 4.700.19.32702, CoreFX 4.700.19.36209), 64bit RyuJIT
  Core       : .NET Core 3.0.0-preview7-27912-14 (CoreCLR 4.700.19.32702, CoreFX 4.700.19.36209), 64bit RyuJIT

Runtime=Core  InvocationCount=1  UnrollFactor=1  

| Method          | Job     | IterationCount | LaunchCount | RunStrategy | WarmupCount | Feed       | Mean         | Error       | StdDev      | Rank |
|-----------------|---------|----------------|-------------|-------------|-------------|------------|--------------|-------------|-------------|------|
| NewApproach     | Default | 5              | 1           | Monitoring  | 1           | Dotnetfeed | 3,024.743 ms | 40.0995 ms  | 10.4137 ms  | 3    |
| CurrentApproach | Default | 5              | 1           | Monitoring  | 1           | Dotnetfeed | 7,153.443 ms | 430.3471 ms | 111.7598 ms | 4    |
| NewApproach     | Core    | Default        | Default     | Default     | Default     | Dotnetfeed | 3,031.691 ms | 61.7188 ms  | 131.5278 ms | 3    |
| CurrentApproach | Core    | Default        | Default     | Default     | Default     | Dotnetfeed | 7,498.749 ms | 149.6982 ms | 189.3203 ms | 5    |
| NewApproach     | Default | 5              | 1           | Monitoring  | 1           | Nuget      | 3.136 ms     | 0.7524 ms   | 0.1954 ms   | 1    |
| CurrentApproach | Default | 5              | 1           | Monitoring  | 1           | Nuget      | 4.823 ms     | 0.3609 ms   | 0.0937 ms   | 2    |
| NewApproach     | Core    | Default        | Default     | Default     | Default     | Nuget      | 2.994 ms     | 0.0582 ms   | 0.0544 ms   | 1    |
| CurrentApproach | Core    | Default        | Default     | Default     | Default     | Nuget      | 4.858 ms     | 0.0946 ms   | 0.1263 ms   | 2    |

results as image

304NotModified avatar Aug 13 '19 22:08 304NotModified

Benchmark with memory usage:

image

304NotModified avatar Aug 13 '19 22:08 304NotModified

Is the NewApproach this one https://github.com/304NotModified/NuGetPackageExplorer/blob/a6e936f827433dd41e9546cba74a06bcc4719a68/UnitTestProject1/PerformanceJsonTests.cs#L21-L31 ?

How would we integrate the NewApproach, given that we are currently using RawSearchResourceV3, which returns an IEnumerable<JObject> rather than the raw stream?

campersau avatar Aug 14 '19 05:08 campersau

Yes, NewApproach == JsonNetTest()

As for the integration: either send a PR to NuGet (NuGet.Client?) or fork the relevant classes.

304NotModified avatar Aug 14 '19 13:08 304NotModified

https://github.com/NuGet/NuGet.Client/pull/3406 was merged, which should improve memory usage and performance for static feeds.

campersau avatar Jun 01 '20 18:06 campersau

I looked into the fix in more detail: it limits static feeds to returning only the first items, as specified by take. I think this is because static feeds can't be distinguished from dynamic server feeds. https://github.com/NuGet/NuGet.Client/blob/427adf89c1fa3aab03f4f3840982f2d6b030d3e3/src/NuGet.Core/NuGet.Protocol/Resources/PackageSearchResourceV3.cs#L235-L243

So in our case, as soon as we remove our RawSearchResourceV3 workaround, we would only show the first 15 results (in a few seconds), and scrolling down would load the same 15 results again, indefinitely.
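To illustrate the paging problem, here is a minimal sketch. It is not NuGet.Client code: QueryStaticFeed is a hypothetical stand-in for a static feed that ignores skip/take and always returns the full list, and the 40 package names are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StaticFeedPagingSketch
{
    // Hypothetical static feed: skip and take are ignored, the whole list comes back.
    public static IReadOnlyList<string> QueryStaticFeed(int skip, int take) =>
        Enumerable.Range(1, 40).Select(i => $"Package{i}").ToList();

    // Behaviour described above: only the first `take` items of the response are kept.
    public static List<string> TakeOnly(int skip, int take) =>
        QueryStaticFeed(skip, take).Take(take).ToList();

    // Possible client-side handling for a feed known to be static: apply skip locally too.
    public static List<string> SkipAndTake(int skip, int take) =>
        QueryStaticFeed(skip, take).Skip(skip).Take(take).ToList();

    public static void Main()
    {
        const int pageSize = 15;

        var page1 = TakeOnly(0, pageSize);
        var page2 = TakeOnly(pageSize, pageSize);
        Console.WriteLine(page1.SequenceEqual(page2)); // True: page 2 repeats page 1

        var fixedPage2 = SkipAndTake(pageSize, pageSize);
        Console.WriteLine(fixedPage2[0]); // Package16
    }
}
```

Note that SkipAndTake is only correct if we know the feed is static. On a dynamic feed that already honoured skip on the server, skipping again locally would drop results, which is presumably why this can't be done blindly.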

campersau avatar Jun 13 '20 18:06 campersau

@campersau can you please file a bug on the NuGet repo describing this limitation? Seems like there's still a gap that needs to be fixed.

clairernovotny avatar Jun 13 '20 19:06 clairernovotny

It looks like static feeds will not be supported by NuGet.Client (see https://github.com/NuGet/Home/issues/9726#issuecomment-654581621). So I think we have two options now:

  • Try to support static feeds on our own (as long as RawSearchResourceV3 is still there)
    • Copying and adjusting some code from https://github.com/NuGet/NuGet.Client/pull/3406
  • Drop support for static feeds
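For the first option, a rough sketch of what "on our own" could look like: stream through the "data" array with JSON.NET's JsonTextReader and apply skip/take on the client, so the response is never loaded into a JObject as a whole. This is not the code from the PR above. StreamingSearchSketch, ReadPage and PackageEntry are hypothetical names, and JsonSerializer.CreateDefault() is an assumption; the real code would probably need the NuGet serializer settings (JsonExtensions.JsonObjectSerializer) and PackageSearchMetadata instead.

```csharp
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

// Hypothetical minimal item type, for illustration only.
public class PackageEntry
{
    public string Id { get; set; }
}

public static class StreamingSearchSketch
{
    // Reads one page of items from the top-level "data" array of a search response.
    public static List<T> ReadPage<T>(Stream stream, int skip, int take)
    {
        var page = new List<T>();
        if (take <= 0)
        {
            return page;
        }

        var serializer = JsonSerializer.CreateDefault(); // assumption: default settings
        using var reader = new JsonTextReader(new StreamReader(stream));

        // Advance to the top-level "data" property (depth 1 = directly inside the root object).
        var found = false;
        while (!found && reader.Read())
        {
            found = reader.TokenType == JsonToken.PropertyName
                    && reader.Depth == 1
                    && "data".Equals(reader.Value as string);
        }

        if (!found || !reader.Read() || reader.TokenType != JsonToken.StartArray)
        {
            return page; // no "data" array in this response
        }

        var index = 0;
        while (reader.Read() && reader.TokenType != JsonToken.EndArray)
        {
            if (index++ < skip)
            {
                reader.Skip(); // jump over this item without materializing it
                continue;
            }

            page.Add(serializer.Deserialize<T>(reader));
            if (page.Count == take)
            {
                break; // stop early, the rest of the stream is not parsed
            }
        }

        return page;
    }
}
```

Usage would be something like ReadPage<PackageEntry>(responseStream, CurrentPage * _pageSize, _pageSize). The whole response still has to be downloaded up to the requested page, so later pages get slower; caching the parsed list after the first request might be the better trade-off for a 141 MB feed.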

campersau avatar Jul 07 '20 09:07 campersau

Aren't static feeds one of the main benefits of NPE over nuget.org, other package websites, or Azure DevOps?

304NotModified avatar Jul 07 '20 12:07 304NotModified