
Operations.Send() is radically slow

Open nickger opened this issue 1 year ago • 4 comments

What package are you referring to?

Operations.

Describe the bug

Operations.Send() performs very slowly in general, and particularly with SQLiteTransport. I've decided to report it as a bug because it is not a small issue: real projects take far too long to export. Currently, writing 7,500 elements locally takes about 10 minutes. I haven't checked the server transport, but the issue is not in the transport code (see below).

To Reproduce

High-level code that sends data collected in a queue from Revit (Task.Run is omitted):

bool started = true;
using (var transp = new CustomIFCExport.SQLiteTransport(@"C:\Users\GNI\Documents\GeomExports", null, doc.Title))
{
    while (started || !dataQueue.IsEmpty)
    {
        if (dataQueue.TryDequeue(out Base data))
        {
            await Operations.Send(data, transp, false);  // send each dequeued object individually
        }
        else
        {
            // If no data is available, check whether collection has finished
            if (await Task.WhenAny(collectionCompleted.Task, Task.Delay(100)) == collectionCompleted.Task && dataQueue.IsEmpty)
            {
                // Exit the loop once collection is complete and the queue is empty
                started = false;
            }
        }
    }
}

Additional context

We use local transports to speed up development and to enable local collaboration instead of using a Speckle Server for experiments. I am sure this matters to other users and developers as well.

Proposed Solution (if any)

The code below explicitly repeats the same steps, but moves the wait for the transport write to finish out of the per-object loop:

bool started = true;
using (var transp = new CustomIFCExport.SQLiteTransport(@"C:\Users\GNI\Documents\GeomExports", null, doc.Title))
{
    BaseObjectSerializerV2 serializerV2 = new BaseObjectSerializerV2(new List<ITransport>() { transp }, null, default);
    transp.BeginWrite();
    while (started || !dataQueue.IsEmpty)
    {
        if (dataQueue.TryDequeue(out Base data))
        {
            serializerV2.Serialize(data);  // serialize straight to the transport, no per-object await
        }
        else
        {
            // If no data is available, check whether collection has finished
            if (await Task.WhenAny(collectionCompleted.Task, Task.Delay(100)) == collectionCompleted.Task && dataQueue.IsEmpty)
            {
                // Exit the loop once collection is complete and the queue is empty
                started = false;
                transp.EndWrite();
            }
        }
    }
    await transp.WriteComplete();  // waiting once here resolves the degraded performance
}

The point is: Operations.SerializerSend() creates a task to wait for EVERY object write operation. This is overkill: we need to wait for the transport globally, not for each object. Moving this wait out to the level where the transport is instantiated resolves the issue. This should also be documented: waiting for the writer is not the responsibility of the sender.
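Condensed into a reusable shape, the pattern the proposal relies on looks roughly like the sketch below. The helper name SendManyAsync is hypothetical and not part of the Speckle SDK; it only reuses the calls already shown above (BeginWrite, Serialize, EndWrite, WriteComplete):

// Hypothetical helper illustrating the proposal: enqueue all writes through the
// serializer and await the transport's write queue exactly once at the end.
static async Task SendManyAsync(IEnumerable<Base> objects, ITransport transport)
{
    var serializer = new BaseObjectSerializerV2(new List<ITransport> { transport }, null, default);

    transport.BeginWrite();
    foreach (var obj in objects)
    {
        serializer.Serialize(obj);   // no per-object await
    }
    transport.EndWrite();

    await transport.WriteComplete(); // single global wait
}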

nickger · Oct 08 '24 09:10

Hi @nickger!

Thanks for bringing this up.

The Operations.Send function wasn't really designed to be called multiple times in sequence to send small objects; rather, it's meant to send one big fat object to the transports of choice. Both Operations.Send and Operations.Receive were designed as self-contained utility functions, which is why they wait for the writer to finish: it is not expected that you'd recycle the transport's write operation across successive sends.

That being said... 👇🏼

Does the performance issue still exist if you wrap your data objects inside a new Base object and just send that? This would reduce the Operations.Send calls to just one instead of a whole loop. It would require you to wait until the collection has completed (as your loop already suggests).

i.e.

var rootObject = new Base();
rootObject["@data"] = dataQueue.ToList();

await Operations.Send(rootObject, transp, false);
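Spelled out with the collection wait, and assuming the dataQueue, collectionCompleted and transp variables from the earlier snippets, the whole loop collapses to something like:

// Sketch only: wait for the Revit-side collection to finish, then send once.
await collectionCompleted.Task;

var rootObject = new Base();
rootObject["@data"] = dataQueue.ToList();   // "@" makes the property detachable

await Operations.Send(rootObject, transp, false);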

If your use-case has some limitations that require you to use a queue instead for performance reasons, then your proposed code seems like it should do the trick just fine.

Out of curiosity, how many objects would you expect to come out of the queue?

AlanRynne · Oct 08 '24 11:10

Hi, Alan. We are exploring a scenario for exporting IFC from Revit through an intermediate storage, since Revit itself is terrible at this. The number of objects can easily reach 100,000. Doing collection, serialization and writing sequentially is not very smart and eats about 20-30 GB for nothing; we couldn't even imagine that this is the default scenario. Pipeline processing is preferable for extracting the data; converting to IFC or whatever comes as the next step. Anyway, it would be great to see such important details in the documentation. For now it looks deceptively simple.
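For reference, a rough sketch of the pipeline idea described above, assuming the same ITransport and BaseObjectSerializerV2 usage as in the earlier snippets. It uses the System.Threading.Channels package instead of the ConcurrentQueue + Task.Delay polling; the bounded capacity of 1024 is an illustrative choice, not a recommendation:

// Producer/consumer sketch: the Revit collection code pushes converted elements
// into a bounded channel; the consumer serializes them to the transport as they
// arrive and awaits the write queue once at the end. The bound keeps memory flat.
var channel = Channel.CreateBounded<Base>(1024);

// Producer side (inside the Revit collection code):
//   await channel.Writer.WriteAsync(converted);
//   channel.Writer.Complete();   // call once the model has been fully traversed

async Task ConsumeAsync(ITransport transport)
{
    var serializer = new BaseObjectSerializerV2(new List<ITransport> { transport }, null, default);
    transport.BeginWrite();

    while (await channel.Reader.WaitToReadAsync())
    {
        while (channel.Reader.TryRead(out Base data))
        {
            serializer.Serialize(data);
        }
    }

    transport.EndWrite();
    await transport.WriteComplete();   // single global wait, as in the proposal above
}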

nickger · Oct 08 '24 11:10

We're working on improving performance left and right throughout the codebase, so thanks for the feedback. We'll update the docs to reflect your comment, as it's true it can be misleading.

didimitrie · Oct 08 '24 12:10

In our v3 SDK (speckle-sharp-sdk) we've made many improvements to send performance. However, the use case of making thousands of small send operations is not one we're interested in optimising for at the moment.

If this is critical, please open a feature request ticket in the v3 repo. We aren't planning to bring any performance improvements to the v2 SDK in this repo.

JR-Morgan · Nov 28 '24 10:11