NetOffice icon indicating copy to clipboard operation
NetOffice copied to clipboard

Microsoft Interop vs NetOffice Performance

Open matteshe opened this issue 7 years ago • 11 comments

Hi,

I switch in my application from Microsoft Interop late binding usage to NetOffice. Unfortunatelly we are facing massive performance slowdown in excel generation and reading now.

I wrote a simple programm (ExcelApiPerformanceTests) to compare both ways of excel access. Furthermore I used 3 differnt ways of accessing the cells in order to find a fast solution. I attach my project, so that you can see the slow performance of netoffice. May be there is a faster compareable solution which I don't know. At least this way at the moment I need to go back to the Interop calls. :-/

I experience this with office 2010 and 2016.

matteshe avatar Jun 04 '18 10:06 matteshe

Hi, thanks for the sample application.

Do you have issues with any particular APIs, or the use of NetOffice in general?

From the numbers output in the sample application I cannot tell what's going on.

jozefizso avatar Jun 04 '18 10:06 jozefizso

I wrote a simple programm (ExcelApiPerformanceTests) to compare both ways of excel access.

Range class contains functionality to work with the array and give the possibility to avoid for interaction. And if you use Range as the object to do an action with the array, you raise performers of the code.

Using for interaction is always very slow but it is not needed.

maliutin avatar Jun 04 '18 10:06 maliutin

@jozefizso i use only excel API at the moment. The output shows only seconds of each thread, how long it takes to read/write the 100x100 cells.

@maliutin do you have a simple example how to avoid the for loop when using the range object?

matteshe avatar Jun 04 '18 11:06 matteshe

For example:

For colIdx As Integer = 1 To cols
    For rowIdx As Integer = 1 To rows
        newInputSheet.Cells(colIdx, rowIdx).Value = inputSheet.Cells(colIdx, rowIdx).Value
        newInputSheet.Cells(colIdx, rowIdx).FormulaLocal = inputSheet.Cells(colIdx, rowIdx).FormulaLocal
    Next
Next

I suppose, in this case, we can use Cope method:

Using inputCells = inputSheet.Cells,
		inputLeftTopCell = inputCells(1, 1),
		inputRightBottomCell = inputCells(rows, cols),
		inputRange = inputSheet.Range(inputLeftTopCell, inputRightBottomCell)

	Using newCells = newInputSheet.Cells,
		  newRange = newCells(1, 1)

		inputRange.Copy(newRange)
	End Using
End Using

maliutin avatar Jun 04 '18 13:06 maliutin

I see, this works for the simplicity of the program. In my main app I generate the excel first based from an xml file and after changing values in excel I transform this back to xml. I need to check each cell on value and formular.

At least I got an idea how to work with array access

Dim valueArray(,) As Object = usedRange.Value2()
Dim formularArray(,) As Object = usedRange.FormulaLocal()

newInputSheet.Range("A1:Z50").Value2 = valueArray
newInputSheet.Range("A1:Z50").FormulaLocal = formularArray

May be this could be a way to improve performance on my main app.

Nevertheless it doesn't explain why the NetOffice way is so much slower as the MS interop way.

matteshe avatar Jun 04 '18 14:06 matteshe

Dim valueArray(,) As Object = usedRange.Value2()

In this case, you should allocate an additional part of the memory.

It is often the situation when a user has Excel x86, several opened files in the same workspace and you need to process 50000x50000 cells. In result, you have a big risk to catch OutOfMemoryException in that line of the code.

And I repeat the main idea: Range it is special class created by Microsoft to work with the data array.

maliutin avatar Jun 04 '18 14:06 maliutin

While I agree that the suggestion to use Range and array based manipulation of multiple cell values is the way to go, I'd like to understand why OP is getting those results.

NetOffice is supposed to be similar in performance to VB Late-binding according to this: https://netoffice.io/documentation/technical/performance.html

@matteshe have you tried running it through the performance profiler and seeing where there delays some from?

ErrCode avatar Aug 31 '18 00:08 ErrCode

Made these changes to Module1::Main:

  • execute the LateBindingWay and NetOfficeWay methods sequentially to remove any bias from poor thread scheduling/execution (just in case - e.g. if there are less available CPU cores than there are threads).
  • used the StopWatch class for measuring how long a block of code takes (instead of using the PrintSeconds)
  • nothing to do with performance, but ensured the cleanup of NetOfficeWay is consistent with LateBindingWay by calling Quit on Excel and not just Dispose (as doing just dispose seemed to leave Excel processes running in the background).

These were the results I got:

colums:104, rows:101
LateBindingWay 1/3 start...
with inputsheet
LateBindingWay 1/3 ended : took (seconds)72.8047746
with used range
LateBindingWay 2/3 start...
LateBindingWay 2/3 ended : took (seconds)48.6899634
with cell once
LateBindingWay 3/3 start...
LateBindingWay 3/3 ended : took (seconds)33.9049766
done
colums:104, rows:101
NetOfficeWay 1/3 start...
with inputsheet
NetOfficeWay 1/3 ended : took (seconds)98.1585496
with used range
NetOfficeWay 2/3 start...
NetOfficeWay 2/3 ended : took (seconds)117.9447456
with cell once
NetOfficeWay 3/3 start...
NetOfficeWay 3/3 ended : took (seconds)76.4256934
done
press any key to close

ErrCode avatar Aug 31 '18 02:08 ErrCode

@ErrCode You did not implement one of the main points of NetOffice. You did not implement using, especially for range objects. See my example.

maliutin avatar Aug 31 '18 15:08 maliutin

@maliutin

Understood. I'm merely curious.

Looking at OP's code, both LateBindingWay and NetOfficeWay methods are doing equivalent calls to Excel. This seems to me a legitimate comparison of the overheads for calling into Excel, despite the fact that it is not using the most efficient way to achieve the overall task in Excel.

Since Excel should be doing exactly the same thing given the calls, then any timing difference could be attributed as the calling overhead from using the respective methods.

Simply copying cell ranges from one sheet to another is probably not very realistic anyway (the technique of using Copy range is actually not unique to NetOffice and could be equivalently achieved in late-binding too). More realistic would have been to manipulate data in "random" places (e.g. you have to build a visual flight seating layout, showing the passenger name per seat and color code the background for each passenger that has special status/meal requests/requirements, etc). Obviously this isn't what @matteshe did here, but I do believe manipulating individual cells in loops are a valid use case.

I downloaded the code to NetOffice's original PerformanceTests and can see the original author of NetOffice also took the approach of manipulating individual cells when comparing performance between different ways to call Excel.

When I get some time, I'd like swap out the old NetOffice references for the PerformanceTests and replace them with the latest version of NetOffice and report back.

ErrCode avatar Sep 01 '18 02:09 ErrCode

For future use and readers:

NetOffice original performance tests has been made in ~2011. (as far i remember in .Net 2 and C#3) Today it looks like Microsoft has been optimized the dynamics in its way to access COM. I retry my original NetOffice tests few month before with an actual C# compiler and the dynamics works as fast as VB/Interop/NetOffice now.

SebastianDotNet avatar Mar 05 '19 01:03 SebastianDotNet