
Speed up reading op4

Open xper0418 opened this issue 3 years ago • 8 comments

First of all, pyYeti is very useful. But I'd like to speed up reading op4 files.

I have a 250 MB op4 file, which takes about 20 seconds to read with pyYeti; a 600 MB file takes 50 seconds.

xper0418 avatar Nov 23 '21 05:11 xper0418

@xper0418, that is something I'd like to see as well! And it's something I think about every so often. I'm just not sure how to tackle this problem. One solution would be to rewrite these routines in C, but that doesn't sound fun for a number of reasons. Another idea I've had is to parallelize the reading. That sounds more fun, but I'm uncertain how practical/general that solution would be. My solution so far for big op4 files is to read once and then use flammkuchen (https://pypi.org/project/flammkuchen/) to save/load after that ... not elegant, but it works.

Thanks for your input! It gives me the impetus to look into this again, with more urgency. Before doing anything major, I'll run a profiler to see where the biggest bottlenecks are.

Do you have any other ideas on how to speed it up?

twmacro avatar Dec 05 '21 23:12 twmacro

Thank you for your response. I've seen an article saying that when pyNastran reads op2 files, it reaches speeds of up to 500 MB/s (when using an SSD). I think it's worth looking into. I also found that numpy is much faster than struct.unpack; see the link below. https://stackoverflow.com/questions/54679949/unpacking-binary-file-using-struct-unpack-vs-np-frombuffer-vs-np-ndarray-vs-np-f
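The gap in that Stack Overflow comparison comes down to per-element Python objects: a minimal sketch (with a made-up buffer, not op4 data) showing the two approaches side by side:

```python
import struct

import numpy as np

n = 1_000_000
buf = np.arange(n, dtype="<f8").tobytes()  # stand-in for bytes read from a file

# struct.unpack builds one Python float object per value -- slow for big buffers
vals_struct = struct.unpack(f"<{n}d", buf)

# np.frombuffer reinterprets the raw bytes as an ndarray, no per-element work
vals_np = np.frombuffer(buf, dtype="<f8")

assert np.array_equal(vals_struct, vals_np)
```

np.fromfile (which pyYeti already uses) is essentially the file-backed equivalent of np.frombuffer, so the easy win here may already be taken.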

xper0418 avatar Dec 06 '21 12:12 xper0418

@twmacro Hi. Any updates on this?

xper0418 avatar Jun 06 '22 07:06 xper0418

Hi @xper0418, sorry for the delay in responding. I have spent some time on this, mainly profiling to find the bottlenecks, but I also experimented with different ideas. I tried numba, numexpr, threading, different buffer sizes, and other things I'm sure. Unfortunately, I don't have any quick fixes for this. Output4 files are not the most efficient format to read/write. My conclusion so far is that these routines might have to be written in C to get significantly better performance (which I can't see myself doing anytime soon). I still have a couple of experiments I plan to try, but I don't have high hopes.

From the profiling, I concluded that these two lines inside the loop take the lion's share of the time:

Y = np.fromfile(fp, numform2, nwords)  # read nwords values into a temporary array
X[r : r + len(Y), c] = Y               # copy them into column c of the matrix

Is there a simple speed up for that code I wonder?? I think it would speed things up a bit if np.fromfile could store the values directly to X, but I don't know if that's easily doable.
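One way to skip the temporary Y is to read bytes from the file straight into X with readinto(). This is a sketch, not pyYeti's actual code: read_into_column is a hypothetical helper, and it assumes X is Fortran-ordered (so a column slice is one contiguous block of memory) and fp is a binary stream positioned at the numeric data:

```python
import io

import numpy as np

def read_into_column(fp, X, r, c, nwords):
    # Fill X in place with no intermediate array. Requires X to be F-ordered
    # so that X[r:r+nwords, c] is a single contiguous run of memory.
    view = X[r:r + nwords, c]               # contiguous 1-D view into X
    nbytes = fp.readinto(memoryview(view))  # read bytes directly into that view
    return nbytes // X.itemsize             # number of values actually read

# In-memory stand-in for an open op4 file
X = np.zeros((4, 2), order="F")
fp = io.BytesIO(np.array([1.0, 2.0, 3.0]).tobytes())
n = read_into_column(fp, X, 1, 0, 3)        # fills rows 1..3 of column 0
```

Whether this beats np.fromfile plus a copy would need benchmarking; it does avoid one allocation and one memcpy per column block, which is exactly where the profiler says the time goes.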

twmacro avatar Jul 02 '22 19:07 twmacro

Hey there @twmacro, I read the above and I'm wondering: I don't think the np.fromfile() call can be sped up without a drastic rewrite, but regarding setting the data in X from Y... Is that taking a significant portion of the total run time? What order is X, "C" or "F"? If the indexing itself is what's slow, changing the order of X might help. Worth a try?

Or, I wonder if storing a big list of 1D vectors and stacking them at the very end could be faster than repeatedly indexing into the 2D array. Unlikely, but just a thought.

jeremypriest avatar Sep 05 '22 03:09 jeremypriest

Hello @jeremypriest! Excellent thoughts! It's been quite a long time, but I recall switching from "C" to "F" order on these matrices to enhance speed. It makes sense to me that "F" would be faster (since that matches the order in the .op4 file), but I haven't experimented recently with this. I also like your other idea of using 1D vectors. I'll add these ideas to the "to-do" list! :)
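The intuition for why "F" order helps can be seen in a tiny sketch: assigning a column of an F-ordered array writes one contiguous run of memory, while the same assignment on a C-ordered array is strided, touching one element per row. Timing the two assignments below with %timeit would show the gap:

```python
import numpy as np

y = np.arange(100_000, dtype=np.float64)

Xf = np.zeros((100_000, 50), order="F")   # columns contiguous, like op4 data
Xc = np.zeros((100_000, 50), order="C")   # rows contiguous

Xf[:, 0] = y   # contiguous write
Xc[:, 0] = y   # strided write: one element every 50 * 8 bytes
```

Both produce the same result; only the memory access pattern differs.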

twmacro avatar Sep 12 '22 14:09 twmacro

Hey Tim - I was wondering if the pyNastran package reads op4 files any faster than pyYeti. I use pyYeti only, but I notice that pyNastran has op4 reading support (more limited than pyYeti's). If it does read certain files faster, it's open source and may provide some insight into how they do it.

jeremypriest avatar Oct 20 '22 12:10 jeremypriest

Thinking on my above comment, I did some benchmarking and found that pyYeti is, in my opinion, pretty fast in loading large matrices from op4 files (tested using a 1.1 GB binary file).

It looks like assembling the matrices as sparse generally causes a large performance hit, but otherwise, I think pyYeti is competitive in op4 load speeds with pyNastran.

Outside of porting the Python code used to read the op4 data over to something more performant and low-level, I don't think there's too much to gain here.

I would support closure of this issue, or renaming of this issue to something like "TODO: port op4 reading to XYZ language" if @twmacro prefers that.

Here is the data I collected:

Module     Sparse setting                                    Speed (sec)
pyYeti     sparse=False                                      2
pyYeti     sparse=None                                       58
pyNastran  not configurable; similar to pyYeti's sparse=None 55

jeremypriest avatar Oct 20 '22 19:10 jeremypriest

Thank you so much for your input @jeremypriest! I'll close this issue now since I don't have any good way that I know of to speed this up. I will however keep an open mind for any good ideas and perhaps experiment from time to time.

twmacro avatar Nov 11 '22 01:11 twmacro