Speed
Here is a profile of a run of DFlowData.clean_data() plus all of the methods in WalkingData (including the inverse dynamics computations).
7641430 function calls (7640347 primitive calls) in 181.271 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
320218 165.994 0.001 165.994 0.001 {method 'readline' of 'file' objects}
3 2.709 0.903 2.709 0.903 {method 'read' of 'pandas.parser.TextReader' objects}
36844 1.017 0.000 1.017 0.000 {pandas.lib.maybe_convert_objects}
19 0.879 0.046 168.193 8.852 session.py:494(evaluate)
36839 0.816 0.000 4.204 0.000 series.py:3129(interpolate)
37612 0.574 0.000 0.643 0.000 common.py:134(_isnull_ndarraylike)
120 0.530 0.004 0.532 0.004 fitpack2.py:441(__init__)
41185 0.385 0.000 0.385 0.000 {method 'reduce' of 'numpy.ufunc' objects}
320229 0.331 0.000 0.331 0.000 {_codecs.utf_8_decode}
320226 0.284 0.000 0.777 0.000 {method 'decode' of 'str' objects}
268534 0.267 0.000 0.485 0.000 {method 'view' of 'numpy.ndarray' objects}
37998 0.258 0.000 0.851 0.000 series.py:430(__new__)
37612 0.238 0.000 0.988 0.000 common.py:64(_isnull_new)
951420/951418 0.226 0.000 0.226 0.000 {isinstance}
38160 0.208 0.000 0.209 0.000 {method 'copy' of 'numpy.ndarray' objects}
111799 0.208 0.000 0.426 0.000 index.py:332(__getitem__)
36642 0.203 0.000 1.341 0.000 frame.py:1928(_ixs)
320110 0.185 0.000 0.297 0.000 __init__.py:1343(isEnabledFor)
98213 0.169 0.000 0.169 0.000 {numpy.core.multiarray.array}
320223 0.162 0.000 0.493 0.000 utf_8.py:15(decode)
320110 0.154 0.000 0.451 0.000 __init__.py:1128(debug)
151206 0.133 0.000 0.193 0.000 numeric.py:1810(isscalar)
38175 0.132 0.000 0.298 0.000 series.py:3299(_sanitize_array)
801 0.130 0.000 0.130 0.000 {pandas.algos.take_2d_axis0_float64_float64}
164141/164139 0.127 0.000 0.129 0.000 {getattr}
1 0.119 0.119 0.277 0.277 __init__.py:20(<module>)
1246 0.117 0.000 0.117 0.000 {numpy.core.multiarray.concatenate}
36839 0.112 0.000 4.317 0.000 frame.py:4389(<lambda>)
320110 0.112 0.000 0.112 0.000 __init__.py:1329(getEffectiveLevel)
197 0.110 0.001 6.391 0.032 frame.py:4433(_apply_standard)
36642 0.098 0.000 0.647 0.000 internals.py:1681(iget)
76780 0.095 0.000 0.108 0.000 index.py:318(__contains__)
75404 0.092 0.000 0.213 0.000 series.py:532(__array_finalize__)
I think using readline() to pull the events from the record files is a major slowdown. Also, all of the calls to session.py are oct2py calls, which are slow; I believe the slow parts there are the inverse dynamics real time filter and soder.m.
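Since soder.m runs through oct2py, a pure-NumPy port would remove those round trips for that piece. Here's a sketch, under the assumption that soder.m implements the Söderkvist & Wedin (1993) SVD method for the least-squares rigid body transformation between two marker clouds (function name and signature are mine, not the toolkit's):

```python
import numpy as np

def soder(A, B):
    """Find rotation R and translation d minimizing ||(R @ A.T).T + d - B||,
    given marker positions A (n x 3) in one frame and B (n x 3) in another.
    Sketch of the Soderkvist & Wedin SVD method, assumed to match soder.m."""
    A_mean = A.mean(axis=0)
    B_mean = B.mean(axis=0)
    # Cross-dispersion matrix of the centered marker clouds.
    C = (B - B_mean).T @ (A - A_mean)
    U, _, Vt = np.linalg.svd(C)
    # Force a proper rotation (det = +1) rather than a reflection.
    R = U @ np.diag([1.0, 1.0, np.linalg.det(U @ Vt)]) @ Vt
    d = B_mean - R @ A_mean
    return R, d
```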
Calling .readlines() on a file, as is done for the event finding, is fast. I'm not sure where all of the .readline() calls are coming from.
In [7]: %time df = pandas.read_csv('/home/moorepants/Data/human-gait/gait-control-identification/T006/mocap-006.txt', delimiter='\t')
CPU times: user 1.49 s, sys: 60.8 ms, total: 1.55 s
Wall time: 1.55 s
Reading in the files with pandas is fast, and profiling that call doesn't show any readline() calls.
It could be this readline() call in oct2py/session.py:evaluate (line 551):
line = self.proc.stdout.readline().rstrip().decode('utf-8')
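That would fit the numbers: each evaluate() round-trips through the Octave subprocess and reads its stdout back one line at a time, so the readline() time is really interprocess latency. If that's the cause, batching several statements into one evaluate call should help. A sketch of the difference, using a plain echoing subprocess in place of Octave (the evaluate() helper below mimics the session.py pattern but is not the oct2py API):

```python
import subprocess
import sys
import time

# A child process that echoes each stdin line back, standing in for the
# Octave session behind oct2py.
child = subprocess.Popen(
    [sys.executable, "-u", "-c",
     "import sys\n"
     "for line in sys.stdin:\n"
     "    sys.stdout.write(line)\n"
     "    sys.stdout.flush()"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)

def evaluate(cmd):
    # One write plus one blocking readline() per call, as in
    # session.py:evaluate.
    child.stdin.write(cmd + "\n")
    child.stdin.flush()
    return child.stdout.readline().rstrip()

start = time.time()
results = [evaluate("x = %d" % i) for i in range(100)]
per_call_time = time.time() - start

# Batched: send all the commands at once and read the replies together,
# paying the round-trip latency once instead of 100 times.
start = time.time()
child.stdin.write("".join("y = %d\n" % i for i in range(100)))
child.stdin.flush()
batched = [child.stdout.readline().rstrip() for _ in range(100)]
per_batch_time = time.time() - start

child.stdin.close()
child.wait()
```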
Here's profiling output of essentially just the DFlowData.clean_data() call:
1317864 function calls (1317260 primitive calls) in 62.622 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
128105 56.725 0.000 56.725 0.000 {method 'readline' of 'file' objects}
3 2.573 0.858 2.573 0.858 {method 'read' of 'pandas.parser.TextReader' objects}
120 0.560 0.005 0.562 0.005 fitpack2.py:441(__init__)
10 0.336 0.034 57.587 5.759 session.py:494(evaluate)
379 0.210 0.001 0.212 0.001 common.py:134(_isnull_ndarraylike)
128153 0.122 0.000 0.122 0.000 {_codecs.utf_8_decode}
128152 0.104 0.000 0.286 0.000 {method 'decode' of 'str' objects}
2 0.102 0.051 0.102 0.051 {scipy.signal.sigtools._linear_filter}
235 0.096 0.000 0.096 0.000 {method 'copy' of 'numpy.ndarray' objects}
835 0.091 0.000 0.091 0.000 {method 'reduce' of 'numpy.ufunc' objects}
182 0.088 0.000 0.088 0.000 {numpy.core.multiarray.putmask}
5 0.077 0.015 0.077 0.015 {pandas.algos.take_2d_axis1_float64_float64}
670 0.076 0.000 0.088 0.000 index.py:332(__getitem__)
272 0.072 0.000 0.168 0.001 series.py:709(_get_values)
120 0.070 0.001 0.070 0.001 {scipy.interpolate._fitpack._spl_}
128037 0.068 0.000 0.111 0.000 __init__.py:1343(isEnabledFor)
6 0.065 0.011 0.065 0.011 {pandas.algos.take_2d_axis0_float64_float64}
128150 0.061 0.000 0.183 0.000 utf_8.py:15(decode)
128037 0.057 0.000 0.168 0.000 __init__.py:1128(debug)
1 0.053 0.053 0.133 0.133 __init__.py:20(<module>)
2 0.048 0.024 0.054 0.027 indexing.py:100(_setitem_with_indexer)
13 0.044 0.003 0.044 0.003 {numpy.core.multiarray.concatenate}
128037 0.043 0.000 0.043 0.000 __init__.py:1329(getEffectiveLevel)
6 0.041 0.007 0.042 0.007 internals.py:2368(_stack_arrays)
90 0.039 0.000 0.072 0.001 series.py:719(where)
4 0.031 0.008 0.031 0.008 {pandas.algos.take_2d_axis0_bool_bool}
128106 0.029 0.000 0.029 0.000 {method 'rstrip' of 'str' objects}
61 0.028 0.000 0.028 0.000 {method 'write' of 'file' objects}
46 0.027 0.001 0.027 0.001 {method 'join' of 'str' objects}
1 0.025 0.025 0.035 0.035 table.py:13(<module>)
109 0.022 0.000 0.022 0.000 {method 'take' of 'numpy.ndarray' objects}
138790 0.020 0.000 0.020 0.000 {method 'append' of 'list' objects}
1 0.020 0.020 0.020 0.020 {pandas.algos.diff_2d_float64}
Pretty sure the slowdowns are all from the Octave code.
See issue #30 for details on the rtfilter.
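If the filtering can be done offline in Python rather than through Octave, the round trips disappear for that step too. A sketch, assuming the rtfilter is a low-pass Butterworth (the order and cutoff below are made up, and note filtfilt is zero-phase forward-backward filtering, not a causal real-time filter):

```python
import numpy as np
from scipy.signal import butter, filtfilt

def lowpass(data, cutoff_hz, sample_rate_hz, order=2):
    # Design a low-pass Butterworth and apply it forward and backward
    # (zero phase lag), which only suits offline processing.
    nyquist = sample_rate_hz / 2.0
    b, a = butter(order, cutoff_hz / nyquist)
    return filtfilt(b, a, data, axis=0)

# Illustrative usage: smooth a noisy 2 Hz sine sampled at ~100 Hz.
rng = np.random.RandomState(0)
time = np.linspace(0.0, 1.0, 100)
clean = np.sin(2.0 * np.pi * 2.0 * time)
noisy = clean + 0.1 * rng.randn(100)
smooth = lowpass(noisy, cutoff_hz=6.0, sample_rate_hz=100.0)
```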