python-fitparse
python-fitparse copied to clipboard
Remove RecordBase to speedup processing
Fix #55
The second commit limits the data processor getattr
.
Looks decent. Thanks for submitting. Need more time to review more intimately. What sort of performance gains do you observe.
The first commit is about 10-15%, the second commit about 5%.
The second commit moved caching to the instance of processor, so, if someone uses a new processor every time and have small FITs, may have some performance degradation, maybe. Anyway, maybe the scrubbing is to complicated and may be simplified? Do not know.
I've added BaseType.size
caching to the first commit, which is kind of micro-optimization.
Update: I've run benchmarks on my Win32 Intel i5 2.5GHz machine previously. I've tried it on our Linux x64 (Ubuntu Xenial) Intel Xeon 2.6GHz server, and it's 5x faster. (all running Python 3.5.2). IMHO it cannot be caused just by the faster processor, there must be some performance problem with Python on Windows or on x32 arch.
Anyway, removing RecordBase
brings more flexibility to class constructor, e.g. see #58.
Just about the commit: FitFileDataProcessor cache methods not just method names
I thing the data_processor
should be removed from the FitFile
completely. It may be designed as a wrapper around DataMessage
, so a user may plugin into the parsing process externally. Something like:
with FitFile(...) as ffile:
pr = FitFileDataProcessor(ffile)
for m in pr.get_messages(...):
# m is processed
This way, we do not need any static caching, it's up to the user.
@pR0Ps thanks for CR. Meanwhile I've got mroe insight into the problem with the processors a bit. So I would discard the commit about the processors, make a issue for that where we can debate it more in depth.
If your solution to the processors is the one you suggested above, I wouldn't recommend it. Making things "up to the user" is just pushing the problem along. If most users are going to want the values out of the FitFile
object as standard SI units (likely), then it makes sense for that to be the default. The common case should be easy, and the advanced case (get raw values) should be possible. This is the way it is currently set up now.