XlsxWriter
Updated xl_col_to_name() method to add recursion
This PR links to issue #881, which I opened recently – I propose that the `xl_col_to_name()` function (here) be rewritten as a recursive function, bringing it to O(log₂₆ n) time complexity and O(1) auxiliary space.
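As a rough sketch of the idea (not necessarily the exact code in the PR, and the helper name `_col_to_letters` is hypothetical), a recursive conversion recurses once per output letter:

```python
def xl_col_to_name_recursive(col_num, col_abs=False):
    """Convert a zero-indexed column number to an Excel-style name.

    Sketch of a recursive approach: one recursive call per output
    letter, i.e. roughly O(log26 n) calls.
    """
    def _col_to_letters(n):
        # n is a 1-based column number; Excel names are base-26
        # with digits 1..26 (A..Z) and no zero digit.
        if n == 0:
            return ''
        n, remainder = divmod(n - 1, 26)
        return _col_to_letters(n) + chr(ord('A') + remainder)

    prefix = '$' if col_abs else ''
    return prefix + _col_to_letters(col_num + 1)
```

For example, `xl_col_to_name_recursive(0)` gives `'A'` and `xl_col_to_name_recursive(16383)` gives `'XFD'`.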
I created a GitHub Gist here comparing the timings of the old function and the one I propose, using `%timeit` (IPython defaults to n = 100 anyway). The updated function is faster.
Writing tests
I haven't written any unit tests to resolve the issue as of now – I'm not aware of how to do so, but I can learn.
Hi @jmcnamara, how may I resolve the unsuccessful checks? The test is failing due to some whitespace in my code.

Update: I fixed "W293 blank line contains whitespace" with autopep8 in another commit. The tests should pass now.
You can run the tests as follows:

```shell
# One time:
python -m pip install --upgrade pip
pip install pytest flake8

# Then run the tests:
make test_flake8
pytest
```
I'm on a Windows machine, so I cannot run `make`. Is there any other way to go about this? I tried it on WSL (Ubuntu) as well, but there it says `test_flake8` does not exist.
Running `pytest` by itself, without `make`, works on Windows: 1572 passed and 0 errors detected – albeit the same gives me 15 errors during collection on WSL.

Update: resolved the above by installing `make` using Chocolatey.
I ran the flake8 tests and resolved four issues: an E501, a couple of W291s, and an E115. `pytest` also passes all 1572 tests with no errors. Pushed once again.
For future reference you don't need `make`. You just need to run flake8 on the file(s) you modified:

```shell
# Change the path to suit:
flake8 xlsxwriter\utility.py
```
Also, you will need to squash your 3 commits into 1.
In order to merge this I will need to see some benchmarks using timeit or similar.
You will need to benchmark the 1, 2 and 3 letter column names, for example column numbers 0, 701 and 16383 (equivalent to A, ZZ and XFD).
You will also need to benchmark with and without your "alphabets" lookup, to compare like for like with the existing implementation (which uses `col_letter = chr(ord('A') + remainder - 1)`).
So in summary can you put a benchmark together to test:
- Column numbers 0, 701, 16383 (and maybe a combination) for:
  - The existing implementation
  - The new implementation
  - The new implementation with `chr(ord('A'))`
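One way such a benchmark could be set up with the standard `timeit` module (a sketch only: the two functions below are placeholder variants standing in for the PR's implementations, one with an "alphabets" lookup string and one with `chr(ord('A'))`):

```python
import timeit

ALPHABETS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

def xl_col_to_name_lookup(col_num):
    # Variant using a lookup string of letters.
    n, name = col_num + 1, ''
    while n:
        n, remainder = divmod(n - 1, 26)
        name = ALPHABETS[remainder] + name
    return name

def xl_col_to_name_chr(col_num):
    # Variant computing each letter with chr(ord('A') + remainder).
    n, name = col_num + 1, ''
    while n:
        n, remainder = divmod(n - 1, 26)
        name = chr(ord('A') + remainder) + name
    return name

for func in (xl_col_to_name_lookup, xl_col_to_name_chr):
    for col in (0, 701, 16383):  # A, ZZ and XFD
        elapsed = timeit.timeit(lambda: func(col), number=100_000)
        print(f'{func.__name__}({col}): {elapsed:.4f}s')
```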
Thanks a lot @jmcnamara – I shall look into benchmarking them this week and get back!
Thinking about this a little more and there are a few things to consider:
- In general most real-life column name conversions are in the column range A-Z, so there won't be any tail recursion.
- The maximum conversion would be for column XFD, which would only be 2 recursions.
- The conversion is hidden behind a hash/cache lookup, so it is only called once per column.
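The depth claim is easy to check: the number of recursive calls tracks the number of letters in the column name, which never exceeds three for valid Excel columns (the last is XFD, column 16383). A small helper, written here purely for illustration:

```python
def name_length(col_num):
    """Number of letters in the A1-style name of a 0-indexed column."""
    n, count = col_num + 1, 0
    while n:
        n = (n - 1) // 26  # Strip one letter per iteration.
        count += 1
    return count

for col in (0, 701, 16383):  # A, ZZ, XFD
    print(col, name_length(col))  # lengths 1, 2 and 3
```

So a recursive conversion bottoms out after at most two recursive calls beyond the first.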
Anyway, it would still be interesting to see your performance data when you get it. There is no rush on it.
Kudos, SonarCloud Quality Gate passed!

- 0 Bugs
- 0 Vulnerabilities
- 0 Security Hotspots
- 0 Code Smells
- No Coverage information
- 0.0% Duplication
I get your points. The recursive method may be a tad confusing to write (and understand!) in code, and also may not even be needed for practical purposes. I have just squashed the commits into one and force-pushed it.
I'm working on benchmarking the functions now using the `timeit` library – in what form do you need the benchmarks? Should I put them in a Colab notebook and upload it as a GitHub Gist? Or are they to be put in a .py file in the xlsxwriter directory?
> Should I put them in a Colab notebook and upload as a GitHub Gist?

Yes, that would be great.
Apologies for the slight delay in sending these performance benchmarks – here they are in a GitHub Gist, as requested. The new implementation with `chr(ord('A'))` has not been pushed yet – I shall push it in another commit and squash the two into one later, based on your feedback on the above.
> Apologies for the slight delay in sending these performance benchmarks

No problem. I'm not in a rush. :-)
> The new implementation with `chr(ord('A'))` has not been pushed yet

You don't need to put this in a PR. Just add it to your benchmark as another option. So to be clear, just add another function to your gist called `xl_col_to_name3()` or similar, one that uses `chr(ord('A'))` instead of the alphabets lookup, so that we can compare like with like.
> So to be clear, just add another function to your gist called `xl_col_to_name3()` or similar

Oh, I've actually done that in the benchmark file, just not in the PR. It contains a comparison of three functions: the original implementation, the newer implementation, and the `chr(ord('A'))` implementation. (I did not define the functions with different names; I ran the benchmarks sequentially.)
> Oh, I've actually done that in the benchmark file

Sorry I missed that. Thanks.