Add feature for switching between regex dependencies
Implementation for request in issue #105
- Made the dependency on 'regex' optional
- Added feature flag for 'regex' (added it to default-features, making it the default to avoid any behaviour changes)
- Added feature flag for 'regex-lite'
- Updated the 'static_regex' macros
- Update regex usage in test files
Note: There seem to be 19 failed test cases when using the regex-lite crate with these messages: 'Unicode character classes are not supported' This is to be expected as the regex-lite crate only has basic support for unicode characters.
Thanks. I'll take a look at the PR once I'm back online on Monday. Overall it looks good.
This is to be expected as the regex-lite crate only has basic support for unicode characters.
That may be a deal breaker. I'll need to look at that in more detail.
Great thank you!
Also ran cargo fmt and force-pushed because I forgot
Thanks. Could you add a GitHub action with regex-lite as well so I can see the failures. Use one of the others actions like the Polars test as an example.
I added it! But I noticed something: Running 'cargo test' yielded only 19 error messages Running 'cargo test --test integration' made all tests fail: panicked at src/utility.rs:445:23 It seems it doesn't like the regex on that line (and on the next one)
By removing the \p{Emoji} parts from those 2 regexes only a single test fails: quote_name11::test_quote_name11
That is probably fixable. Python re regexes don't support \p{Emoji} either but I was able to work around it.
Leave it with me until I'm back online next week.
I had a look at this in more detail and I don't think it is feasible to add regex-lite due to the lack of Unicode support.
Apart from the failing integration test there are a number of tests that fail with basic Unicode support:
https://github.com/jmcnamara/rust_xlsxwriter/blob/main/src/utility/tests.rs#L175-L181
So any sheet name in non-English/ASCII will end up being quoted unnecessarily. This is generally a safe failure since Excel will accept an erroneously quoted sheet name in place of an optimally quoted sheetname. However, there may be cases where this causes an Excel read/load error.
So I would need to enable the regex-lite feature with a number of warnings/caveats that would confuse the average end user. It also adds a fair amount of fragility and paints me into a corner when adding other regexes in the future. So, overall and from a maintenance point of view I feel that this isn't worth the effort. I hope you can accept that.
I'll leave this open to see if other people add +1s for the use case and I may add it at some stage with a "Developer Beware" warning.
@Degubi, in case this discussion comes up again in the future could you add some example size differences between the Regex and Regex-Lite versions, as a reference. Thanks.
Sure thing @jmcnamara ! Will do later today. Thank you for taking a look!
@jmcnamara Here are some of my results:
| Windows | Linux | ||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
Code: https://github.com/jmcnamara/rust_xlsxwriter?tab=readme-ov-file#example
Cargo.toml:
[package]
name = "excel-size-diff"
version = "1.0.0"
edition = "2021"
[dependencies]
# rust_xlsxwriter = { git = "https://github.com/Degubi/rust_xlsxwriter.git", branch = "feature-regex-lite", default-features = false, features = [ "regex" ] }
rust_xlsxwriter = { git = "https://github.com/Degubi/rust_xlsxwriter.git", branch = "feature-regex-lite", default-features = false, features = [ "regex-lite" ] }
[profile.release]
strip = "symbols"
lto = true
codegen-units = 1
@Degubi In case you miss it, we are currently working to remove regex altogether and we making some progress: #108.
Looking at the conversation in #108 and the work done on the #no_regex branch I'm gonna assume that this PR can be closed. Thank you for your great work!