rust_xlsxwriter icon indicating copy to clipboard operation
rust_xlsxwriter copied to clipboard

Add feature for switching between regex dependencies

Open Degubi opened this issue 1 year ago • 9 comments

Implementation for request in issue #105

  • Made the dependency on 'regex' optional
  • Added feature flag for 'regex' (added it to default-features, making it the default to avoid any behaviour changes)
  • Added feature flag for 'regex-lite'
  • Updated the 'static_regex' macros
  • Update regex usage in test files

Degubi avatar Aug 08 '24 16:08 Degubi

Note: There seem to be 19 failed test cases when using the regex-lite crate with these messages: 'Unicode character classes are not supported' This is to be expected as the regex-lite crate only has basic support for unicode characters.

Degubi avatar Aug 08 '24 16:08 Degubi

Thanks. I'll take a look at the PR once I'm back online on Monday. Overall it looks good.

This is to be expected as the regex-lite crate only has basic support for unicode characters.

That may be a deal breaker. I'll need to look at that in more detail.

jmcnamara avatar Aug 08 '24 17:08 jmcnamara

Great thank you!

Also ran cargo fmt and force-pushed because I forgot

Degubi avatar Aug 08 '24 18:08 Degubi

Thanks. Could you add a GitHub action with regex-lite as well so I can see the failures. Use one of the others actions like the Polars test as an example.

jmcnamara avatar Aug 08 '24 18:08 jmcnamara

I added it! But I noticed something: Running 'cargo test' yielded only 19 error messages Running 'cargo test --test integration' made all tests fail: panicked at src/utility.rs:445:23 It seems it doesn't like the regex on that line (and on the next one)

By removing the \p{Emoji} parts from those 2 regexes only a single test fails: quote_name11::test_quote_name11

Degubi avatar Aug 08 '24 19:08 Degubi

That is probably fixable. Python re regexes don't support \p{Emoji} either but I was able to work around it.

Leave it with me until I'm back online next week.

jmcnamara avatar Aug 08 '24 20:08 jmcnamara

I had a look at this in more detail and I don't think it is feasible to add regex-lite due to the lack of Unicode support.

Apart from the failing integration test there are a number of tests that fail with basic Unicode support:

https://github.com/jmcnamara/rust_xlsxwriter/blob/main/src/utility/tests.rs#L175-L181

So any sheet name in non-English/ASCII will end up being quoted unnecessarily. This is generally a safe failure since Excel will accept an erroneously quoted sheet name in place of an optimally quoted sheetname. However, there may be cases where this causes an Excel read/load error.

So I would need to enable the regex-lite feature with a number of warnings/caveats that would confuse the average end user. It also adds a fair amount of fragility and paints me into a corner when adding other regexes in the future. So, overall and from a maintenance point of view I feel that this isn't worth the effort. I hope you can accept that.

I'll leave this open to see if other people add +1s for the use case and I may add it at some stage with a "Developer Beware" warning.

jmcnamara avatar Aug 11 '24 16:08 jmcnamara

@Degubi, in case this discussion comes up again in the future could you add some example size differences between the Regex and Regex-Lite versions, as a reference. Thanks.

jmcnamara avatar Aug 11 '24 23:08 jmcnamara

Sure thing @jmcnamara ! Will do later today. Thank you for taking a look!

Degubi avatar Aug 12 '24 04:08 Degubi

@jmcnamara Here are some of my results:

WindowsLinux
  regex regex-lite
Debug 6066 KB 2963 KB
Release 2817 KB 1580 KB
  regex regex-lite
Debug 36.3 MB 22.5 MB
Release 2.9 MB 1.6 MB

Code: https://github.com/jmcnamara/rust_xlsxwriter?tab=readme-ov-file#example

Cargo.toml:

[package]
name = "excel-size-diff"
version = "1.0.0"
edition = "2021"

[dependencies]
# rust_xlsxwriter = { git = "https://github.com/Degubi/rust_xlsxwriter.git", branch = "feature-regex-lite", default-features = false, features = [ "regex" ] }
rust_xlsxwriter = { git = "https://github.com/Degubi/rust_xlsxwriter.git", branch = "feature-regex-lite", default-features = false, features = [ "regex-lite" ] }

[profile.release]
strip = "symbols"
lto = true
codegen-units = 1

Degubi avatar Aug 12 '24 18:08 Degubi

@Degubi In case you miss it, we are currently working to remove regex altogether and we making some progress: #108.

jmcnamara avatar Aug 27 '24 00:08 jmcnamara

Looking at the conversation in #108 and the work done on the #no_regex branch I'm gonna assume that this PR can be closed. Thank you for your great work!

Degubi avatar Aug 30 '24 08:08 Degubi