[FEA] Enable regular expressions by default
Is your feature request related to a problem? Please describe.
Regular expression support is currently disabled by default because of many known compatibility issues, which are documented in the compatibility guide. This epic tracks the work required to address these issues and enable the feature by default.
Completed
- [x] https://github.com/NVIDIA/spark-rapids/issues/3797
- [x] https://github.com/NVIDIA/spark-rapids/issues/3866
- [x] https://github.com/NVIDIA/spark-rapids/issues/3962
- [x] https://github.com/NVIDIA/spark-rapids/issues/4001
- [x] https://github.com/NVIDIA/spark-rapids/issues/4002
- [x] https://github.com/NVIDIA/spark-rapids/issues/4091
- [x] https://github.com/NVIDIA/spark-rapids/issues/4170
- [x] https://github.com/NVIDIA/spark-rapids/issues/4229
- [x] https://github.com/NVIDIA/spark-rapids/issues/4284
- [x] https://github.com/NVIDIA/spark-rapids/issues/4467
- [x] https://github.com/NVIDIA/spark-rapids/issues/4412
- [x] https://github.com/NVIDIA/spark-rapids/issues/4521
- [x] https://github.com/NVIDIA/spark-rapids/issues/4503
- [x] https://github.com/NVIDIA/spark-rapids/issues/4559
- [x] https://github.com/NVIDIA/spark-rapids/issues/4330
- [x] https://github.com/NVIDIA/spark-rapids/issues/4475
- [x] https://github.com/NVIDIA/spark-rapids/issues/4409
- [x] https://github.com/NVIDIA/spark-rapids/issues/4003
High Priority
- [x] https://github.com/NVIDIA/spark-rapids/issues/4487
- [x] https://github.com/NVIDIA/spark-rapids/issues/4557
- [x] https://github.com/NVIDIA/spark-rapids/issues/5135
- [x] https://github.com/NVIDIA/spark-rapids/issues/4425
- [x] https://github.com/NVIDIA/spark-rapids/issues/4468
- [x] https://github.com/NVIDIA/spark-rapids/issues/4532
- [x] https://github.com/NVIDIA/spark-rapids/issues/4533
- [x] https://github.com/NVIDIA/spark-rapids/issues/4800
- [x] https://github.com/NVIDIA/spark-rapids/issues/5549
- [x] https://github.com/NVIDIA/spark-rapids/issues/5711
- [x] https://github.com/NVIDIA/spark-rapids/issues/5521
- [x] https://github.com/NVIDIA/spark-rapids/issues/4719
- [ ] https://github.com/NVIDIA/spark-rapids/issues/4511
Medium Priority
- [x] https://github.com/NVIDIA/spark-rapids/issues/4528
- [x] https://github.com/NVIDIA/spark-rapids/issues/4517
- [x] https://github.com/NVIDIA/spark-rapids/issues/4605
- [x] https://github.com/NVIDIA/spark-rapids/issues/5456
- [x] https://github.com/NVIDIA/spark-rapids/issues/5525
- [ ] https://github.com/NVIDIA/spark-rapids/issues/5488
- [ ] https://github.com/NVIDIA/spark-rapids/issues/5478
- [x] https://github.com/NVIDIA/spark-rapids/issues/5659
- [ ] https://github.com/NVIDIA/spark-rapids/issues/5973
- [ ] https://github.com/NVIDIA/spark-rapids/issues/6469
- [ ] https://github.com/NVIDIA/spark-rapids/issues/10764
Low Priority
- [x] https://github.com/NVIDIA/spark-rapids/issues/4486
- [x] https://github.com/NVIDIA/spark-rapids/issues/4505
- [x] https://github.com/NVIDIA/spark-rapids/issues/4746
- [x] https://github.com/NVIDIA/spark-rapids/issues/5415
- [x] https://github.com/NVIDIA/spark-rapids/issues/4862
- [x] https://github.com/NVIDIA/spark-rapids/issues/4413
- [x] https://github.com/NVIDIA/spark-rapids/issues/4866
- [x] https://github.com/NVIDIA/spark-rapids/issues/4865
- [x] https://github.com/NVIDIA/spark-rapids/issues/4518
- [x] https://github.com/NVIDIA/spark-rapids/issues/4519
- [x] https://github.com/NVIDIA/spark-rapids/issues/5909
- [x] https://github.com/NVIDIA/spark-rapids/issues/5846
- [x] https://github.com/NVIDIA/spark-rapids/issues/4720
- [x] https://github.com/NVIDIA/spark-rapids/issues/4537
- [x] https://github.com/NVIDIA/spark-rapids/issues/4353
- [x] https://github.com/NVIDIA/spark-rapids/issues/4283
- [x] https://github.com/NVIDIA/spark-rapids/issues/5656
- [x] https://github.com/NVIDIA/spark-rapids/issues/4603
- [x] https://github.com/NVIDIA/spark-rapids/issues/4061
- [ ] https://github.com/NVIDIA/spark-rapids/issues/4415
Describe the solution you'd like
Support the following regular expression functions and expressions by default, with 100% compatibility with Spark:
- regexp / regexp_like / RLIKE
- regexp_replace
- regexp_extract
- regexp_extract_all
- split
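For readers unfamiliar with what "compatibility with Spark" means for these functions, here is a rough sketch of the CPU-side behavior they are compared against. It assumes Spark evaluates these expressions with `java.util.regex`, and it simplifies null handling, argument validation, and the `split` limit argument. The class `RegexReference` and its method names are hypothetical and exist only for illustration; they are not part of Spark or the RAPIDS Accelerator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: approximates the reference (CPU) semantics using java.util.regex.
class RegexReference {

    // regexp / regexp_like / RLIKE: true if the pattern matches anywhere in the input.
    static boolean rlike(String input, String regex) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // regexp_replace: replace every match with the replacement string.
    static String regexpReplace(String input, String regex, String replacement) {
        return Pattern.compile(regex).matcher(input).replaceAll(replacement);
    }

    // regexp_extract: capture group idx of the first match, or "" if there is no match
    // or the group did not participate in the match.
    static String regexpExtract(String input, String regex, int idx) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return "";
        }
        String group = m.group(idx);
        return group == null ? "" : group;
    }

    // regexp_extract_all: capture group idx of every match, in order.
    static List<String> regexpExtractAll(String input, String regex, int idx) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            String group = m.group(idx);
            out.add(group == null ? "" : group);
        }
        return out;
    }

    // split: a limit of -1 keeps trailing empty strings (assumed default here).
    static List<String> split(String input, String regex) {
        return Arrays.asList(input.split(regex, -1));
    }

    public static void main(String[] args) {
        System.out.println(rlike("abc123", "\\d+"));                               // true
        System.out.println(regexpReplace("a1b22c", "\\d+", "#"));                  // a#b#c
        System.out.println(regexpExtract("100-200", "(\\d+)-(\\d+)", 2));          // 200
        System.out.println(regexpExtractAll("100-200, 300-400", "(\\d+)-(\\d+)", 1)); // [100, 300]
        System.out.println(split("a,b,,", ","));                                   // [a, b, , ]
    }
}
```

A GPU implementation is compatible in the sense of this epic when it returns the same results as these reference behaviors for every supported pattern, including edge cases such as empty matches and unmatched groups.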
Describe alternatives you've considered
None
Additional context
None
@andygrove FYI I added #4511 to the list, since I think we need to improve the current situation where regex kernels can fail with a confusing OOM error due to insufficient reserved memory rather than insufficient pool memory.
~Hi @andygrove, I found another bug in regexp_extract: #5088. Shall we add it to the list?~
Hi @andygrove, I added #5135 to the list as a high-priority task, since I think it is a correctness issue that is triggered by more than just corner cases.