nuclei
File protocol regex search improvement
Please describe your feature request:
Currently, we read everything into memory on the assumption that the data to process is small. That is not always the case, and processing slows down as the number of input items grows: https://github.com/projectdiscovery/nuclei/blob/e383449fb32696fed7ed8ed9ff4b40a96eb311c5/v2/pkg/protocols/file/request.go#L54
Reference:
- https://github.com/golang/go/issues/26623
- https://github.com/BurntSushi/rure-go
shared by @yabeow
Potential options to consider:
- split large file into chunks and process them on separate threads
- look into the feasibility of an interchangeable solution, controlled by a flag (default would remain the same, the flag would control the use of a shared library for more advanced users/use-cases)
- look into Google's RE2?
After investigation, the following implementations would be needed:
- [ ] Currently, the matcher works on strings/byte slices only, so a regex-based engine accepting `io.Reader` and capable of handling potential overlapping matches between chunks needs to be implemented
- [ ] `rure-go` provides a 2x to 4x performance increase on large chunks of data => for better portability, the library should be optionally available statically linked within the GH generated binary
- [ ] Hyperscan is another very good option => the bindings are not up to date. We need to fork and refactor
- [ ] Create bindings for https://github.com/google/re2
Blocked by #1634