
File protocol regex search improvement

Open ehsandeep opened this issue 2 years ago • 3 comments

Please describe your feature request:

Currently, we read everything into memory on the assumption that the data to process is small. That is not always the case, and processing slows down as the number of input items grows: https://github.com/projectdiscovery/nuclei/blob/e383449fb32696fed7ed8ed9ff4b40a96eb311c5/v2/pkg/protocols/file/request.go#L54

Reference:

  • https://github.com/golang/go/issues/26623
  • https://github.com/BurntSushi/rure-go

shared by @yabeow

ehsandeep, Feb 10 '22 09:02

Potential options to consider:

  • split large file into chunks and process them on separate threads
  • look into the feasibility of an interchangeable solution, controlled by a flag (default would remain the same, the flag would control the use of a shared library for more advanced users/use-cases)
  • look into Google's RE2?

forgedhallpass, Feb 10 '22 11:02

After investigation, the following implementations would be needed:

  • [ ] Currently, the matcher works on strings/byte slices only. We need to implement a regex-based engine that accepts an io.Reader and can handle matches that span chunk boundaries
  • [ ] rure-go provides a 2x to 4x performance increase on large chunks of data. For better portability, the library should optionally be statically linked into the GitHub-generated binary
  • [ ] Hyperscan is another very good option, but the bindings are not up to date. We would need to fork and refactor them
  • [ ] Create bindings for https://github.com/google/re2

Mzack9999, Feb 16 '22 12:02

Blocked by #1634

Mzack9999, Mar 02 '22 07:03