Provide a way to differentiate between different regex matches in find scanner output

Open AarjavP opened this issue 9 years ago • 3 comments

With the -F option, you can specify a file containing one or more regular expressions to search for. When this option is used, all matches are written to a single text file [find.txt] with no indication of which input regex matched. It would be nice to be able to tell which expression matched without re-running the output against the input regexes.

One solution would be to write a separate output file for each regex [find_1.txt, find_2.txt, ...]. Another would be to add a column whose value identifies the regex in the input file (1 for the first regex, 2 for the second, and so on).
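To make the second proposal concrete, here is a minimal Python sketch of the idea, not bulk_extractor code. It scans a buffer with each pattern in turn and records the 1-based position of the pattern in the input list next to every match. The function names, the sample patterns, and the tab-separated row layout (offset, pattern number, match) are all assumptions made for illustration.

```python
import re

def tag_matches(data, patterns):
    """Return sorted (offset, pattern_number, matched_bytes) rows for all patterns."""
    rows = []
    # pattern_number is 1-based so it lines up with the line number in the regex file.
    for number, pattern in enumerate(patterns, start=1):
        for m in re.finditer(pattern, data):
            rows.append((m.start(), number, m.group(0)))
    rows.sort()  # order by offset, as a single merged output file would be
    return rows

def format_row(offset, number, feature):
    """Render one row with the pattern number as an extra tab-separated column."""
    return f"{offset}\t{number}\t{feature.decode('latin-1')}"

# Hypothetical patterns: a UPS-style tracking number and a bare 12-digit number.
patterns = [rb"1Z[0-9A-Z]{16}", rb"\b[0-9]{12}\b"]
data = b"ship 1Z999AA10123456784 and 123456789012 today"

for row in tag_matches(data, patterns):
    print(format_row(*row))
```

Writing one file per regex instead would only change where each row is sent: the pattern number would select the output file rather than fill a column.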

I've skimmed the code, and adding a new column doesn't look as easy as I thought it would be. But I'll keep playing around with it and see what I can get. Any suggestions on what I should do are welcome.

AarjavP, Jun 25 '15 22:06

What are you trying to do, and how many regular expressions do you have?

simsong, Jun 25 '15 22:06

I will have around 20 expressions. Currently, I am using the scanner for

  • finding locations/counts of approximate matches on certain names
  • extracting shipping/tracking numbers (UPS, FedEx, etc.)

However, I might add more later on.

AarjavP, Jun 26 '15 00:06

You should create your own scanner. As you add more regular expressions, your performance will degrade significantly.
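For anyone prototyping a custom matcher outside bulk_extractor first, one common way to keep track of which expression fired is to join the patterns into a single alternation of named groups and read the group name off each match. The Python sketch below illustrates that technique only; it is not the bulk_extractor scanner API, and the labels and patterns are hypothetical. Whether a combined pattern is faster than separate ones depends on the regex engine, so it should be measured rather than assumed.

```python
import re

# Hypothetical labels and patterns; each label must be a valid group name.
NAMED_PATTERNS = {
    "ups": r"1Z[0-9A-Z]{16}",
    "digits12": r"\b[0-9]{12}\b",
}

# One alternation, one named group per input pattern.
COMBINED = re.compile(
    "|".join(f"(?P<{name}>{pat})" for name, pat in NAMED_PATTERNS.items())
)

def scan(text):
    """Return (offset, label, match) for each match found in a single pass."""
    # m.lastgroup is the name of the top-level alternative that matched.
    return [(m.start(), m.lastgroup, m.group(0)) for m in COMBINED.finditer(text)]

for offset, label, match in scan("ship 1Z999AA10123456784 and 123456789012 today"):
    print(offset, label, match)
```

One limitation of this approach: if two patterns could match at the same position, only the first alternative in the list is reported, so the order of the patterns matters.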

— Reply to this email directly or view it on GitHub https://github.com/simsong/bulk_extractor/issues/73#issuecomment-115446537.

simsong, Jun 26 '15 00:06