bulk_extractor
Provide a way to differentiate between different regex matches in find scanner output
With the -F option, you can specify a file containing one or more regular expressions to search for. When this option is used, all matches are written to a single text file [find.txt] with no indication of which input regex matched. It would be nice to be able to tell which expression matched without re-running the output against the input regexes.
One solution would be to write a separate output file for each regex [find_1.txt, find_2.txt, ...]. Another would be to add a column whose values identify the regex in the input file (1 for the first regex, 2 for the second, and so on).
I've skimmed the code, and adding a new column doesn't look as easy as I thought it would be. But I'll keep playing around with it and see what I can get. Any suggestions on what I should do are welcome.
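For reference, the workaround mentioned above (running the output back against the input regexes) can be sketched roughly as follows. This is only a post-processing sketch, not a change to bulk_extractor itself. It assumes find.txt uses the usual tab-separated feature-file layout (offset, feature, context) with `#` comment lines, and that the expressions behave the same under Python's `re` module as they do in the find scanner, which may not hold for every pattern. The function name `label_matches` is made up for illustration.

```python
import re

def label_matches(patterns, feature_lines):
    """Yield each feature line with an extra trailing column holding the
    1-based indexes of the patterns that match the feature field.

    Assumption: lines are tab-separated as offset, feature, context.
    Lines with no matching pattern get "0" in the new column.
    """
    compiled = [re.compile(p) for p in patterns]
    for line in feature_lines:
        line = line.rstrip("\n")
        # Pass comment and blank lines through unchanged.
        if not line or line.startswith("#"):
            yield line
            continue
        fields = line.split("\t")
        feature = fields[1] if len(fields) > 1 else ""
        hits = [str(i) for i, rx in enumerate(compiled, start=1)
                if rx.search(feature)]
        yield line + "\t" + (",".join(hits) or "0")
```

To use it, read the patterns from the same file passed to -F (one per line) and feed it the lines of find.txt. A feature that matches several expressions gets a comma-separated list of indexes.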
What are you trying to do, and how many regular expressions do you have?
I will have around 20 expressions. Currently, I am using the scanner for:
- finding locations/counts of approximate matches on certain names
- extracting shipping/tracking numbers (UPS, FedEx, etc.)
However, I might add more later on.
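To illustrate the kind of labeling I'm after, here is a small standalone sketch in Python (outside bulk_extractor) that combines several expressions into one alternation with named groups, so each match reports which expression fired. The labels and patterns are placeholders for illustration, not my real expressions, and the sketch assumes the individual patterns contain no capturing groups of their own.

```python
import re

# Hypothetical labels and patterns, for illustration only.
LABELED = {
    "ups": r"1Z[0-9A-Z]{16}",      # a commonly cited UPS-style shape
    "digits12": r"\b[0-9]{12}\b",  # any standalone 12-digit run
}

# Wrap each pattern in a named group and join them into one alternation.
combined = re.compile(
    "|".join(f"(?P<{name}>{rx})" for name, rx in LABELED.items())
)

def scan(text):
    """Return (offset, label, matched_text) for every match in text."""
    return [(m.start(), m.lastgroup, m.group())
            for m in combined.finditer(text)]
```

Each tuple carries the label of the alternative that matched, which is the information missing from find.txt today.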
You should create your own scanner. As you add more regular expressions, your performance will degrade significantly.