DataSurgeon
Improve IPv4 detection (to avoid false positives)
Hi again :)
This is currently possible
ds -f sample.txt
999.999.999.999
$$$$$999.999.999.999$$$$$
1.1.1.1
output:
ip_address: 999.999.999.999
ip_address: $$$999.999.999.999$$$
ip_address: 1.1.1.1
I'm not really sure how to handle all the special cases, but I noticed this while searching HTML files and found several false positives within inline SVGs.
FYI: This is the regexp I'm using for finding IPv4 addresses in bash (no CIDR support)
export IS_IP4='(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'
..compared to the simpler version here: https://github.com/Drew-Alleman/DataSurgeon/commit/fd0bab1163d43ce65d8cf3283fbb0dd15f339a4f#diff-42cb6807ad74b3e201c5a7ca98b911c5fa08380e942be6e4ac5807f8377f87fcR299
Btw, here's a use case:
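To illustrate the difference, here is a minimal Python sketch comparing the octet-bounded regexp above with a loose pattern. The `LOOSE` pattern and the `find_all` helper are hypothetical stand-ins for illustration only; `LOOSE` is not DataSurgeon's actual regex.

```python
import re

# Hypothetical "loose" pattern (an assumption, NOT DataSurgeon's real regex):
# any four dot-separated groups of 1 to 3 digits.
LOOSE = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")

# Same octet expression as in IS_IP4 above: each group is limited to 0-255.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
STRICT = re.compile(r"\.".join([OCTET] * 4))


def find_all(pattern, text):
    """Return every non-overlapping match of pattern in text."""
    return [m.group(0) for m in pattern.finditer(text)]


print(find_all(LOOSE, "999.999.999.999"))   # ['999.999.999.999']
print(find_all(STRICT, "999.999.999.999"))  # []
print(find_all(STRICT, "1.1.1.1"))          # ['1.1.1.1']
# Without boundary checks the strict pattern can still return a truncated match:
print(find_all(STRICT, "1.112.042.314"))    # ['1.112.042.31']
```

The last line shows that limiting each octet to 0-255 is not sufficient on its own: the pattern also needs to check what comes directly before and after the candidate.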
https://support.censys.io/hc/en-us/articles/360043177092-Opt-Out-of-Data-Collection
ds -i -X -T -C -f 360043177092-Opt-Out-of-Data-Collection
2.2.5.2
2.2.5.2
162.142.125.0
167.94.138.0
167.94.145.0
167.94.146.0
167.248.133.0
198.023.39.065
1.112.042.314
2.2.5.2
Looking at the page source, we can see (apart from the missing CIDR support that you fixed in #13) that 2.2.5.2 comes from this inline SVG:
<svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" focusable="false" viewBox="0 0 12 12"
aria-hidden="true" class="collapsible-nav-toggle-icon chevron-icon">
<path fill="none" stroke="currentColor" stroke-linecap="round" d="M3 4.5l2.6 2.6c.2.2.5.2.7 0L9 4.5">
</path>
</svg>
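One possible way to reject this kind of match is to also check the characters surrounding the candidate. The sketch below is only an illustration in Python: the names `UNBOUNDED`, `BOUNDED` and `find_ipv4` are hypothetical, and the lookaround approach is an assumption, not what DataSurgeon actually does.

```python
import re

# Octets limited to 0-255 (same idea as the bash regexp in this issue).
OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
CORE = r"\.".join([OCTET] * 4)

# No boundary checks: matches valid-looking octets anywhere.
UNBOUNDED = re.compile(CORE)

# Boundary checks (an assumption for illustration):
#   (?<![\d.])  the match must not be preceded by a digit or a dot
#   (?!\.?\d)   the match must not be followed by a digit or by ".<digit>"
BOUNDED = re.compile(r"(?<![\d.])" + CORE + r"(?!\.?\d)")


def find_ipv4(pattern, text):
    """Return every non-overlapping match of pattern in text."""
    return [m.group(0) for m in pattern.finditer(text)]


svg_path = 'd="M3 4.5l2.6 2.6c.2.2.5.2.7 0L9 4.5"'
print(find_ipv4(UNBOUNDED, svg_path))          # ['2.2.5.2']
print(find_ipv4(BOUNDED, svg_path))            # []
print(find_ipv4(BOUNDED, "162.142.125.0:80"))  # ['162.142.125.0']
print(find_ipv4(BOUNDED, "1.112.042.314"))     # []
```

Note that the octet expression still accepts leading zeros, so 198.023.39.065 would pass both patterns. Also, Rust's `regex` crate does not support lookaround, so if the tool relies on that crate (an assumption), the same boundary check would have to be done another way, for example by inspecting the neighbouring characters after a match.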
Hi,
This is what my new regex is giving me. I'll push it in the next update (I've got to fix it again first).
drew@ubuntu:~/DataSurgeon$ ./target/release/ds -f bad.txt -i
ip_address: 162.142.125.0:80
ip_address: 162.142.125.0
ip_address: 167.94.138.0
ip_address: 167.94.145.0
ip_address: 167.94.146.0
ip_address: 167.248.133.0
Best, Drew