TorCrawl.py
File name mismatch when using the Extract option
Describe the bug
When using the extract option (-e), there is a file name mismatch: the software expects to read from a file called links.txt, but it writes a file named <date>_links.txt.
To Reproduce
To reproduce the problem, run one of the examples from the homepage (after minor modifications):
python3 torcrawl.py -v -u http://www.github.com/ -c -d 2 -p 0 -e -w
and the output will be
## Your IP: A.B.C.D.
## URL: http://www.github.com/
## Folder created: www.github.com
## Crawler started from http://www.github.com/ with 2 depth crawl, and 0 second(s) delay.
## Step 1 completed with: 40 result(s)
## Step 2 completed with: 857 result(s)
## File created on /Users/user/TorCrawl.py/www.github.com/links.txt
Error: [Errno 2] No such file or directory: 'www.github.com/links.txt'
## Can't open: www.github.com/links.txt
Traceback (most recent call last):
  File "/Users/user/TorCrawl.py/torcrawl.py", line 210, in <module>
    main()
  File "/Users/user/TorCrawl.py/torcrawl.py", line 199, in main
    extractor(
  File "/Users/user/TorCrawl.py/modules/extractor.py", line 206, in extractor
    cinex(input_file, out_path, selection_yara)
  File "/Users/user/TorCrawl.py/modules/extractor.py", line 72, in cinex
    for line in file:
TypeError: 'type' object is not iterable
Browsing the newly created www.github.com folder shows a file called 20240626_links.txt rather than simply links.txt.
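The TypeError itself is a secondary symptom of the missing file. The following is a minimal sketch of one way such an error can arise, assuming (this is not confirmed by the report) that cinex initialises its file variable with a placeholder class and keeps going after open() fails. The function names read_lines and read_lines_safely are hypothetical and are not part of TorCrawl.

```python
import io


def read_lines(path):
    """Mimics the failure pattern suggested by the traceback (an assumption)."""
    handle = io.TextIOWrapper  # placeholder: this is a class, not an open file
    try:
        handle = open(path, "r")
    except IOError as err:
        # The error is printed, but execution continues.
        print(f"Error: {err}")
    # If open() failed, handle is still the class, so iterating raises TypeError.
    return [line for line in handle]


def read_lines_safely(path):
    """Hypothetical alternative: return an empty list when the file is missing."""
    try:
        with open(path, "r") as handle:
            return [line.rstrip("\n") for line in handle]
    except OSError as err:
        print(f"Error: {err}")
        return []
```

Under this assumption, correcting the file name removes the trigger, while returning early on a failed open() would keep a missing file from surfacing as a confusing TypeError.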
Expected behavior
The extractor should read the links file that was just written, and the TypeError should not appear.
Desktop (please complete the following information):
- OS: macOS 14.5
- Python Version: 3.12.4
Fix
The fix is quite straightforward. In torcrawl.py, the block
if args.extract:
    input_file = out_path + "/links.txt"
    extractor(
        website, args.crawl, output_file, input_file, out_path, selection_yara
    )
should be replaced with
if args.extract:
    input_file = out_path + "/" + now + "_links.txt"
    extractor(
        website, args.crawl, output_file, input_file, out_path, selection_yara
    )
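To keep the writer and the reader from drifting apart again, the file name could be built in one place. This is only a sketch: links_path is a hypothetical helper that does not exist in TorCrawl, and the "%Y%m%d" date format is an assumption inferred from the 20240626_links.txt file name above.

```python
import os
from datetime import datetime


def links_path(out_path, stamp=None):
    """Build the dated links file path (hypothetical helper, not in TorCrawl).

    stamp is assumed to be a YYYYMMDD string; it defaults to today's date.
    """
    if stamp is None:
        stamp = datetime.now().strftime("%Y%m%d")
    return os.path.join(out_path, f"{stamp}_links.txt")


# Both the code that writes the links and the extractor would call the same helper.
print(links_path("www.github.com", "20240626"))
```

Passing the same stamp to both callers (rather than recomputing the date) would also avoid a mismatch if a long crawl runs past midnight.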