
Argument '-i -1' does not work.

Open MobinRanjbar opened this issue 5 years ago • 4 comments

Hi there,

I want to crawl the whole content of a website. When I run the command below, the crawling process does not start. What is wrong?

bin/sparkler.sh crawl -id 1 -i -1

Output:
2020-06-19 12:38:06 INFO Crawler$:153 - Committing crawldb..
2020-06-19 12:38:06 INFO Crawler$:221 - Shutting down Spark CTX..

MobinRanjbar, Jun 19 '20 07:06

Sparkler does nothing when there are no URLs to crawl, and your output suggests there are no new URLs to be crawled. Try injecting some new URLs and run the crawl again.

thammegowda, Jun 20 '20 04:06

Hi,

I injected a new URL before that, as shown below. The same thing happens.

bin/sparkler.sh inject -id 1 -su 'https://www.nasa.gov/'

MobinRanjbar, Jun 20 '20 06:06

I am guessing there is an error in your setup. Have you tried it from the Docker image (https://hub.docker.com/r/uscdatascience/sparkler/tags)? Could you please try?

CC @buggtb do you have any guesses on why/when/how this case might happen?

thammegowda, Jun 22 '20 23:06

Hi,

The same thing happened in Docker:

sparkler@292e25536b51:/data/sparkler$ bin/sparkler.sh inject -id 1 -su 'https://www.nasa.gov/'
2020-06-23 07:46:16 INFO Injector$:97 - Injecting 1 seeds
jobId = 1
sparkler@292e25536b51:/data/sparkler$ bin/sparkler.sh crawl -id 1 -tn 100 -i -1
2020-06-23 07:46:35 WARN NativeCodeLoader:60 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2020-06-23 07:46:40 INFO Crawler$:153 - Committing crawldb..
2020-06-23 07:46:40 INFO Crawler$:221 - Shutting down Spark CTX..
sparkler@292e25536b51:/data/sparkler$

Have you ever tried that argument?

MobinRanjbar, Jun 23 '20 07:06