web-scraper-chrome-extension
Multiple startUrls apparently do not work / stop the startUrls pagination on a condition
Hi,
Good job with the plugin.
Chrome: Version 87.0.4280.141 (Official Build) (64-bit) Ubuntu 20.04.1 LTS 64-bit
But I'm trying to use this:
Supported URL patterns:
1. Numeric with optional step and zero padding – [START_END:STEP] – [001_010:10]
my sitemap:
{"_id":"google","startUrls":["http://google.com.br?id=[001_010:10]"],"selectors":[{"id":"body","selector":"body","type":"SelectorHTML","parentSelectors":["_root"]}]}
and the pagination does not work.
I also tried with 3.6 and it still does not work.
I would also like the loop to stop paginating when a condition is met, such as repeated elements or the HTML containing certain content.
Thank you.
Maybe there is a misunderstanding of the documentation, but you should write the URL pattern as [START-END:STEP] – [001-010:10].
https://github.com/ispras/web-scraper-chrome-extension/blob/master/docs/Scraping%20a%20site.md
It seems there is a mistake in the plugin hints. I will fix it.
I think in your case you should use this sitemap as an example:
{"_id":"google","startUrls":["http://google.com.br?id=[001-010:1]"],"selectors":[{"id":"body","selector":"body","type":"SelectorHTML","parentSelectors":["_root"]}]}
then it will make requests to:
https://www.google.com.br/?id=010
...
https://www.google.com.br/?id=001
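For anyone following along, here is a rough sketch of how a pattern like this could be expanded. It is not the actual regex from Sitemap.js: `expandStartUrl` is a hypothetical helper, and taking the zero-padding width from the START value is an assumption. It does suggest why `[001-010:10]` would yield only one URL (a step of 10 overshoots END after the first value) and why the `_` form is never expanded. The order in which the extension actually issues the requests may differ from the ascending order used here.

```javascript
// Hypothetical helper, NOT the extension's real getStartUrls() implementation.
// Expands one numeric placeholder of the form [START-END] or [START-END:STEP].
function expandStartUrl(url) {
  const match = url.match(/\[(\d+)-(\d+)(?::(\d+))?\]/);
  if (!match) {
    return [url]; // no range placeholder found, keep the URL unchanged
  }
  const [placeholder, startText, endText, stepText] = match;
  const start = parseInt(startText, 10);
  const end = parseInt(endText, 10);
  const step = stepText ? parseInt(stepText, 10) : 1;
  if (step < 1) {
    throw new Error("step must be at least 1");
  }
  // Assumption: zero padding follows the number of digits written in START.
  const width = startText.length;
  const urls = [];
  for (let n = start; n <= end; n += step) {
    urls.push(url.replace(placeholder, String(n).padStart(width, "0")));
  }
  return urls;
}

// Step 1 gives ten URLs (id=001 through id=010).
console.log(expandStartUrl("http://google.com.br?id=[001-010:1]"));
// Step 10 gives a single URL, because 1 + 10 is already past END.
console.log(expandStartUrl("http://google.com.br?id=[001-010:10]"));
// The underscore form does not match the pattern and is returned as-is.
console.log(expandStartUrl("http://google.com.br?id=[001_010:10]"));
```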
It works!!! :) It was only when I looked at the big regex in getStartUrls() in Sitemap.js that everything became clear. Haha. Please add my feature idea to the queue: I would like the loop to stop the pagination when a condition is met, such as repeated elements or the HTML containing certain content. Thank you very much!!!
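To make the feature idea concrete, here is a minimal sketch of the stop condition. It is purely illustrative: the extension does not expose such an API as far as this thread shows, and `paginateUntilStop`, `loadHtml` and `stopMarker` are hypothetical names. `loadHtml` is assumed to be a caller-supplied function that returns the HTML for a URL; it is kept synchronous here for brevity, whereas a real page load would be asynchronous.

```javascript
// Hypothetical sketch of the requested feature, not part of the extension.
// loadHtml: assumed caller-supplied function (url) => HTML string.
// stopMarker: optional text whose presence in a page ends the pagination.
function paginateUntilStop(urls, loadHtml, stopMarker) {
  const seenHtml = new Set();
  const pages = [];
  for (const url of urls) {
    const html = loadHtml(url);
    if (seenHtml.has(html)) {
      break; // the same content came back again, treat it as the end
    }
    if (stopMarker && html.includes(stopMarker)) {
      break; // the page contains the stop text
    }
    seenHtml.add(html);
    pages.push({ url, html });
  }
  return pages; // later URLs are never loaded once a condition fires
}

// Usage with a fake loader: pages 1 and 2 differ, page 3 repeats page 2.
const fakeSite = { a: "<p>one</p>", b: "<p>two</p>", c: "<p>two</p>", d: "<p>four</p>" };
console.log(paginateUntilStop(["a", "b", "c", "d"], (u) => fakeSite[u]).length);
```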
Hi again. Same problem again. I had forgotten.
In Edit metadata there is a help hint:
1. Numeric with optional step and zero padding – [START_END:STEP] – [001_010:10]
but the correct one is:
1. Numeric with optional step and zero padding – [START-END:STEP] – [001-010:10]
with - not _
Please replace it.
Thank you