web-scraper-chrome-extension icon indicating copy to clipboard operation
web-scraper-chrome-extension copied to clipboard

Some new features

Open codoff opened this issue 8 years ago • 19 comments

Hi,

I added two features and offer to add them into the parent branch.

First, it is date pattern support for start url: datepattern

and second, limit for Click and Scroll Selectors (scrape only first 3 pages, for example): paginationlimit

codoff avatar Feb 28 '17 10:02 codoff

You should include your DateUtils scripts in SpecRunner.html to prevent errors in the unit tests. It would be also great if you could write some unit tests for your features - it is self explaining if you have a look into the tests folder.

jwillmer avatar May 22 '17 12:05 jwillmer

I repaired SpecRunner.html and add unit tests

codoff avatar May 24 '17 14:05 codoff

Cool, merged in my fork. Thank you!

jwillmer avatar May 24 '17 18:05 jwillmer

I encountered two issues in your tests regarding time zones (I am in +3): issue1

jwillmer avatar May 24 '17 18:05 jwillmer

It's very strange... At me all tests are green.

2017-05-25_111552

It is also strange that only two fails. Case 'SimpleDateFormatter.parse' pattern 'MM/dd/yyyy' is normal?

codoff avatar May 25 '17 08:05 codoff

Yes, only the two I posted are failing. Try to change your time zone on your system. I can reproduce different values 2h, 3h difference depending on the timezone.

jwillmer avatar May 25 '17 08:05 jwillmer

@jwillmer I think my pagination limit and #178 do the same thing, merge them together is not a good idea.

The difference is that #178 applies only to ElementClick, my option applies to ElementClick and ElementScroll.

codoff avatar May 25 '17 09:05 codoff

My timezone is +3 and all fine. Try replacing the dot with an underscore (dd_MM_yyyy and 16_06_2016). Will the test result change?

codoff avatar May 25 '17 10:05 codoff

No, sadly changing the format to parse does not change the outcome.

@jwillmer I think my pagination limit and #178 do the same thing, merge them together is not a good idea.

I looked at both implementations and you are right. I will remove #178 since it is a feature duplication.

Update: #178 removed and documentation extended

jwillmer avatar May 25 '17 10:05 jwillmer

Found the problem problem

jwillmer avatar May 25 '17 11:05 jwillmer

It should not be. On my computer, the date is created the same.

new Date("1970-01-01T00:00:00.000Z");
Thu Jan 01 1970 03:00:00 GMT+0300 (RTZ 2 (зима))
new Date(0);
Thu Jan 01 1970 03:00:00 GMT+0300 (RTZ 2 (зима))

About the creation of the date in javascript, the following is said:

  1. Using new Date(number), creates a new date object as zero time plus the number. Zero time is 01 January 1970 00:00:00 UTC. The number is specified in milliseconds.

  2. The ISO 8601 syntax "YYYY-MM-DDTHH:MM:SS.SSSZ" is also the preferred JavaScript date format. UTC time is defined with a capital letter Z.

In both case must be UTC timezone. Also, it's not clear why the test 'MM/dd/yyyy' passes. I give up.

codoff avatar May 25 '17 11:05 codoff

I will create a workaround for my use case and write a log message if this case happens:

https://github.com/jwillmer/web-scraper-chrome-extension/commit/7455e3b50ff923322d8787b6f50297b2fc03fb81

jwillmer avatar May 25 '17 11:05 jwillmer

Sorry for me deleted post but toUTC does not fix it. The workaround I posted does fix it.

jwillmer avatar May 25 '17 12:05 jwillmer

Hi everyone,

I'm trying to add the "pagination limit" feature code into the webscraper.io proram. Does anyone know how to do this? Also which code is specifically for the "pagination limit" function?

Thanks in advance!

Asp1ringC0d3r avatar Jul 10 '17 13:07 Asp1ringC0d3r

@Asp1ringC0d3r If you read the tread you would see that I already merged it:

Cool, merged in my fork. Thank you!

jwillmer avatar Jul 10 '17 14:07 jwillmer

@jwillmer I'm a hobbyist when it comes to coding. What do you mean by merged it? Is it already a feature in the webscraper.io?

Asp1ringC0d3r avatar Jul 10 '17 14:07 Asp1ringC0d3r

@Asp1ringC0d3r based on your comments you should not try to add (merge) the pagination limit feature yourself. Have a look at my project copy (fork). I included (merged) all open features (pull requests).

jwillmer avatar Jul 10 '17 14:07 jwillmer

@jwillmer I already have the chrome extension. How do I install your project copy to access all the open features?

Asp1ringC0d3r avatar Jul 10 '17 17:07 Asp1ringC0d3r

@Asp1ringC0d3r Read the readme. If that does not help open an issue in my project, this is not the place for it.

jwillmer avatar Jul 10 '17 17:07 jwillmer