question/support on xpath syntax

Open ghost opened this issue 4 years ago • 1 comments

I am trying to learn how to select a specific instance of a <div> when several share the same class name.

In this path '//div[contains(@class,"callouts-container")]' how would I specify the first instance of "callouts-container"? Maybe using [1] but where is this placed?

CDC COVID-19 website

I want only total cases, new cases, total deaths, and new deaths, not the "Cases among HCP" and "Deaths among HCP" figures that my job currently returns:

# CDC COVID-19
name: (33)CDC COVID-19 cases
url: "https://www.cdc.gov/coronavirus/2019-ncov/cases-updates/cases-in-us.html?CDC_AA_refVal=https%3A%2F%2Fwww.cdc.gov%2Fcoronavirus%2F2019-ncov%2Fcases-updates%2Fsummary.html"
filter:
  - xpath: 
     path: '//div[contains(@class,"callouts-container")]'
  - re.sub: '(?m)^[ \t]*' # removes all leading spaces
  - html2text: re
---

result:

iMac191:~ john$ uwtf 33
Total Cases
5,682,491
38,679 New Cases*
Total Deaths
176,223
572 New Deaths*
Cases among HCP
142,935
Deaths among HCP
660

There are some usage examples on W3Schools, but I can't always get that syntax to work in urlwatch, or I am not understanding it correctly.

I would rather not create "issues" for support, so I created a urlwatch subreddit today. Hopefully these sorts of questions will move there and be answered there, since it is a better forum for them.

ghost, Aug 24 '20 22:08

Seems like this works. Is there another way?

# CDC COVID-19
name: (33)CDC COVID-19 cases
url: "https://www.cdc.gov/coronavirus/2019-ncov/cases-updates/cases-in-us.html?CDC_AA_refVal=https%3A%2F%2Fwww.cdc.gov%2Fcoronavirus%2F>
filter:
  - xpath: 
     path: '(//div[@class="cases-callouts"]/div[position()<4])'
  - re.sub: '(?m)^[ \t]*' # removes all leading spaces
  - html2text: re
---
Mon Aug 24 06:48:04
iMac191:~ john$ uwtf 33
Total Cases
5,682,491
38,679 New Cases*
Total Deaths
176,223
572 New Deaths*

If several <div> elements have class values containing the same text, '//div[contains(@class,"cases-callouts")]' matches all of them, so it cannot single one out. Use //div[@class='cases-callouts'] instead to match the class value exactly.
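
On the original question of where [1] goes, here is a minimal sketch of the filter section. It is untested against the live page. In XPath 1.0 a predicate applies to the step it follows, so '//div[contains(@class,"callouts-container")][1]' selects every matching div that is the first match under its own parent. Wrapping the whole path in parentheses applies [1] to the complete result set in document order instead. Whether that first container holds exactly the four wanted figures is an assumption about the page's markup, which is not shown here.

```yaml
filter:
  - xpath:
      # Parentheses make [1] pick the first match in the whole document,
      # not the first match under each parent element.
      path: '(//div[contains(@class,"callouts-container")])[1]'
  - re.sub: '(?m)^[ \t]*' # removes leading spaces and tabs on each line
  - html2text: re
```

The same parenthesized form should work with other indexes, for example [2] for the second match.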

ghost, Aug 25 '20 00:08