mailman-archive-scraper

Problem with multilingual scraping

Open cjgb opened this issue 11 years ago • 1 comments

I was trying to scrape

https://stat.ethz.ch/pipermail/r-help-es/

It seems that scrapeList builds the year-month part of each URL from the row labels in the table there, which happen to be in Spanish. The links themselves, however, use English month names. So, it fails to retrieve

https://stat.ethz.ch/pipermail/r-help-es/2013-Diciembre/date.html

(which does not exist). The link that does exist is

https://stat.ethz.ch/pipermail/r-help-es/2013-December/date.html

however. Wouldn't it be possible to get the relative path from the <A> in the table to solve these issues?
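
To illustrate the idea, here is a minimal sketch of reading the month URLs from the anchors' href attributes rather than from the row labels. It is not the scraper's actual code: `DateLinkParser`, `month_date_urls`, and the `SAMPLE_INDEX` markup are hypothetical names and data made up for this example, and the real index page may be structured differently.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class DateLinkParser(HTMLParser):
    """Collect href values of <a> tags that point at a month's date index."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A HREF=...> matches.
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.endswith("/date.html"):
            self.hrefs.append(href)


def month_date_urls(index_url, index_html):
    """Return absolute URLs for every date.html link found in the index page."""
    parser = DateLinkParser()
    parser.feed(index_html)
    # Resolve each relative href against the archive's index URL.
    return [urljoin(index_url, href) for href in parser.hrefs]


# Hypothetical markup: Spanish labels, English month name in the link.
SAMPLE_INDEX = """
<table>
<tr><td>Diciembre 2013:</td>
<td><A href="2013-December/thread.html">[ Hilo ]</a>
<A href="2013-December/date.html">[ Fecha ]</a></td></tr>
</table>
"""

urls = month_date_urls("https://stat.ethz.ch/pipermail/r-help-es/", SAMPLE_INDEX)
print(urls)
```

Because the URL comes straight from the link, the language used for the visible month name no longer matters.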

Best regards,

Carlos J. Gil Bellosta http://www.datanalytics.com

cjgb, Dec 03 '13 15:12

I know it's been a while since you posted this, sorry about that. I've just now committed what I hope is a fix. If you're interested in giving it another go, please do, and let me know how you get on.

philgyford, Oct 22 '14 21:10