mailman-archive-scraper
Problem with multilingual scraping
I was trying to scrape
https://stat.ethz.ch/pipermail/r-help-es/
It seems that scrapeList takes the year-month values from the row labels in the table there, which happen to be in Spanish. The links, however, use English month names, so it fails to retrieve
https://stat.ethz.ch/pipermail/r-help-es/2013-Diciembre/date.html
(which does not exist). The link that does exist is
https://stat.ethz.ch/pipermail/r-help-es/2013-December/date.html
Wouldn't it be possible to take the relative path from the href of the <A> element in the table instead? That would avoid this problem.
Best regards,
Carlos J. Gil Bellosta http://www.datanalytics.com
I know it's been a while since you posted this - sorry about that. I've just now committed what I hope is a fix. If you're interested in giving it another go, please do and let me know how you get on.
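For anyone hitting the same problem, here is a rough sketch of the approach suggested in the original post: read the relative path from each <A> element's href rather than rebuilding it from the localized row label. It is not necessarily what the committed fix does. The names DateLinkParser and date_index_urls, and the sample HTML, are made up for illustration and only approximate the layout of a pipermail index page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical excerpt of an index page: the row label is localized
# ("Diciembre"), while the hrefs use the English month name.
BASE_URL = "https://stat.ethz.ch/pipermail/r-help-es/"
SAMPLE_INDEX = """
<table>
<tr>
  <td>Diciembre 2013:</td>
  <td>
    <A href="2013-December/thread.html">[ Hilo ]</a>
    <A href="2013-December/date.html">[ Fecha ]</a>
  </td>
</tr>
</table>
"""


class DateLinkParser(HTMLParser):
    """Collect the href of every anchor that points at a date index."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A> arrives as "a".
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.endswith("/date.html"):
            self.hrefs.append(href)


def date_index_urls(base_url, index_html):
    """Resolve each relative date.html link against the archive's base URL."""
    parser = DateLinkParser()
    parser.feed(index_html)
    return [urljoin(base_url, href) for href in parser.hrefs]


if __name__ == "__main__":
    for url in date_index_urls(BASE_URL, SAMPLE_INDEX):
        print(url)
```

Because the URL comes straight from the page, the language of the row labels no longer matters.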