{polite} package and web etiquette

Open wibeasley opened this issue 1 year ago • 8 comments

I like the Scraping ethics & legalities section for R for Data Science (2e).

Before we get started discussing the code you’ll need to perform web scraping, we need to talk about whether it’s legal and ethical for you to do so....

I think many R users (like students, statisticians, and data scientists) are not as familiar with web etiquette and conventions as web developers and people who scrape regularly. It would be nice if our web scraping section referred the reader to this info, as well as to the polite package.

The three pillars of a polite session are seeking permission, taking slowly and never asking twice.

The package builds on awesome toolkits for defining and managing http sessions (httr and rvest), declaring the user agent string and investigating site policies (robotstxt), and utilizing rate-limiting and response caching (ratelimitr and memoise).
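To make the three pillars concrete, here is a minimal, language-neutral sketch in Python using only the standard library. It is not the polite package's API: the class name `PoliteSession`, the injected `fetch` callable, and the 5-second default delay are all assumptions made for illustration. The sketch checks robots.txt rules before fetching (seeking permission), waits between requests (taking slowly), and caches responses (never asking twice).

```python
import time
from urllib import robotparser


class PoliteSession:
    """Toy illustration of the three pillars; NOT the polite package's API."""

    def __init__(self, robots_txt, user_agent, fetch, delay=5.0,
                 clock=time.monotonic, sleep=time.sleep):
        # Pillar 1 (seek permission): parse the site's robots.txt text.
        self._rules = robotparser.RobotFileParser()
        self._rules.parse(robots_txt.splitlines())
        self._user_agent = user_agent
        self._fetch = fetch      # hypothetical callable: url -> response body
        self._delay = delay      # assumed minimum seconds between requests
        self._clock = clock
        self._sleep = sleep
        self._last_request = None
        self._cache = {}

    def scrape(self, url):
        # Pillar 1: refuse paths the site disallows for this user agent.
        if not self._rules.can_fetch(self._user_agent, url):
            raise PermissionError(f"robots.txt disallows {url}")

        # Pillar 3 (never ask twice): reuse a cached response if we have one.
        if url in self._cache:
            return self._cache[url]

        # Pillar 2 (take slowly): wait until `delay` seconds have passed
        # since the previous real request.
        if self._last_request is not None:
            wait = self._delay - (self._clock() - self._last_request)
            if wait > 0:
                self._sleep(wait)

        body = self._fetch(url)
        self._last_request = self._clock()
        self._cache[url] = body
        return body
```

In real use, `fetch` would wrap an HTTP client that sends a descriptive user agent string, and `robots_txt` would be the text downloaded from the site's `/robots.txt`. In R, the polite package handles these steps for you.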

@pachadotdev, do you have thoughts? It's not the conventional material for a CRAN task view. I'm thinking a few sentences and links. Nothing preachy, just pointing readers to these resources if they want to educate themselves.

wibeasley avatar Apr 04 '23 14:04 wibeasley

@wibeasley this would be extremely useful in my own case, since I have to scrape a lot of data. I can write a part after Apr 21.

pachadotdev avatar Apr 04 '23 17:04 pachadotdev

@wibeasley I have a draft from a workshop I attended. I will put it in a separate branch.

pachadotdev avatar Jul 14 '23 18:07 pachadotdev

https://github.com/cran-task-views/WebTechnologies/tree/511

pachadotdev avatar Jul 14 '23 19:07 pachadotdev

@pachadotdev, I like it. I think it will be helpful to some audiences.

Are you writing it in a separate file, and later combining it into the Task View when you're satisfied?

I converted it to semantic line breaks, which I've found helpful for maintaining files that a lot of people touch. I also made a few changes that I hope you like. Reject anything you think doesn't improve the clarity.

wibeasley avatar Jul 15 '23 02:07 wibeasley

Thanks! Yes, I put that in a separate file.

pachadotdev avatar Jul 15 '23 15:07 pachadotdev

Will it stay in a separate file, or be integrated into the Task View?

If it stays in a separate file, I think the Task View should link to the page you wrote.

wibeasley avatar Jul 15 '23 15:07 wibeasley

The idea is to include it in the README once it's ready.

pachadotdev avatar Jul 15 '23 17:07 pachadotdev

Thanks for putting this together, I think this is very useful!

However, this should be in the task view, not in the README. The README is just in the GitHub repository and the main page that readers will consult is the task view itself, typically on a CRAN mirror. So please put it into the task view itself when you think it is ready.

zeileis avatar Jul 15 '23 21:07 zeileis