recipe-scrapers

Scraper Development Guide

Open strangetom opened this issue 1 year ago • 6 comments

Hi guys

One of the suggestions for improving the developer experience in #617 is writing some developer guidance documentation. It's also been mentioned in a few issues and PRs lately, so I thought I would have a go at starting something.

I've come up with a rough outline of what the docs could cover:

  • A step-by-step guide to developing a new scraper. This would start from identifying a website and cover generating the scraper and tests, adding functionality to the scraper, and adding functionality to the test cases. This would be the main piece of documentation, and it would link out to more in-depth articles covering the following specific topics:
  • A more detailed definition of what the Scraper methods are, what they should return (in terms of datatypes and content), which Scraper methods are 'mandatory' (e.g. title, ingredients, instructions ...) and which are more 'optional' (e.g. ingredient groups, ratings, reviews ...).
  • A more detailed guide on scraping from the HTML. I see this being a bit like a cookbook of common patterns and best practices.
  • A detailed guide for adding ingredient groups. This would effectively take the guidance I wrote in #799 and tidy it up.
  • A more detailed guide on debugging scrapers during development.
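To give a flavour of the kind of "cookbook" pattern the HTML-scraping and ingredient-group guides might contain, here is a minimal sketch. It uses only the Python standard library (`html.parser`), not BeautifulSoup or any of recipe-scrapers' own helpers, and it assumes a hypothetical site that marks group headings with `<h4>` and ingredients with `<li>`. The `IngredientGroup` dataclass, its `purpose` field, `HeadingItemCollector` and `group_ingredients` are illustrative names only and may not match the library's actual API.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import List, Optional, Tuple


@dataclass
class IngredientGroup:
    # Hypothetical container: a group heading ("purpose") and its ingredients.
    ingredients: List[str] = field(default_factory=list)
    purpose: Optional[str] = None


class HeadingItemCollector(HTMLParser):
    """Record (kind, text) pairs for heading and item elements in document order.

    The default tag names are assumptions about a hypothetical site's markup.
    """

    def __init__(self, heading_tag: str = "h4", item_tag: str = "li") -> None:
        super().__init__()
        self.heading_tag = heading_tag
        self.item_tag = item_tag
        self.entries: List[Tuple[str, str]] = []
        self._kind: Optional[str] = None  # "heading", "item", or None
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == self.heading_tag:
            self._kind, self._buffer = "heading", []
        elif tag == self.item_tag:
            self._kind, self._buffer = "item", []

    def handle_data(self, data):
        # Only keep text inside a heading or item element
        # (including text inside nested tags such as <b> or <span>).
        if self._kind is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._kind is not None and tag in (self.heading_tag, self.item_tag):
            text = " ".join("".join(self._buffer).split())  # collapse whitespace
            if text:
                self.entries.append((self._kind, text))
            self._kind = None


def group_ingredients(html: str) -> List[IngredientGroup]:
    """Assign each ingredient to the most recent heading that precedes it."""
    collector = HeadingItemCollector()
    collector.feed(html)
    collector.close()

    groups: List[IngredientGroup] = []
    for kind, text in collector.entries:
        if kind == "heading":
            groups.append(IngredientGroup(purpose=text))
        else:
            if not groups:
                # Ingredients listed before any heading get an unnamed group.
                groups.append(IngredientGroup())
            groups[-1].ingredients.append(text)
    return groups


if __name__ == "__main__":
    sample_html = """
    <div class="ingredients">
      <h4>For the dough</h4>
      <ul><li>500 g flour</li><li>300 ml water</li></ul>
      <h4>For the topping</h4>
      <ul><li>1 tbsp <b>olive</b> oil</li></ul>
    </div>
    """
    for group in group_ingredients(sample_html):
        print(group.purpose, group.ingredients)
```

Running it prints `For the dough ['500 g flour', '300 ml water']` followed by `For the topping ['1 tbsp olive oil']`. Walking the elements in document order and attaching each item to the last heading seen keeps the grouping logic independent of how deeply the site nests its lists; a heading with no ingredients after it would produce an empty group, which a real scraper would probably want to drop or flag.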

A couple of questions I have:

  1. What format should this take? a. GitHub wiki? b. Markdown files in a docs folder? c. Sphinx (or similar) generated pages?
  2. Are there any topics people would like to see covered that I haven't mentioned above?

Progress

  • [x] Step-by-step guide for developing a scraper (#862)
  • [x] Detailed guide: scraper functions (#862)
  • [x] Detailed guide: ingredient groups (#862)
  • [x] Detailed guide: HTML scraping (#862)
  • [ ] Detailed guide: debugging

Contributions to any of the currently unwritten guides, or any additional documentation, are welcome.

strangetom, Sep 15 '23 14:09

What format should this take?

I'd vote for markdown files within the repository, with a wiki as my second preference.

Reasoning: markdown is fairly straightforward and readable with or without supporting tooling, and GitHub previews it automatically, meaning that casual visitors to our repository could read it effectively too. It's also available while working with the code (whether in an IDE, online, or command-line), a benefit over the web-based wiki. Finally: some documentation changes are closely related to code changes, and the ability to include both in the same pull request / commit (when beneficial) could be useful.

jayaddison, Sep 15 '23 15:09

(also: thanks for getting this discussion going!)

jayaddison, Sep 15 '23 16:09

Thanks @jayaddison.

I'm glad you've voted for markdown files, as that was my preference too. I've created a draft PR #862 with a starting point, and I'll continue adding to it as I get the chance.

strangetom, Sep 16 '23 14:09

Yep, I second the markdown files. I feel like mkdocs + the Material theme is the popular pick in the Python community nowadays. I'd vote for that specific combo, with the search plugin included. Sounds like a nice starting point.

hhursev, Sep 16 '23 14:09

@strangetom maybe worth updating the issue description to use a Markdown checklist, and ticking off the items completed? (most of them :)) I'm thinking it might help some other contributor to see where they can help.

jayaddison, Oct 03 '23 07:10

Updated :)

strangetom, Oct 03 '23 18:10