another possible data source

Open breck7 opened this issue 3 years ago • 2 comments

https://glosario.carpentries.org/
https://twitter.com/gvwilson/status/1566904419857440768

breck7 avatar Sep 05 '22 21:09 breck7

Hello! Could you please add a little more detail about what is needed in this issue?

Thank you very much!

adriantintpilver avatar Sep 29 '22 18:09 adriantintpilver

Great question, @adriantintpilver!

My general approach to adding a data source is like this:

  1. Stumble upon an interesting data source, say a website called https://worlds-best-best-programming-books.xyz
  2. Start manually adding lines such as "worldsBestProgrammingBooks php 432 books" to *.pldb files, to get a "feel" for how best to extract the most useful data from that source and add it to our database. I usually start small and add complexity as I go.
  3. Once I have a "feel" for the new data source and have "linked" about 5-10 files, I will create a single grammar file for it (here is a real one from pldb.com, for example: https://github.com/breck7/pldb/blob/main/database/grammar/helloWorldCollection.grammar)
  4. I will commit that grammar file and those initial manual entries.
  5. Then I will either just make some tea and add the rest of the data manually, or write a crawler script (https://github.com/breck7/pldb/tree/main/code/crawlers) to import that data source programmatically and keep it updated.

That's pretty much it!
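
For step 5, a crawler could look roughly like the sketch below. It is only an illustration in Python, not one of the real scripts in the crawlers folder: the function names, the one-file-per-language layout (php.pldb and so on), and the exact line format are assumptions taken from the made-up example in step 2.

```python
from pathlib import Path

# Hypothetical key, taken from the made-up example in step 2.
SOURCE_KEY = "worldsBestProgrammingBooks"


def format_entry(language_id: str, book_count: int) -> str:
    """Build one line in the (assumed) format from step 2."""
    return f"{SOURCE_KEY} {language_id} {book_count} books"


def upsert_entry(file_text: str, entry: str) -> str:
    """Drop any older line for this source, then append the new one."""
    kept = [
        line
        for line in file_text.splitlines()
        if not line.startswith(SOURCE_KEY + " ")
    ]
    kept.append(entry)
    return "\n".join(kept) + "\n"


def import_counts(database_dir: Path, counts: dict[str, int]) -> None:
    """Write one entry per language into <database_dir>/<id>.pldb.

    Assumes one file per language; languages without a file are skipped
    so that a bad id from the source never creates a new file.
    """
    for language_id, book_count in counts.items():
        path = database_dir / f"{language_id}.pldb"
        if not path.exists():
            continue
        entry = format_entry(language_id, book_count)
        path.write_text(upsert_entry(path.read_text(), entry))
```

Because upsert_entry replaces the old line instead of appending a second one, re-running the import after the source changes keeps each file at a single entry for that source, which is the "keep updated" part of step 5.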

I know the docs are still pretty sparse, especially for the grammar language, but hopefully that helps a little bit?

breck7 avatar Sep 29 '22 18:09 breck7