
Update Drupal UrlScraper

Open TravisCarden opened this issue 3 years ago • 4 comments

This updates Drupal 7 and 8 and adds Drupal 9 and 10.

  • [x] Updated the versions and releases in the scraper file
  • [x] Ensured the license is up-to-date and that the documentation's entry in the array in about_tmpl.coffee matches its data in self.attribution
  • [x] Ensured the icons and the SOURCE file in public/icons/your_scraper_name/ are up-to-date if the documentation has a custom icon
  • [x] Ensured self.links contains up-to-date urls if self.links is defined
  • [ ] Tested the changes locally to ensure:
    • The scraper still works without errors
    • The scraped documentation still looks consistent with the rest of DevDocs
    • The categorization of entries is still good

TravisCarden • Jul 29 '22 20:07

I'm trying to test locally, and I get the error below both before and after my changes. Can anyone help me debug it?

thor docs:generate "Drupal@7" --debug
/!\ WARNING /!\

Some scrapers send thousands of HTTP requests in a short period of time,
which can slow down the source site and trouble its maintainers.

Please scrape responsibly. Don't do it unless you're modifying the code.

To download the latest tested version of this documentation, run:
  thor docs:download Drupal@7

Proceed? (y/n) y
Queue:   api.drupal.org/api/drupal/7.x
Queue:   api.drupal.org/api/drupal/groups/7.x
Queue:   api.drupal.org/api/drupal/groups/7.x?page=1
ERROR:
  https://api.drupal.org/api/drupal/7.x
  RuntimeError: Error status code (0): URL using bad/illegal format or missing URL
    https://api.drupal.org/api/drupal/7.x



  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/scrapers/url_scraper.rb:49:in `process_response?'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/scraper.rb:158:in `handle_response'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/scraper.rb:77:in `block in build_pages'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/requester.rb:59:in `block (2 levels) in handle_response'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/requester.rb:58:in `each'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/requester.rb:58:in `block in handle_response'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/instrumentable.rb:15:in `instrument'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/requester.rb:57:in `handle_response'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/requester.rb:18:in `run'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/scrapers/url_scraper.rb:38:in `request_all'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/scraper.rb:76:in `build_pages'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/doc.rb:115:in `block in store_pages'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:87:in `block (2 levels) in replace'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:182:in `track_touched'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:87:in `block in replace'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:170:in `lock'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:87:in `replace'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:85:in `block in replace'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:144:in `open_yield_close'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:30:in `open'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:85:in `replace'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/doc.rb:114:in `store_pages'
  /Users/traviscarden/Projects/other/devdocs/lib/docs.rb:100:in `generate'
  /Users/traviscarden/Projects/other/devdocs/lib/tasks/docs.thor:303:in `generate_doc'
  /Users/traviscarden/Projects/other/devdocs/lib/tasks/docs.thor:105:in `generate'

ERROR:
  https://api.drupal.org/api/drupal/groups/7.x
  RuntimeError: Error status code (0): URL using bad/illegal format or missing URL
    https://api.drupal.org/api/drupal/groups/7.x



  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/scrapers/url_scraper.rb:49:in `process_response?'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/scraper.rb:158:in `handle_response'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/scraper.rb:77:in `block in build_pages'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/requester.rb:59:in `block (2 levels) in handle_response'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/requester.rb:58:in `each'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/requester.rb:58:in `block in handle_response'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/instrumentable.rb:15:in `instrument'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/requester.rb:57:in `handle_response'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/requester.rb:18:in `run'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/scrapers/url_scraper.rb:38:in `request_all'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/scraper.rb:76:in `build_pages'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/doc.rb:115:in `block in store_pages'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:87:in `block (2 levels) in replace'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:182:in `track_touched'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:87:in `block in replace'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:170:in `lock'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:87:in `replace'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:85:in `block in replace'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:144:in `open_yield_close'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:30:in `open'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:85:in `replace'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/doc.rb:114:in `store_pages'
  /Users/traviscarden/Projects/other/devdocs/lib/docs.rb:100:in `generate'
  /Users/traviscarden/Projects/other/devdocs/lib/tasks/docs.thor:303:in `generate_doc'
  /Users/traviscarden/Projects/other/devdocs/lib/tasks/docs.thor:105:in `generate'

ERROR:
  https://api.drupal.org/api/drupal/groups/7.x?page=1
  RuntimeError: Error status code (0): URL using bad/illegal format or missing URL
    https://api.drupal.org/api/drupal/groups/7.x?page=1



  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/scrapers/url_scraper.rb:49:in `process_response?'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/scraper.rb:158:in `handle_response'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/scraper.rb:77:in `block in build_pages'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/requester.rb:59:in `block (2 levels) in handle_response'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/requester.rb:58:in `each'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/requester.rb:58:in `block in handle_response'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/instrumentable.rb:15:in `instrument'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/requester.rb:57:in `handle_response'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/requester.rb:18:in `run'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/scrapers/url_scraper.rb:38:in `request_all'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/scraper.rb:76:in `build_pages'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/doc.rb:115:in `block in store_pages'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:87:in `block (2 levels) in replace'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:182:in `track_touched'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:87:in `block in replace'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:170:in `lock'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:87:in `replace'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:85:in `block in replace'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:144:in `open_yield_close'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:30:in `open'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/storage/abstract_store.rb:85:in `replace'
  /Users/traviscarden/Projects/other/devdocs/lib/docs/core/doc.rb:114:in `store_pages'
  /Users/traviscarden/Projects/other/devdocs/lib/docs.rb:100:in `generate'
  /Users/traviscarden/Projects/other/devdocs/lib/tasks/docs.thor:303:in `generate_doc'
  /Users/traviscarden/Projects/other/devdocs/lib/tasks/docs.thor:105:in `generate'

Failed!
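One quick way to narrow this down is to confirm that the queued URLs are well-formed on their own. A status code of 0 generally means no HTTP response was received at all: the Typhoeus README notes that a response code of 0 indicates the request failed before an HTTP response arrived, and "URL using bad/illegal format or missing URL" is libcurl's message for CURLE_URL_MALFORMAT. The sketch below uses only Ruby's standard library; `check_urls` and `QUEUED` are hypothetical names, not part of DevDocs. If all three URLs parse cleanly, the fault more likely lies in the local HTTP stack (Typhoeus/Ethon/libcurl) than in the scraper's URLs, though that is an inference rather than a confirmed diagnosis.

```ruby
require 'uri'

# The three URLs from the log above (hypothetical constant name).
QUEUED = %w[
  https://api.drupal.org/api/drupal/7.x
  https://api.drupal.org/api/drupal/groups/7.x
  https://api.drupal.org/api/drupal/groups/7.x?page=1
].freeze

# Hypothetical helper: returns [url, problem] pairs, where problem is nil
# when Ruby's URI parser accepts the URL as an absolute https URL with a host.
def check_urls(urls)
  urls.map do |url|
    uri = URI.parse(url)
    problem =
      if !uri.is_a?(URI::HTTPS) then 'not an https URL'
      elsif uri.host.to_s.empty? then 'missing host'
      end
    [url, problem]
  rescue URI::InvalidURIError => e
    # Raised for strings that are not valid URIs at all (e.g. containing spaces).
    [url, "unparseable: #{e.message}"]
  end
end

check_urls(QUEUED).each do |url, problem|
  status = problem ? "BAD (#{problem})" : 'ok'
  puts "#{status}: #{url}"
end
```

Each URL gets one line of output; `ok` for all three would point away from the URL strings and toward the local libcurl setup.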

TravisCarden • Jul 29 '22 21:07

It seems the structure of the Drupal docs has changed. Please review item 4 of https://github.com/freeCodeCamp/devdocs/blob/main/docs/adding-docs.md

The following patch allowed me to scrape v7 docs:

diff --git a/lib/docs/filters/drupal/entries.rb b/lib/docs/filters/drupal/entries.rb
index 9da70441..b0c99d91 100644
--- a/lib/docs/filters/drupal/entries.rb
+++ b/lib/docs/filters/drupal/entries.rb
@@ -20,7 +20,7 @@ module Docs
         elsif subpath =~ /core!themes/
           'themes'
         else
-          css('.breadcrumb > a')[1].content
+          css('.breadcrumb a')[1].content
         end
       end
 
diff --git a/lib/docs/scrapers/drupal.rb b/lib/docs/scrapers/drupal.rb
index 3798caec..96cca5e9 100644
--- a/lib/docs/scrapers/drupal.rb
+++ b/lib/docs/scrapers/drupal.rb
@@ -10,7 +10,7 @@ module Docs
     html_filters.push 'drupal/entries', 'drupal/clean_html', 'title'
 
     options[:decode_and_clean_paths] = true
-    options[:container] = '#page-inner'
+    options[:container] = '#page'
     options[:title] = false
     options[:root_title] = 'Drupal'

simon04 • Jul 31 '22 18:07

Thank you, @simon04, I've applied your patch. I still get the same runtime error when running thor docs:generate "Drupal@7", though. Since it works for you, I assume the problem is in my local setup. Should I (or we) try to debug it? Or, if it works for you, is that good enough to move forward?

TravisCarden • Aug 01 '22 15:08

I'm going to go out on a limb and mark this ready for review, unanswered question notwithstanding. 🙂

TravisCarden • Aug 30 '22 22:08