city-scrapers
How to handle events that require registration?
@mwgalloway raised this issue. @diaholliday might have some thoughts about how to structure this, since the Open Civic Data specification doesn't account for it.
Have we come across an example that can be posted here?
@diaholliday The CPS Board of Education page, which I'm building a scraper for, is what brought up the question. There are registration instructions below the meeting schedule.
I think that particular registration may be for speakers (not general attendance), but I'd like to log it in either case. Can you mark it down in the notes section of the Excel sheet? If we see it pop up in more events, I may want to create a new column on the Excel sheet to keep track of them (and eventually include a button/box on the calendar).
Is it possible to use the HttpAuthMiddleware class in Scrapy? It lets a spider authenticate with an HTTP username and password.
https://doc.scrapy.org/en/latest/topics/downloader-middleware.html#scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware
Alternatively, BeautifulSoup can be used alongside Scrapy, although it is only an HTML parser, so HTTP authentication would still have to be handled when the page is fetched.
https://doc.scrapy.org/en/latest/faq.html?highlight=beautifulsoup#faq-scrapy-bs-cmp
(Cyrus Sethna, replying by email to @diaholliday's comment of Tue, Aug 8, 2017, 7:51 PM on City-Bureau/documenters-aggregator issue #38.)
@csethna, in this case the registration isn't for scraping the site; it's for events that require some form of registration to attend. That seems relevant to Documenters and reporters, since the Board of Ed site says "Advance registration is available for speakers and observers."
My instinct is to extend the Open Civic Data format with a boolean "registration required" field (and suggest that they add it to the spec as an optional field). But registration almost always involves some strange and curious process, so we also need a place for more detailed, human-readable registration instructions. That could be part of the description field or a separate field.
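As a rough illustration of keeping both a boolean flag and the human-readable instructions, here is a minimal sketch a scraper might use on free text from a meeting page. The function name `parse_registration` and both keyword patterns are hypothetical assumptions, not part of Open Civic Data. Note that the Board of Ed wording says registration is "available", not "required", so the sketch keeps that sentence as info without setting the flag. A keyword heuristic like this is crude (e.g. "No registration required" would be misread), so real pages would likely need site-specific parsing.

```python
import re

# Hypothetical heuristics, not part of any spec: sentences that mention
# registration are kept verbatim as human-readable info, while the boolean
# flag is set only when a sentence says registration is required.
MENTIONS = re.compile(r"\b(register|registration|rsvp)\b", re.IGNORECASE)
REQUIRED = re.compile(r"\b(required|must register|must rsvp)\b", re.IGNORECASE)


def parse_registration(text):
    """Build {'required': bool} plus 'info' when the text mentions registration."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    mentions = [s for s in sentences if MENTIONS.search(s)]
    registration = {"required": any(REQUIRED.search(s) for s in mentions)}
    if mentions:
        # Registration processes vary too much to parse, so keep the original wording.
        registration["info"] = " ".join(mentions)
    return registration


notice = "Advance registration is available for speakers and observers."
print(parse_registration(notice))
# {'required': False, 'info': 'Advance registration is available for speakers and observers.'}
```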
How about we extend the OCD model with a new object, `registration`:
```python
{
    '_type': 'event',
    # other fields
    'registration': {
        'required': True,
        'registration_url': 'https://domain.test/events/register',
        'info': 'Info about registration requirements',
        'info_url': 'https://domain.test/how_to_register_for_events',
    },
}
```
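If the proposed object were adopted, scrapers could check their output against it before publishing. The following is a minimal sketch of such a check. `validate_registration` and `ALLOWED_KEYS` are hypothetical names, and the rules (only `required` is mandatory and must be a boolean, the URLs must be http(s), `info` must be a string) are assumptions drawn from the example object, not from the Open Civic Data specification.

```python
from urllib.parse import urlparse

# Keys in the proposed 'registration' extension; only 'required' is mandatory.
ALLOWED_KEYS = {"required", "registration_url", "info", "info_url"}


def validate_registration(event):
    """Return a list of problems with event['registration']; empty means it looks valid."""
    registration = event.get("registration")
    if registration is None:
        return []  # the extension is optional, so a missing object is fine
    if not isinstance(registration, dict):
        return ["registration must be a dict"]
    problems = []
    unknown = set(registration) - ALLOWED_KEYS
    if unknown:
        problems.append("unknown keys: " + ", ".join(sorted(unknown)))
    if not isinstance(registration.get("required"), bool):
        problems.append("required must be True or False")
    for key in ("registration_url", "info_url"):
        url = registration.get(key)
        if url is not None and (
            not isinstance(url, str) or urlparse(url).scheme not in ("http", "https")
        ):
            problems.append(key + " must be an http(s) URL")
    info = registration.get("info")
    if info is not None and not isinstance(info, str):
        problems.append("info must be a string")
    return problems


event = {
    "_type": "event",
    "registration": {
        "required": True,
        "registration_url": "https://domain.test/events/register",
        "info": "Info about registration requirements",
        "info_url": "https://domain.test/how_to_register_for_events",
    },
}
print(validate_registration(event))  # []
```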
For events that don't need registration, we could leave out most of the new fields:
```python
{
    '_type': 'event',
    # other fields
    'registration': {
        'required': False,
    },
}
```
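One way a scraper could produce both shapes from a single code path is a small builder that always sets `required` and drops optional fields that have no value. This is a sketch under that assumption; `build_registration` is a hypothetical helper, not an existing City Bureau or Open Civic Data function. With no registration details, it yields `{'required': False}`, the same shape as the second example.

```python
def build_registration(required, registration_url=None, info=None, info_url=None):
    """Build the proposed 'registration' object, omitting optional fields that are None."""
    fields = {
        "required": bool(required),
        "registration_url": registration_url,
        "info": info,
        "info_url": info_url,
    }
    # 'required' is always kept; optional fields only appear when they have a value.
    return {key: value for key, value in fields.items() if key == "required" or value is not None}


open_meeting = {"_type": "event", "registration": build_registration(False)}
print(open_meeting["registration"])  # {'required': False}
```

Keeping `required` unconditionally means consumers can always read the flag without checking whether the key exists.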
@eads, this has come up with the ward night scraper as well. What do you think of adding the fields described above?