Scraping the UK Web Archive with Boilerpipe

Open lizfischer opened this issue 1 year ago • 4 comments

The Programming Historian has received the following tutorial on 'Scraping the UK Web Archive with Boilerpipe' by Caio Mello @caiocmello and Martin Steer @martysteer. This lesson is now under review and can be read at:

http://programminghistorian.github.io/ph-submissions/en/drafts/originals/scraping-the-uk-web-archive-with-boilerpipe

Please feel free to use the line numbers provided on the preview if that helps with anchoring your comments, although you can structure your review as you see fit.

I will act as editor for the review process. My role is to solicit two reviews from the community and to manage the discussions, which should be held here on this forum. I have already read through the lesson and provided feedback, to which the author has responded.

Members of the wider community are also invited to offer constructive feedback, which they should post to this message thread, but they are asked to first read our Reviewer Guidelines (http://programminghistorian.org/reviewer-guidelines) and to adhere to our anti-harassment policy (below). We ask that all reviews stop after the second formal review has been submitted so that the author can focus on any revisions. I will make an announcement on this thread when that has occurred.

I will endeavor to keep the conversation open here on Github. If anyone feels the need to discuss anything privately, you are welcome to email me.

Our dedicated Ombudsperson is Ian Milligan (http://programminghistorian.org/en/project-team). Please feel free to contact him at any time if you have concerns that you would like addressed by an impartial observer. Contacting the ombudsperson will have no impact on the outcome of any peer review.

Anti-Harassment Policy

This is a statement of the Programming Historian's principles and sets expectations for the tone and style of all correspondence between reviewers, authors, editors, and contributors to our public forums.

The Programming Historian is dedicated to providing an open scholarly environment that offers community participants the freedom to thoroughly scrutinize ideas, to ask questions, make suggestions, or request clarification, but also provides a harassment-free space for all contributors to the project, regardless of gender, gender identity and expression, sexual orientation, disability, physical appearance, body size, race, age or religion, or technical experience. We do not tolerate harassment or ad hominem attacks of community participants in any form. Participants violating these rules may be expelled from the community at the discretion of the editorial board. Thank you for helping us to create a safe space.


lizfischer avatar Oct 07 '22 17:10 lizfischer

Hello @lizfischer. Just to let you know that I've sent @martysteer an invitation to join ph-submissions as an outside collaborator for the duration of this lesson's review. (@caiocmello already has write access because they're working on other lessons at the moment).

This means that both authors will be able to make direct edits to this lesson within our work-in-progress repo, without using the PR system 🙂

anisa-hawes avatar Oct 07 '22 18:10 anisa-hawes

Hello again @lizfischer,

I've made a couple of adjustments to the YAML header, and added in the liquid syntax we require to display images on our site (example): {% include figure.html filename="file-name.png" alt="Visual description of figure image" caption="Caption text to display" %}. I've plotted in the minimum, and we can return to add descriptive alt-text during the review process.

I've also made a couple of small typesetting tweaks, and removed the author bios + suggested citation (these are generated automatically).

I'll paste the author bios below (and Alex will slot them into ph_authors.yml when we reach publication):

- name: Caio Mello
  team: false
  orcid: 0000-0000-1111-1111
  bio:
      en: |
          Caio Mello is a PhD student in Digital Humanities at the School of Advanced Study, University of London. His main research interests lie in the field of digital methods, Natural Language Processing techniques, data visualisation, media studies, urban studies and digital activism.
- name: Martin Steer
  team: false
  orcid: 0000-0000-1111-1111
  bio:
      en: |
          Martin Steer is Technical Lead in the Digital Humanities Research Hub at the School of Advanced Study, University of London. He works with humanities and social media data, data infrastructure, web archives and fiction.

A live preview of the lesson is now available: http://programminghistorian.github.io/ph-submissions/en/drafts/originals/scraping-the-uk-web-archive-with-boilerpipe

anisa-hawes avatar Oct 07 '22 18:10 anisa-hawes

Thank you @caiocmello. I actually made a small change to Liz's initial comment, as we've recently replaced the in-thread Permission to Publish statement with an authorial copyright declaration form. So that step of our workflow has changed.

At the end of the review, when I've copyedited the lesson and made any final typesetting adjustments, I'll reach out and ask you to fill in the form which clarifies that you retain unrestricted copyright of your work, while granting us first permission to publish it under a CC-BY 4.0 License.

anisa-hawes avatar Oct 08 '22 06:10 anisa-hawes

Thank you, @caiocmello and @martysteer. Our workflow has recently changed, to replace these Permission to Publish statements with a more formalised authorial copyright declaration form. At the very end of the workflow, when I've copyedited the lesson and made any final typesetting adjustments, I'll reach out and ask you to fill in the form.

anisa-hawes avatar Oct 12 '22 13:10 anisa-hawes

Hi @anisa-hawes. Thanks very much. Is there any update on this lesson? Is there anything we (authors) have to do at this stage? Best wishes, Caio

caiocmello avatar Dec 13 '22 14:12 caiocmello

Thank you for getting in touch, @caiocmello. There's nothing further we need from you until the reviews are complete.

Hello @lizfischer and @hawc2, Are you able to post an update to this Issue? It would be great to hear if the reviews are underway.

anisa-hawes avatar Dec 14 '22 14:12 anisa-hawes

Hi @caiocmello, so sorry for the delay! I am working on getting reviewers currently-- it may take a little while for me to hear back from folks given the time of year. I will update here as soon as those are confirmed.

lizfischer avatar Dec 14 '22 17:12 lizfischer

Hi folks, just a quick update-- I'm still on the hunt for reviewers! Hopefully we will have more luck in the post-holiday season :)

lizfischer avatar Jan 13 '23 16:01 lizfischer

Hello @lizfischer, are there any updates on the reviewing process?

caiocmello avatar Apr 03 '23 14:04 caiocmello

Hi @caiocmello, unfortunately Liz has had to step away from editing this lesson, so I'm currently in the process of finding you a new editor. I believe the first set of reviewers Liz reached out to did not respond, so we will have to try a new set. If you have any recommendations, let me know, and please feel free to email me at [email protected] if you have additional questions while we identify a new editor for you from the English team.

hawc2 avatar Apr 04 '23 18:04 hawc2

Hi @hawc2, thanks very much for your answer. I am going to email you with two recommendations of editors soon.

caiocmello avatar Apr 11 '23 08:04 caiocmello

Hi @caiocmello, I haven't heard from you since an email I sent a few months ago noting some concerns about the sustainability of this lesson with such an outdated library. Programming Historian in English is preparing to reorganize how we accept submissions, and I would like to invite you to resubmit a revised version of this proposal when we do a call for papers in September. The lesson will need to change from the version here to address some of those concerns, and we can aim to more adequately review this proposal and edit it in a timely fashion next year. Please feel free to email me with any remaining questions about this proposal, but for now I am going to close this issue ticket. Thanks for your consideration, and apologies for the winding road this particular proposal has taken. I hope we can still find a way to publish a version of this more attuned to our current and future needs as a journal.

hawc2 avatar Jul 24 '23 20:07 hawc2