rgaudin comments

Results: 844 comments of rgaudin

Make CI work even for external contributors

Of course it is. That's why every run that involves secrets has to be manually authorized by someone (a lieutenant in apple's case) because it's strictly not possible to provide...

Make CI work even for external contributors

It was only implemented last year. Apple differs from scrapers in that its secrets are very sensitive (compared to, say, CODECOV_TOKEN) and we had frequent external contributors. I don't know...

Mandatory metadata are not all set

Yes, that's a possibility; I was only suggesting that when we find out a title is missing, for instance, we fail with a clear message.
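To illustrate the "fail with a clear message" idea, here is a minimal sketch of an early check. Everything in it is an assumption made for illustration: the `MANDATORY_KEYS` list, the `MissingMetadataError` class and the `check_mandatory_metadata` helper are hypothetical names, not the scraper's or scraperlib's actual API.

```python
# Hypothetical list of mandatory metadata keys (assumption, not the real spec).
MANDATORY_KEYS = ("Title", "Description", "Language")


class MissingMetadataError(ValueError):
    """Raised when one or more mandatory metadata values are absent or blank."""


def check_mandatory_metadata(metadata: dict) -> None:
    """Raise a single error naming every missing key; return None otherwise."""
    missing = [
        key
        for key in MANDATORY_KEYS
        if not str(metadata.get(key) or "").strip()
    ]
    if missing:
        raise MissingMetadataError(
            "Missing mandatory metadata: " + ", ".join(missing)
        )
```

Calling such a check before the crawl starts, when the values are already known, would report the problem up front; where a value is only discovered during the crawl, the same check can only run at the end.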

Mandatory metadata are not all set

> I still consider failing a scrape which might have taken hours or even days of crawling just because there is a missing title or description is quite disappointing for...

networkidle is no longer a valid waitUntil

Thank you @brandonocasey; @benoit74, our [default option in zimit](https://github.com/openzim/zimit/blob/main/zimit.py#L119) still references `networkidle`. With the latest browsertrix-crawler updates, there must have been a Puppeteer upgrade. We need to find out...

networkidle is no longer a valid waitUntil

Hum, I don't know. It's not just a check but it's self-documentation as well. We can revisit where we set our cursor of _required-browsertrix-knowledge_ for Zimit users. At first, I...

networkidle is no longer a valid waitUntil

If you don't know what values to input for the field, it's not very useful. I really think including the crawler's help in ours would help **a lot**. We can then...
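As a rough sketch of the check-plus-self-documentation idea discussed in this thread: Puppeteer's documented `waitUntil` values are `load`, `domcontentloaded`, `networkidle0` and `networkidle2`, so a plain `networkidle` is rejected. The `parse_wait_until` helper and the comma-separated input format are assumptions for illustration, not zimit's actual code.

```python
# Values documented by Puppeteer for the waitUntil navigation option.
VALID_WAIT_UNTIL = ("load", "domcontentloaded", "networkidle0", "networkidle2")


def parse_wait_until(raw: str) -> list:
    """Split a comma-separated waitUntil string and reject unknown values.

    Hypothetical helper: the comma-separated format is an assumption.
    """
    values = [part.strip() for part in raw.split(",") if part.strip()]
    invalid = [value for value in values if value not in VALID_WAIT_UNTIL]
    if not values or invalid:
        # The message lists the accepted values, so it doubles as documentation.
        raise ValueError(
            "Invalid waitUntil value(s): "
            + (", ".join(invalid) or "<empty>")
            + ". Accepted values: "
            + ", ".join(VALID_WAIT_UNTIL)
        )
    return values
```

The error message names the accepted values, so a user who doesn't know the field still learns what to input.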

Consider officially supporting the Dillo browser

@kelson42 the thing is that the `/nojs` endpoint links to the `/content` endpoint which serves ZIM articles _raw_. Maybe we should add a new nojs-friendly wrapper (still using an iframe)...

Impossible to ZIM files from upload.wikimedia.org

Indeed; shouldn't scraperlib do this by default for stream_file?

Keeping track of dated events

Thank you @benoit74 for highlighting the exact issue. > [@rgaudin](https://github.com/rgaudin) I would like to keep any IT operation free of any Google Workspace dependency. AFAP I dont see how I fit...