Daniel
I like the idea of having it pulled directly from GitHub. Using GitHub Pages, it could even access the last modified time and only update when changes had been made....
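For what it's worth, here is a rough Python sketch of the "only update when something changed" idea. It assumes the host honours the standard `If-Modified-Since` header; the URL and the `build_conditional_request` name are placeholders of mine, not anything from the project.

```python
from email.utils import formatdate
from urllib.request import Request


def build_conditional_request(url, last_fetch_epoch):
    """Build a GET request asking the server to skip the body if unchanged.

    last_fetch_epoch is the Unix timestamp of the previous successful fetch.
    """
    # HTTP dates must be in GMT, e.g. "Thu, 01 Jan 1970 00:00:00 GMT".
    stamp = formatdate(last_fetch_epoch, usegmt=True)
    return Request(url, headers={"If-Modified-Since": stamp})


# Hypothetical URL, for illustration only.
req = build_conditional_request("https://example.github.io/list.txt", 0)
```

If the server answers `304 Not Modified`, `urllib.request.urlopen(req)` raises an `HTTPError` with `code == 304`, which the caller can treat as "nothing to do".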
In my case, a custom user agent would have prevented the bot from being blacklisted. The default `Go-http-client/1.1` and `Go-http-client/2.0` user agents were flagged as someone scraping the site. Blocking...
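To illustrate what I mean by a custom user agent, here is a minimal sketch in Python rather than Go. The user agent string, contact URL, and `build_request` helper are made up for the example; the point is only that the request identifies the bot and gives site owners a way to reach its operator.

```python
from urllib.request import Request

# Hypothetical identifier: a bot name, a version, and a contact URL.
USER_AGENT = "example-list-updater/0.1 (+https://example.com/contact)"


def build_request(url):
    """Build a request that identifies the bot instead of using the library default."""
    return Request(url, headers={"User-Agent": USER_AGENT})


req = build_request("https://example.com/list.txt")
```

In Go the equivalent would be setting the header on the request with `req.Header.Set("User-Agent", ...)` before sending it.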
This could not have been done without @TokugawaHeavyIndustries telling me about the archive.org binary
Thanks for the info @petergeneric - I wasn't aware of the issues with archive.org
Thanks for the info @petergeneric - I was trying to run it in Docker too - so that added a couple of frustrations on its own.
I just ran into this issue with whitespace. It didn't cause a security issue, but rather a bunch of unexpected errors: ``` >>> urlparse(' https://example.com ') ParseResult(scheme='', netloc='', path=' https://example.com...
Hmmm... interesting, what about the case without a scheme with port?

```
>>> urlparse('www.example.com:80')
ParseResult(scheme='www.example.com', netloc='', path='80', params='', query='', fragment='')
```
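One way callers can work around both cases today is sketched below, under the assumption that the input is meant to be a web address. `parse_lenient` is a hypothetical helper of mine, not part of `urllib`. It strips surrounding whitespace and, when no `://` is present, prefixes `//` so the host and port land in `netloc` instead of being read as a scheme and path.

```python
from urllib.parse import urlparse


def parse_lenient(raw, default_scheme="https"):
    """Parse user-supplied input that may have stray whitespace or no scheme."""
    cleaned = raw.strip()
    # Without "//", urlparse has no way to tell that the text is a host.
    if "://" not in cleaned and not cleaned.startswith("//"):
        cleaned = "//" + cleaned
    return urlparse(cleaned, scheme=default_scheme)


padded = parse_lenient(" https://example.com ")
bare = parse_lenient("www.example.com:80")
```

This deliberately mishandles non-hierarchical URLs such as `mailto:` addresses, so it only suits inputs that are known to be hostnames or web URLs.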
If you ever have issues with one of my lists, please report it directly to me so that we can have a discussion about why I blocked it and if...
`@minColumns` and `@maxColumns` from #17 would be useful additions to this enhancement
The first sentence describing CSV Schema on https://digital-preservation.github.io/csv-schema > A text based schema language (CSV Schema) for describing data in CSV files for the purposes of validation. I would argue...