Replace warcio

Open PromyLOPh opened this issue 7 years ago • 2 comments

warcio’s API is not exactly pretty, and it’s easy to mess things up. It has no plausibility checks and no validation. We want:

  • A nice/clean API that separates WARC and its payloads. warcio mixes WARC and HTTP handling
  • Relaxed parsing (read broken files)
  • Strict validation based on specs before writing (writing records violating the specs should not be possible)
  • ~~read(write()) should be identity function (easier testing)~~ (see https://github.com/webrecorder/warcio/issues/57)
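To make the wishlist more concrete, here is a rough sketch of what such an API might look like. Everything in it (`WarcRecord`, `problems`, `serialize`, `parse_relaxed`, `ValidationError`) is a hypothetical name made up for illustration, not part of warcio. It assumes a single record per byte string, treats the payload as opaque bytes (HTTP parsing would live in a separate layer), and checks only the mandatory WARC fields plus `Content-Length`, which is far from full spec validation.

```python
from dataclasses import dataclass
from typing import Dict, List

# Fields every WARC record must carry according to the WARC specification.
MANDATORY = ("WARC-Record-ID", "WARC-Date", "WARC-Type", "Content-Length")


class ValidationError(ValueError):
    """Hypothetical error raised when a record must not be written."""


@dataclass
class WarcRecord:
    """Hypothetical container: WARC headers plus an opaque payload.

    The payload is plain bytes; interpreting it (e.g. as HTTP) is
    deliberately left to a separate layer.
    """
    headers: Dict[str, str]
    payload: bytes = b""

    def problems(self) -> List[str]:
        """Return a list of spec violations (empty if none were found)."""
        found = [f"missing mandatory field {name}"
                 for name in MANDATORY if name not in self.headers]
        declared = self.headers.get("Content-Length")
        if declared is not None and declared != str(len(self.payload)):
            found.append(f"Content-Length is {declared}, "
                         f"payload has {len(self.payload)} bytes")
        return found


def serialize(record: WarcRecord, version: str = "WARC/1.1") -> bytes:
    """Strict writer: refuses to emit a record that fails validation."""
    issues = record.problems()
    if issues:
        raise ValidationError("; ".join(issues))
    lines = [version] + [f"{k}: {v}" for k, v in record.headers.items()]
    head = "\r\n".join(lines).encode("utf-8") + b"\r\n\r\n"
    return head + record.payload + b"\r\n\r\n"


def parse_relaxed(data: bytes) -> WarcRecord:
    """Relaxed reader: never validates, keeps whatever it can recover."""
    head, _, rest = data.partition(b"\r\n\r\n")
    headers = {}
    # Skip the version line, then split each field at the first colon.
    for line in head.decode("utf-8", errors="replace").split("\r\n")[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip()] = value.strip()
    # Use the bytes actually present instead of trusting Content-Length.
    payload = rest[:-4] if rest.endswith(b"\r\n\r\n") else rest
    return WarcRecord(headers, payload)
```

With this split, a broken file can still be read via `parse_relaxed`, inspected with `problems()`, and repaired, while `serialize` refuses to write it back until the violations are gone.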

PromyLOPh, Dec 02 '18 16:12

Hi, just wanted to say, as the creator of warcio, that we'd definitely welcome contributions and improvements to the library. warcio has evolved and was refactored from a larger library (pywb), and some components were added later. It was originally optimized for stream-based reading (and later writing). It is certainly not perfect, and we would welcome suggestions for improvements. I would say we share many of the goals that you've mentioned. Reading and fixing partially broken files is also a key goal, and there is currently a PR to add more (optional) validation. The library has evolved over the years to meet specific needs, but of course our resources are also limited. If you have suggestions for improvements or would like to submit PRs, we would be happy to listen.

ikreymer, Dec 03 '18 23:12

You’re right, we share a common goal and I should be reporting stuff like the last thing on my list (which I just did). I’m very glad that I can just use warcio right now and I have (almost) no issues regarding its functionality or reliability. I’m not sure how to approach “fixing” my main issue, the API, though. It seems like it would require major refactoring. And I don’t have the resources to do that or (as the title indicates) even to replace warcio. So, consider the list above a personal wishlist of mine.

PromyLOPh, Dec 04 '18 16:12