Andy Jackson
The Parquet format does not directly support appending row groups, but `fastparquet` seems to manage it by patching/editing the end of the file before appending another row group. See https://fastparquet.readthedocs.io/en/latest/api.html#fastparquet.write...
I might be misunderstanding how DROID works, but I think scanning the whole thing is the expected behaviour for ZIP files, and has been for some time. I wrote up...
Thanks. The first one looks like it should now be https://mementoweb.github.io/SiteStory/ - not checked the others yet. See also #98 which proposed archiving the Tempas links.
See also https://app.browsertrix.com/orgs/wac2024-workshop/items/crawl/manual-20240425222929-55b57b82-a92/review/screenshots?qaRunId=qa-20240425230548-55b57b82-a92&itemPageId=0a54db6a-0ff6-45a3-a1cd-3f583a4e5ae7 and note that the 'dual slider view' doesn't cope correctly in this situation.
Ah, interesting, thank you. When I switch off the sitemap option, the homepage crawls and renders. Pretty good too - only missing the embedded video. EDIT: Would videos embedded like...
We also have redirects that point back to the web archive, that PyWB is unable to deal with (webrecorder/pywb#591) - it would be great to be able to filter our...
Ah, oops. That's just us hardcoding our primary use case. Perfectly fine to make this gettable/settable, but it might be a little while before I get chance to change it...
Hm, looking back over this, and it's a bit more complicated than that. I've set up Droid as an Apache Tika 'detector' module, and the API is built with the...
Hi @tokee note that DROID/PRONOM rely on `HasPriorityOverFileFormatID` relationships between signatures to ensure correct matches (rather than using a global priority value as Tika does). To ensure correct results, the...
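To illustrate the difference from a single global priority value, here is a minimal sketch of pairwise priority resolution. It is not DROID's actual code: the function name `apply_priorities`, the dictionary layout, and the format IDs are all hypothetical, and it assumes that a matched format suppresses any other matched format it has priority over.

```python
def apply_priorities(matches, has_priority_over):
    """Drop any matched ID that another matched ID has priority over.

    matches: list of format IDs that matched the file (hypothetical IDs).
    has_priority_over: dict mapping a format ID to the IDs it overrides,
        mirroring the idea of HasPriorityOverFileFormatID relationships.
    """
    matched = set(matches)
    suppressed = set()
    for fmt_id in matched:
        for lower in has_priority_over.get(fmt_id, ()):
            # Only suppress formats that actually matched this file.
            if lower in matched:
                suppressed.add(lower)
    # Preserve the original order of the surviving matches.
    return [m for m in matches if m not in suppressed]


# Hypothetical relationship: a ZIP-based document format overrides plain ZIP.
has_priority_over = {"docx-like": ["zip-like"]}

result = apply_priorities(["zip-like", "docx-like", "text-like"], has_priority_over)
```

Because the relationships are pairwise, an unrelated match (`text-like` here) is left alone, which a single global ranking would not express.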
This has been partially implemented, as PRONOM-related results have been added as Metadata, but it still returns the combined MIME type (as that's how I'm using it in webarchive-discovery for...