Anders Klindt Myrvoll
> This doesn't appear to be a bug. The page loads 5 more articles from the server when the user clicks the `SE FLERE` button. As we are looking at...
Great. I tried to use extra archiveweb.page crawls to fix crawls on our local installations, and it didn't repair them the same way as on Cloud. But it might also...
Yes. But there could be some differences. Basically I just crawled from the front page and did a lot of "SE FLERE" clicks as add-ons in archiveweb.page. Seems it works...
The original idea was to just exclude the single URL that you clicked, so it was very intuitive and quick. It could also be "Edit" > select item(s)/seeds > delete. A bit like choosing multiple...
https://watch.screencastify.com/v/kvy9xR7AUrePlRfxVyJf Maybe a partial roll-out?
For our normal use case, ingesting files into the big web archive, the UKWA WARC-indexer will take care of it. For manually indexing WARC files we'll just rename the files as...
The same issue still applies. I tried to complement Browsertrix Cloud collections with archiveweb.page crawls that were downloaded and then uploaded, and I get files named data.warc instead of the original WARC names, or files containing parts...
I have probably done both. The last uploaded file was a lowly single WACZ file, ~50 MB.
Right now it seems like v1.14.2-cb52da6 fixed this bug....
But not anymore, on version v1.14.3-9466e83. Trying to get logged-in content and double-checking that it works by viewing "Configure Crawling Profile", I get this during crawl and playback:...