Jake L
We think it would be useful to have the following stats:
- [ ] Total requests recorded
- [ ] Number of non-matching successful Doppelganger requests
- [ ] Add...
https://github.com/internetarchive/Zeno/actions/runs/17658909934/job/50188173120
```
==================
WARNING: DATA RACE
Write at 0x00c00004ccd0 by goroutine 108:
  github.com/internetarchive/Zeno/internal/pkg/archiver/headless.archivePage()
      /home/runner/work/Zeno/Zeno/internal/pkg/archiver/headless/archiver.go:339 +0x18a6
  github.com/internetarchive/Zeno/internal/pkg/archiver/headless.ArchiveItem()
      /home/runner/work/Zeno/Zeno/internal/pkg/archiver/headless/archiver.go:100 +0x674
  github.com/internetarchive/Zeno/internal/pkg/archiver.archive.gowrap1()
      /home/runner/work/Zeno/Zeno/internal/pkg/archiver/worker.go:230 +0x6b

Previous write at 0x00c00004ccd0 by goroutine 138:
  github.com/internetarchive/Zeno/internal/pkg/archiver/headless.archivePage.func2()
      /home/runner/work/Zeno/Zeno/internal/pkg/archiver/headless/archiver.go:235...
```
As discussed in #346, outlinks projects are a way to preserve outlinks without immediately crawling them (in the same context, anyway).
With #420, we are no longer adding every outlink we find. In addition to this feature, we want to add built-in prioritization so that more important URLs are handled first....
`URL.GetMIMEType()` in `IsXML` appears to return `text/xml; charset=utf-8` for `http://laborculture.org`. This is incorrect based on the headers and content.
```
time=2025-05-21T18:39:48.614-04:00 level=INFO msg="url archived" worker_id=0 component=archiver.archive url=http://laborculture.org/ seed_id=03b36...
```
Allow Zeno to switch to a secondary HQ project when the main project is empty.
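The fallback behaviour could look roughly like the following. This is a sketch under assumptions, not Zeno's or HQ's real client API: `projectSource`, `next`, and `nextWithFallback` are hypothetical names, and an in-memory slice stands in for a remote project feed.

```go
package main

import "fmt"

// projectSource is a hypothetical stand-in for an HQ project feed.
type projectSource struct {
	Name string
	URLs []string
}

// next removes and returns the first queued URL; ok is false when the
// project has nothing left.
func (p *projectSource) next() (string, bool) {
	if len(p.URLs) == 0 {
		return "", false
	}
	u := p.URLs[0]
	p.URLs = p.URLs[1:]
	return u, true
}

// nextWithFallback asks the primary project first and consults the
// secondary project only when the primary is empty. It returns the URL,
// the name of the project it came from, and whether anything was found.
func nextWithFallback(primary, secondary *projectSource) (string, string, bool) {
	if u, ok := primary.next(); ok {
		return u, primary.Name, true
	}
	if u, ok := secondary.next(); ok {
		return u, secondary.Name, true
	}
	return "", "", false
}

func main() {
	primary := &projectSource{Name: "main", URLs: []string{"https://example.com/a"}}
	secondary := &projectSource{Name: "secondary", URLs: []string{"https://example.org/b"}}
	for {
		u, from, ok := nextWithFallback(primary, secondary)
		if !ok {
			break
		}
		fmt.Println(from, u)
	}
}
```

Because the primary is checked on every call, work added to the main project later takes precedence again without any explicit switch back.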
We currently believe that a number of URLs that are completed in HQ are not marked "finished" in HQ. This is likely due to a bug in the finisher code in...