Feature Request: add unique cookie prefix or allow setting
What type of suggestion are you making?
Modification of existing behavior
What is the problem that your feature request solves?
I am running this alongside a few dozen other containers on a small home server with podman under one IP (using a different port number for each service). It works pretty well aside from one issue: session cookie names often conflict, causing frequent logouts.
ArchiveBox uses the generic cookie names "sessionid" and "csrftoken".
What is your proposed solution?
Would it be possible to prepend a unique prefix to the session and CSRF cookie names (e.g. 'archivebox_sessionid'), or to let us set a prefix using an environment variable?
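A minimal sketch of what this could look like in a Django settings module. `SESSION_COOKIE_NAME` and `CSRF_COOKIE_NAME` are standard Django settings (defaulting to "sessionid" and "csrftoken"); the `COOKIE_PREFIX` environment variable and the `prefixed_cookie_names` helper are hypothetical names used only for illustration, not existing ArchiveBox options.

```python
import os


def prefixed_cookie_names(prefix: str) -> dict:
    """Return Django cookie-name settings with an optional prefix applied."""
    return {
        "SESSION_COOKIE_NAME": f"{prefix}sessionid",
        "CSRF_COOKIE_NAME": f"{prefix}csrftoken",
    }


# COOKIE_PREFIX is a hypothetical variable name (an assumption, not a real
# ArchiveBox config option). An empty prefix keeps Django's default names.
_names = prefixed_cookie_names(os.environ.get("COOKIE_PREFIX", ""))
SESSION_COOKIE_NAME = _names["SESSION_COOKIE_NAME"]
CSRF_COOKIE_NAME = _names["CSRF_COOKIE_NAME"]
```

With `COOKIE_PREFIX=archivebox_` set on the container, the cookies would be named "archivebox_sessionid" and "archivebox_csrftoken". Any front-end code that reads the CSRF cookie by name would need to use the new name as well.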
What hacks or alternative solutions have you tried to solve the problem?
I searched the documentation for a way to set a prefix for the session cookies and could not find anything, nor could I find any suggestions pertaining to this issue.
Share the entire output of the archivebox version command for the current version you are using.
0.7.3
ArchiveBox v0.7.3 COMMIT_HASH=069aabc BUILD_TIME=2024-12-15 09:54:03 1734256443
IN_DOCKER=True IN_QEMU=False ARCH=x86_64 OS=Linux PLATFORM=Linux-6.11.0-18-generic-x86_64-with-glibc2.36 PYTHON=Cpython
FS_ATOMIC=True FS_REMOTE=False FS_USER=0:0 FS_PERMS=644
DEBUG=False IS_TTY=True TZ=UTC SEARCH_BACKEND=ripgrep LDAP=False
[i] Dependency versions:
√ PYTHON_BINARY v3.11.11 valid /usr/local/bin/python3.11
√ SQLITE_BINARY v2.6.0 valid /usr/local/lib/python3.11/sqlite3/dbapi2.py
√ DJANGO_BINARY v3.1.14 valid /usr/local/lib/python3.11/site-packages/django/__init__.py
√ ARCHIVEBOX_BINARY v0.7.3 valid /usr/local/bin/archivebox
√ CURL_BINARY v8.10.1 valid /usr/bin/curl
√ WGET_BINARY v1.21.3 valid /usr/bin/wget
√ NODE_BINARY v20.18.1 valid /usr/bin/node
√ SINGLEFILE_BINARY v1.1.54 valid ./node_modules/single-file-cli/single-file
√ READABILITY_BINARY v0.0.11 valid ./node_modules/readability-extractor/readability-extractor
√ MERCURY_BINARY v1.0.0 valid ./node_modules/@postlight/parser/cli.js
√ GIT_BINARY v2.39.5 valid /usr/bin/git
√ YOUTUBEDL_BINARY v2024.12.13 valid /usr/local/bin/yt-dlp
√ CHROME_BINARY v131.0.6778.33 valid /usr/bin/chromium-browser
√ RIPGREP_BINARY v13.0.0 valid /usr/bin/rg
[i] Source-code locations:
√ PACKAGE_DIR 24 files valid ./archivebox
√ TEMPLATES_DIR 3 files valid ./archivebox/templates
- CUSTOM_TEMPLATES_DIR - disabled None
[i] Secrets locations:
- CHROME_USER_DATA_DIR - disabled None
- COOKIES_FILE - disabled None
How badly do you want this new feature?
- [ ] It's an urgent deal-breaker, I can't live without it
- [ ] It's important to add it in the near-mid term future
- [x] It would be nice to have eventually
- [ ] I'm willing to start a PR to develop this myself
- [ ] I have donated money to go towards fixing this issue
Mini Survey
- [x] I like ArchiveBox so far / would recommend it to a friend
- [ ] I've had a lot of difficulty getting ArchiveBox set up
- [ ] I would pay $10/mo for a hosted version of ArchiveBox if it had this feature
Interesting. I thought different ports were considered different origins, so I'm very surprised your browser is re-using cookies across ports.
If ArchiveBox is sharing cookies with other things on the same server, that is VERY BAD. It means archived JS potentially has access to any other service you're hosting. All it would take is archiving one malicious page; viewing its wget output would then allow an attacker to log in as you on those other services and hack your accounts.
Yes I just confirmed all ArchiveBox cookies are set in HostOnly mode which means they are not exposed to any other host:port combinations other than the exact one they were set with.
This means your other services are the ones setting cookies without HostOnly, which is a potential security risk that those services should fix. Changing our cookie names would fix the glitches you're seeing, but it would also further hide the real security issue, so I'm on the fence about it. In general, ArchiveBox is not really safe to host on a shared domain with anything else because it contains a ton of untrusted HTML, JS, CSS, cookies, etc., so I strongly discourage it. You should really set up ingress on a unique domain specific to ArchiveBox using something like traefik or cloudflare tunnels.
HostOnly is still checked for me on every cookie (ArchiveBox's and the others') for that address. I thought it would cover the port?
That said, it's a good point about hosting on a shared domain. The web panel isn't exposed to the internet anyway, and I only use it to archive select reddit comments. However, I still can't account for unsafe code, as you pointed out, so I'll move it to its own IP and instance sometime today as a precaution.
edit: I found this which may be relevant https://stackoverflow.com/questions/1612177/are-http-cookies-port-specific
In my case, I'll just move ArchiveBox to its own IP on my network. But I would still like to make this request, as I feel it would aid security, even if only in a small way.
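For context on the Stack Overflow link above: RFC 6265 (section 8.5) states that cookies do not provide isolation by port, so a host-only cookie is matched on the hostname alone. The following is a simplified sketch of that matching rule, not a full cookie-jar implementation; the function name and the example addresses are hypothetical.

```python
from urllib.parse import urlsplit


def host_only_cookie_matches(cookie_origin_url: str, request_url: str) -> bool:
    """Simplified host-only match: compare hostnames only, ignoring ports."""
    cookie_host = urlsplit(cookie_origin_url).hostname
    request_host = urlsplit(request_url).hostname
    return cookie_host is not None and cookie_host == request_host


# Hypothetical example: two services on the same IP but different ports.
same_ip = host_only_cookie_matches(
    "http://192.168.1.10:8000/", "http://192.168.1.10:9000/login"
)
other_ip = host_only_cookie_matches(
    "http://192.168.1.10:8000/", "http://192.168.1.11:8000/"
)
```

Under this rule `same_ip` is true and `other_ip` is false, which would explain why two services that both use the name "sessionid" overwrite each other's cookie on a single IP, and why moving to a separate IP or hostname avoids the clash.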