Feature Request: add unique cookie prefix or allow setting

Open gen-angry opened this issue 10 months ago • 3 comments

What type of suggestion are you making?

Modification of existing behavior

What is the problem that your feature request solves?

I am running this along with a few dozen other containers on a small home server with podman under one IP (using a different port number for each service). It works pretty well aside from one issue: session cookie names often conflict, causing frequent logouts.

ArchiveBox uses the generic cookie names "sessionid" and "csrftoken".
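
To illustrate the conflict, here is a toy model of a browser cookie store. It assumes cookies are keyed by host, path, and name but not by port, which is the behaviour RFC 6265 describes. The class name, IP address, and ports are hypothetical, and this is a sketch, not how any particular browser is implemented.

```python
from urllib.parse import urlsplit


class ToyCookieJar:
    """Toy cookie store keyed by (host, path, name); the port is ignored."""

    def __init__(self):
        self._store = {}

    def set_cookie(self, url, name, value, path="/"):
        # Only the hostname is used as part of the key, never the port.
        host = urlsplit(url).hostname
        self._store[(host, path, name)] = value

    def get_cookie(self, url, name, path="/"):
        host = urlsplit(url).hostname
        return self._store.get((host, path, name))


jar = ToyCookieJar()
# Hypothetical setup: ArchiveBox on port 8000, another service on port 8080.
jar.set_cookie("http://192.168.1.10:8000/", "sessionid", "archivebox-session")
jar.set_cookie("http://192.168.1.10:8080/", "sessionid", "other-app-session")

# The second service overwrote the first one's cookie.
print(jar.get_cookie("http://192.168.1.10:8000/", "sessionid"))  # other-app-session
```

Under that assumption, two services on the same IP that both use the name "sessionid" overwrite each other's session, which would explain the logouts.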

What is your proposed solution?

Would it be possible to prepend a unique prefix to the session cookie names (e.g. 'archivebox_sessionid'), or to allow us to set a prefix using an environment variable?
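
As a rough sketch of the idea: Django exposes the `SESSION_COOKIE_NAME` and `CSRF_COOKIE_NAME` settings (defaulting to "sessionid" and "csrftoken"), so a settings module could build them from an environment variable. The `COOKIE_PREFIX` variable and the `prefixed_cookie_names` helper below are hypothetical names used for illustration, not existing ArchiveBox options.

```python
import os


def prefixed_cookie_names(prefix: str) -> dict:
    """Return Django setting overrides with `prefix` prepended to the default cookie names."""
    return {
        "SESSION_COOKIE_NAME": f"{prefix}sessionid",
        "CSRF_COOKIE_NAME": f"{prefix}csrftoken",
    }


# Hypothetical environment variable; an empty prefix keeps Django's defaults.
COOKIE_PREFIX = os.environ.get("COOKIE_PREFIX", "")

# In a Django settings module these would be plain module-level assignments.
_names = prefixed_cookie_names(COOKIE_PREFIX)
SESSION_COOKIE_NAME = _names["SESSION_COOKIE_NAME"]
CSRF_COOKIE_NAME = _names["CSRF_COOKIE_NAME"]
```

With `COOKIE_PREFIX=archivebox_` set in the container environment, the cookies would be named "archivebox_sessionid" and "archivebox_csrftoken". Note that changing the session cookie name logs out any existing sessions once.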

What hacks or alternative solutions have you tried to solve the problem?

I searched the documentation for a way to set a prefix for the session cookies and could not find anything, nor could I find any suggestions pertaining to this issue.

Share the entire output of the archivebox version command for the current version you are using.

0.7.3
ArchiveBox v0.7.3 COMMIT_HASH=069aabc BUILD_TIME=2024-12-15 09:54:03 1734256443
IN_DOCKER=True IN_QEMU=False ARCH=x86_64 OS=Linux PLATFORM=Linux-6.11.0-18-generic-x86_64-with-glibc2.36 PYTHON=Cpython
FS_ATOMIC=True FS_REMOTE=False FS_USER=0:0 FS_PERMS=644
DEBUG=False IS_TTY=True TZ=UTC SEARCH_BACKEND=ripgrep LDAP=False

[i] Dependency versions:
 √  PYTHON_BINARY         v3.11.11        valid     /usr/local/bin/python3.11                                                   
 √  SQLITE_BINARY         v2.6.0          valid     /usr/local/lib/python3.11/sqlite3/dbapi2.py                                 
 √  DJANGO_BINARY         v3.1.14         valid     /usr/local/lib/python3.11/site-packages/django/__init__.py                  
 √  ARCHIVEBOX_BINARY     v0.7.3          valid     /usr/local/bin/archivebox                                                   

 √  CURL_BINARY           v8.10.1         valid     /usr/bin/curl                                                               
 √  WGET_BINARY           v1.21.3         valid     /usr/bin/wget                                                               
 √  NODE_BINARY           v20.18.1        valid     /usr/bin/node                                                               
 √  SINGLEFILE_BINARY     v1.1.54         valid     ./node_modules/single-file-cli/single-file                                  
 √  READABILITY_BINARY    v0.0.11         valid     ./node_modules/readability-extractor/readability-extractor                  
 √  MERCURY_BINARY        v1.0.0          valid     ./node_modules/@postlight/parser/cli.js                                     
 √  GIT_BINARY            v2.39.5         valid     /usr/bin/git                                                                
 √  YOUTUBEDL_BINARY      v2024.12.13     valid     /usr/local/bin/yt-dlp                                                       
 √  CHROME_BINARY         v131.0.6778.33  valid     /usr/bin/chromium-browser                                                   
 √  RIPGREP_BINARY        v13.0.0         valid     /usr/bin/rg                                                                 

[i] Source-code locations:
 √  PACKAGE_DIR           24 files        valid     ./archivebox                                                                
 √  TEMPLATES_DIR         3 files         valid     ./archivebox/templates                                                      
 -  CUSTOM_TEMPLATES_DIR  -               disabled  None                                                                        

[i] Secrets locations:
 -  CHROME_USER_DATA_DIR  -               disabled  None                                                                        
 -  COOKIES_FILE          -               disabled  None

How badly do you want this new feature?

  • [ ] It's an urgent deal-breaker, I can't live without it
  • [ ] It's important to add it in the near-mid term future
  • [x] It would be nice to have eventually
  • [ ] I'm willing to start a PR to develop this myself
  • [ ] I have donated money to go towards fixing this issue

Mini Survey

  • [x] I like ArchiveBox so far / would recommend it to a friend
  • [ ] I've had a lot of difficulty getting ArchiveBox set up
  • [ ] I would pay $10/mo for a hosted version of ArchiveBox if it had this feature

gen-angry avatar Feb 22 '25 16:02 gen-angry

Interesting. I thought different ports were considered different origins, very surprised your browser is re-using cookies across ports.

If ArchiveBox is sharing cookies with other things on the same server, that is VERY BAD. It means archived JS potentially has access to any other service you're hosting. All it would take is archiving one malicious page; when you then view the wget output, an attacker could log in as you on those other services and hack your accounts.

pirate avatar Feb 22 '25 17:02 pirate

Yes, I just confirmed that all ArchiveBox cookies are set in HostOnly mode, which means they are not exposed to any host:port combination other than the exact one they were set with.

Image

This means your other services are the ones setting cookies without HostOnly, which is a potential security risk that those services should fix. If we changed our cookie names, it would fix the glitches you're seeing, but it would further hide the real security issue, so I'm on the fence about it. In general, ArchiveBox is not really safe to host on a shared domain with anything else because it contains a ton of untrusted HTML, JS, CSS, cookies, etc., so I strongly discourage it. You should really set up ingress on a unique domain specific to ArchiveBox using something like traefik or cloudflare tunnels.

pirate avatar Feb 22 '25 18:02 pirate

That's still checked for me for every cookie (archivebox and others) for the address. I thought it would cover the port?

That said - it's a good point about hosting on a shared deal. The web panel isn't exposed to the internet anyways and I only use it to archive select reddit comments. However, I still can't account for unsafe code, as you pointed out, so I'll move it to its own IP and instance sometime today anyways as a precaution.

edit: I found this which may be relevant https://stackoverflow.com/questions/1612177/are-http-cookies-port-specific

In my case, I'll just move ArchiveBox to its own IP on my network. But I would still like to make this request, as I feel it would aid security even if only in a small way.

gen-angry avatar Feb 22 '25 19:02 gen-angry