Replace Berkeley DB
The license of the Berkeley DB version currently in use (the Sleepycat license) is effectively viral: anyone downstream who uses OWB has to comply with it.
We should replace BDB with something under a more compatible license, e.g. Apache Derby.
Given most of us generate CDX files from the CLI for small collections, maybe the out-of-the-box collection should just automate the process of generating and updating CDX files and use the CDX Collection?
I agree with Andy. If we would like to still support BDB, this could be made as a separate plugin. That way it would not influence licensing for the core distribution.
Since BDB is mostly used by new users, having it be a separate plug-in kind of defeats the purpose.
I agree. We should not officially maintain it as a plugin. But if someone out there wants to keep BDB support, they could maintain it as a separate plugin.
Had a look at where BDB is used. In addition to the index, BDB is used in proxy mode for storing session information. Proxy mode maps a session id (or the IP if the session id is missing) to a timestamp. This is used to keep track of the timestamp, since URLs in proxy mode don't contain one.
I wonder if this could be replaced with standard servlet session handling. I see no reason for this mapping to be persistent across sessions. I don't know if there are any other reasons for servlet sessions not to work.
This sounds like exactly the kind of thing we should defer to the container for. Indeed, Tomcat already supports persistent sessions.
The main issue/difficulty with proxy mode is that session cookies (or any cookies) are not visible across domains. When you set a cookie on one domain in proxy mode, it won't be visible on any other domain. Ideally, you'd want a supercookie or Proxy-Cookie that can be set on one domain and is visible across all domains. However, no such thing exists. There are various workarounds; one is using the IP. A better way is indeed to use a session cookie on a designated domain: redirect to that domain if no cookie exists, then redirect again to a special cookie-setting URL, then redirect back to the original URL with the cookie. The main trick is to have a special domain prefix, like 'ABCDEF.wayback-session.example.com', which, when visited, sets a session cookie for .example.com to ABCDEF and then redirects back to example.com.
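The "special domain prefix" trick could be sketched roughly as below. This is only an illustration of the host parsing, not existing OpenWayback code: the class name `SessionHost`, the `.wayback-session.` marker handling, and the cookie name `wayback-session` are all assumptions made for the example.

```java
import java.util.Optional;

/** Hypothetical helper: splits a host like "ABCDEF.wayback-session.example.com". */
class SessionHost {
    // Marker separating the session id from the real target domain (assumed).
    static final String MARKER = ".wayback-session.";

    final String sessionId;    // e.g. "ABCDEF"
    final String redirectHost; // e.g. "example.com"
    final String cookieDomain; // e.g. ".example.com"

    private SessionHost(String sessionId, String redirectHost) {
        this.sessionId = sessionId;
        this.redirectHost = redirectHost;
        this.cookieDomain = "." + redirectHost;
    }

    /** Returns empty if the host is not a cookie-setting host. */
    static Optional<SessionHost> parse(String host) {
        int i = host.indexOf(MARKER);
        if (i <= 0) {
            return Optional.empty(); // no marker, or no session id before it
        }
        String target = host.substring(i + MARKER.length());
        if (target.isEmpty()) {
            return Optional.empty(); // nothing to redirect back to
        }
        return Optional.of(new SessionHost(host.substring(0, i), target));
    }

    /** Value for a Set-Cookie header; the cookie name is an assumption. */
    String setCookieHeader() {
        return "wayback-session=" + sessionId + "; Domain=" + cookieDomain + "; Path=/";
    }
}
```

A proxy-mode handler would call `SessionHost.parse(requestHost)`; when a value is present it would send the `Set-Cookie` header and redirect to `redirectHost`, otherwise handle the request normally.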
Okay, I think I understand that, but I don't understand whether that means we really need our own session persistence mechanism.
Am I right in thinking that this complexity just arises from the need to ensure the desired Accept-Datetime is maintained? I.e. if the client sent Accept-Datetime headers with its request, we wouldn't need to worry about this at all?
Depends on what you mean by 'session persistence'. The IP address thing wasn't used to persist session info across Tomcat restarts, but to identify a particular user because cookies could not be used. For the solution outlined above, the standard servlet session and cookie mechanisms would probably work.
Yes, using Accept-Datetime with every request is another alternative, since the client now provides the timestamp and not the server.
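For illustration, if the client supplies the timestamp, the server only has to convert the header value. A minimal sketch, assuming the header carries an RFC 1123 date and that the target is a 14-digit `yyyyMMddHHmmss` timestamp in UTC; the class and method names are hypothetical:

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper: converts an Accept-Datetime value to a 14-digit timestamp. */
class AcceptDatetime {
    // Assumed target format: yyyyMMddHHmmss, always in UTC.
    private static final DateTimeFormatter WAYBACK_14 =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    /** Throws DateTimeParseException if the value is not a valid RFC 1123 date. */
    static String toTimestamp(String headerValue) {
        ZonedDateTime parsed =
                ZonedDateTime.parse(headerValue, DateTimeFormatter.RFC_1123_DATE_TIME);
        return parsed.withZoneSameInstant(ZoneOffset.UTC).format(WAYBACK_14);
    }
}
```

With this approach no per-client state is kept on the server, at the cost of requiring a client that sends the header on every request.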
Though, the cookie/session mechanism doesn't require any plugins on the client, and could also be used to store current collection id and allow a user to switch to different collections w/o having to use one port per collection.
@ikreymer In the context of this issue 'session persistence' is persisting a session across server reboots. I don't see how that is relevant to the (very real but seemingly unrelated) issue of holding a session via cookies while in proxy mode.
@kris-sigur Yes, that's just my point. To reiterate, the BDB is used to persist user info by IP in proxy mode because cookies don't work (it looks for some other kind of session id, if present, which may be confusing things, but really it's the IP that is usually stored). It was not used for, and has nothing to do with, server reboots.
To put it another way, removing this BDB will directly affect proxy mode operation unless an alternative implementation is used (e.g. custom cookie, Accept-Datetime requirement, etc.).
I think we have two separate issues here. One is how to track which session a request belongs to, since cookies don't work in proxy mode. The other is how to keep the session information on the server.
Since this issue is about BDB, I think we should open a separate issue on how to track the datetime for a client.
To the topic of this issue, I wonder if it is necessary to store session information in a persistent store like BDB. Are there any reasons for not storing this information only in memory? Of course that means that everything is lost between server restarts, but apart from that, are there any other concerns?
Somewhat relevant to this discussion, see issue #35
Hi, in addition to the things @johnerikhalse mentioned - BDB is also used in the BDB LocationDB.
@johnerikhalse Yes, I see your point. In the short term, it should be possible to replace the BDB with a global, thread-safe hashmap, keyed by IP like the BDB. This hashmap would work much like the BDB does, except of course for the persistence across restarts, but I don't think that was ever an important issue.
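A rough sketch of what such an in-memory store might look like, assuming timestamps are kept as strings and a default is returned for unknown clients. The class and method names are made up for illustration and do not correspond to existing OpenWayback classes:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical in-memory stand-in for the BDB-backed proxy-mode store. */
class InMemoryTimestampStore {
    // Thread-safe map from client key (session id or IP) to timestamp.
    private final ConcurrentMap<String, String> timestamps = new ConcurrentHashMap<>();
    private final String defaultTimestamp;

    InMemoryTimestampStore(String defaultTimestamp) {
        this.defaultTimestamp = defaultTimestamp;
    }

    /** Uses the session id when present, otherwise falls back to the client IP. */
    static String keyFor(String sessionId, String remoteIp) {
        return (sessionId != null && !sessionId.isEmpty()) ? sessionId : remoteIp;
    }

    void put(String key, String timestamp) {
        timestamps.put(key, timestamp);
    }

    /** Returns the stored timestamp, or the default if the client is unknown. */
    String get(String key) {
        return timestamps.getOrDefault(key, defaultTimestamp);
    }

    public static void main(String[] args) {
        InMemoryTimestampStore store = new InMemoryTimestampStore("20000101000000");
        String key = keyFor(null, "192.0.2.10"); // no session id, so the IP is the key
        store.put(key, "20070531203500");
        System.out.println(store.get(key));
    }
}
```

Everything is lost on restart, and the map grows without bound unless old entries are evicted, so a real replacement would probably want some expiry policy.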