Fauxbar icon indicating copy to clipboard operation
Fauxbar copied to clipboard

Replace WebSQL

Open ChrisNZL opened this issue 5 years ago • 1 comments

WebSQL is destined to be deprecated and removed from Chromium at some point. Some notes are gathered here.

Possible replacements:

If possible, would be prudent to offer the ability to choose the database type, and migrate user's settings out of WebSQL before it's removed from Chromium.

ChrisNZL avatar Jun 14 '19 01:06 ChrisNZL

Experimenting with updating to Manifest V3 (issue #64).

WebSQL/SQLite is no longer feasible because:

  • Manifest V3 does not allow persistent background pages; only allows one service worker that is short-lived (stops and starts as required).

  • Fauxbar can't establish multiple connections to WebSQL in multiple tabs due to database I/O locking that was introduced in Chrome 75/76 in 2019 (issue #48). The solution was to use the sole background page's database connection, but this can no longer work since the service worker closes down when it wants to.

  • The service worker can't access the DOM, thus can't call openDatabase for WebSQL.

  • The fact that WebSQL is still slated for deprecation makes me not want to support it further.

Other "databases"? Probably not

I'm hesitant to use other database solutions, because service workers can't access IndexedDB nor localStorage.

Service workers can only access chrome.storage.


Proposal going forward

WebSQL was selected back in ~2011 due to its speed and efficiency for lookups and queries. A decade ago.

Doing some profiling with my own Fauxbar data, I currently have ~27,000 entries in Fauxbar's urls table (~4,000 are bookmarks).

I believe we shall use chrome.storage to store data for both Fauxbar's URLs and other settings/options.

No, there won't be indices and tables; we shall just forEach over URL data when querying. It's fast enough. Hear me out.

Profiling for speed

1. Convert table rows into an array

As an experiment, we export WebSQL's urls table into a urls entry into chrome.storage:

if (openDb()) {
	window.db.readTransaction(function(tx){
		tx.executeSql('SELECT * FROM urls', [], function(tx, results){
			var urls = [];
			for (var i = 0; i < results.rows.length; i++) {
				var result = results.rows.item(i);
				urls.push(result);
			}
			chrome.storage.local.set({"urls": urls});
		});
	});
}

We now have 27,496 URLs in an array:

chrome.storage.local.get("urls", function(urls){
	var totalUrls = Object.keys(urls.urls).length;
	console.log(totalUrls); // 27496
});

2. Querying the array

Normally, Fauxbar would build a SQL SELECT statement to query WebSQL. But, let's do a basic forEach over every item, and calculate the time it takes.

First, I envision every Fauxbar tab getting the URLs array ASAP silently, since this is the slowest operation, and just store it in the tab's window:

var time_start = Date.now();
chrome.storage.local.get("urls", function(urls){
	window.urls = urls.urls;
	var time_end = Date.now();
	var ms = time_end - time_start;
	console.log("Time took: "+ms+" ms");
});

Getting the array takes about 365 ms for me.

But once the array is stored in the window, it's fast to traverse over.

Let's make a function that takes the user's query, and loop through every entry in the array, and return an array of results.

window.GetMatchingURLs = function (query) {
	var time_start = Date.now();
	query = query.toLowerCase();
	var matches = [];
	window.urls.forEach(function(item){
		var url_lowercase = item.url.toLowerCase();
		if (url_lowercase.includes(query)) {
			matches.push(item);
		}
	});

	var time_end = Date.now();
	var ms = time_end - time_start;
	console.log("query: "+query+". Total matches: "+matches.length+". Time took: "+ms+" ms");
	return matches;
}

GetMatchingURLs("s");
GetMatchingURLs("stackoverflow");

And the log results:

query: s. Total matches: 27039. Time took: 10 ms
query: stackoverflow. Total matches: 146. Time took: 9 ms

(the time fluctuates by ~5 ms or so)

... but that's it. It's fast.

There will be the need to sort the matches by their frecency score before returning, so maybe a few more milliseconds to do that.

Conclusion (in theory – just need to code it)

Despite this sounding horrific, I think ditching WebSQL has to be done here. And basic array traversal is fast enough. The background service worker will handle chrome.history.onVisited events, and update data in chrome.storage.

I imagine the background service worker will be able to update the frecency scores on schedules with alarms.

I can see issues happening with chrome.storage setting multiple history visits in quick succession due to lack of transactions; can't rely on getting a huge array, manipulating it, and setting it again, since another worker process might also be manipulating the array at the same time.

Still, allowing the service worker to use an alarm to refresh the frecency scores now and then (which could also grab the whole history from Chrome for a full reindex), any "missing" history visits probably won't be a huge issue.

Because at this point, I can't see how a transactional database is going to be feasible and worth pursuing in Manifest V3. Even if the service worker stores history visits in a queue in chrome.storage with unique key names, then the extension imports the queued entries into something like IndexedDB when a Fauxbar tab is opened (thus having full DOM access – and assuming IndexedDB supports multiple connections in multiple tabs, unlike WebSQL), it's like, why? What optimisation does IndexedDB bring, when array traversal is fast enough? (since IndexedDB is basically just "NoSQL" anyway, just another key/value storage area like chrome.storage...? Plus extra syntax I'd have to figure out)

💡 With chrome.storage, we can get all the data via chrome.storage.local.get(null), so looping through each entry and checking against the name of the key can alleviate some clashing issues, and keys could have unique names when a history event is recorded ('queuedHistoryVisit_' + md5(url + timestamp + randomInt)); perhaps visits can be queued to be added by a single alarm where possible (in an attempt to prevent multiple workers manipulating the urls array). In any case, a reindex can rebuild everything if needed, so it's pretty much a non-issue for missing history visits with data overwriting. 💡

Further thoughts against IndexedDB

If using IndexedDB, since service workers can't access IndexedDB, any heavy frecency calculations would have to be done when a user opens a Fauxbar tab. And at that point, you'd be wanting the tab to feel performant, not wanting it to perform calculations to play "catch-up" with any queued history visits sitting in chrome.storage...

And as for indexing and fast lookups, which is what WebSQL/SQLite was good at... other solutions that use IndexedDB, to then build atop of for their own indexing... Is there much gain in speed performance? Because 15 ms to traverse over 27,000 array elements is fast enough, without any other dependencies.

I really think having service workers manipulate data within chrome.storage has to be done in the background where possible, even if it's just basic key/value traversal. Not having a persistent background page is quite the game-changer (also means Fauxbar Memory Helper can be removed), but alas. And I still don't see the benefit of using IndexedDB since it's just a key/value storage area like chrome.storage.

ChrisNZL avatar Jan 19 '22 02:01 ChrisNZL