`HashSet<LocalName>` for classes is slow
`LocalName::new`, when called with a name that isn't in https://github.com/servo/html5ever/blob/master/markup5ever/local_names.txt, locks a global Mutex. Since `LocalName::new` is called for every class in `Element::new`, and most classes are unlikely to be in that list, multithreading is much less of a win for HTML parsing than it should be. Although it's a breaking change and probably not the best approach, I've switched to using `String`s locally, which gives me about a 10% performance improvement. My program's runtime is still dominated by `HashSet` construction, though. It might be faster to intern entire `HashSet<String>`s per document.
It would probably also be faster to use a different hasher for the `HashSet`. Here are some benchmarks: https://github.com/shepmaster/twox-hash#benchmarks
Or getting O(1) lookup through perfect hashing with https://crates.io/crates/phf
Or using a prefix tree (also called a "trie"; lookup cost is proportional to the key length rather than the number of stored keys). There seem to be many crates for this, e.g.:
https://crates.io/crates/trie-rs
https://crates.io/crates/tst
https://crates.io/crates/prefix-tree
https://crates.io/crates/qp-trie
https://crates.io/crates/radix_trie
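For illustration, here is a minimal std-only trie sketch (not one of the crates above, and simplified to byte-keyed lookup) showing why lookup cost depends only on the key length:

```rust
use std::collections::HashMap;

/// Minimal prefix-tree (trie) node over the bytes of a key.
#[derive(Default)]
struct Trie {
    children: HashMap<u8, Trie>,
    is_end: bool,
}

impl Trie {
    /// Insert a key, creating one node per byte along the path.
    fn insert(&mut self, key: &str) {
        let mut node = self;
        for b in key.bytes() {
            node = node.children.entry(b).or_default();
        }
        node.is_end = true;
    }

    /// Lookup walks one node per byte: the cost is proportional to the
    /// key length, independent of how many keys are stored.
    fn contains(&self, key: &str) -> bool {
        let mut node = self;
        for b in key.bytes() {
            match node.children.get(&b) {
                Some(next) => node = next,
                None => return false,
            }
        }
        node.is_end
    }
}
```

Whether this beats a hash set in practice depends on key lengths and cache behavior, which is why benchmarking the crates above would still be needed.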
These (pretty hacky) changes shave about 20% off my program's runtime:
diff --git a/Cargo.toml b/Cargo.toml
index 56112ea..221e445 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -23,6 +23,7 @@ matches = "0.1.6"
selectors = "0.22"
smallvec = "1"
tendril = "0.4"
+fnv = "1.0.7"
[dependencies.getopts]
version = "0.2.18"
diff --git a/src/node.rs b/src/node.rs
index 103fba8..bd255a9 100644
--- a/src/node.rs
+++ b/src/node.rs
@@ -1,7 +1,6 @@
//! HTML nodes.
use std::collections::{hash_map, hash_set};
-use std::collections::{HashMap, HashSet};
use std::fmt;
use std::ops::Deref;
@@ -10,6 +9,8 @@ use html5ever::{Attribute, LocalName, QualName};
use selectors::attr::CaseSensitivity;
+use fnv::{FnvHashSet, FnvHashMap};
+
/// An HTML node.
#[derive(Clone, PartialEq, Eq)]
pub enum Node {
@@ -231,13 +232,13 @@ pub struct Element {
pub name: QualName,
/// The element ID.
- pub id: Option<LocalName>,
+ pub id: Option<String>,
/// The element classes.
- pub classes: HashSet<LocalName>,
+ pub classes: FnvHashSet<String>,
/// The element attributes.
- pub attrs: HashMap<QualName, StrTendril>,
+ pub attrs: FnvHashMap<QualName, StrTendril>,
}
impl Element {
@@ -246,16 +247,17 @@ impl Element {
let id = attrs
.iter()
.find(|a| a.name.local.deref() == "id")
- .map(|a| LocalName::from(a.value.deref()));
+ .map(|a| String::from(a.value.deref()));
- let classes: HashSet<LocalName> = attrs
+ let classes: FnvHashSet<String> = attrs
.iter()
.find(|a| a.name.local.deref() == "class")
- .map_or(HashSet::new(), |a| {
+ .map_or(FnvHashSet::default(), |a| {
a.value
.deref()
- .split_whitespace()
- .map(LocalName::from)
+ .split(' ')
+ .filter(|x| !x.is_empty())
+ .map(String::from)
.collect()
});
@@ -308,7 +310,7 @@ impl Element {
#[allow(missing_debug_implementations)]
#[derive(Clone)]
pub struct Classes<'a> {
- inner: hash_set::Iter<'a, LocalName>,
+ inner: hash_set::Iter<'a, String>,
}
impl<'a> Iterator for Classes<'a> {
Most of the remaining time is taken by `HashSet` internals. A larger refactor would probably be necessary to get rid of that.
Now that I think about it, most programs bottlenecked by parsing will likely be parsing many HTML documents. I wonder if having a thread-local `RefCell<HashMap<String, HashSet<String>>>`, where the `HashSet`s are cloned while parsing, would be a good idea? This would still incur an allocation, but it would mean that every other part of `HashSet` construction would only happen a few times per execution.
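A std-only sketch of that idea (the names `CLASS_CACHE` and `parse_classes` are hypothetical, not part of the crate): cache the parsed set keyed by the raw `class` attribute string, and clone it on a hit.

```rust
use std::cell::RefCell;
use std::collections::{HashMap, HashSet};

thread_local! {
    // Hypothetical cache: raw `class` attribute string -> parsed class set.
    static CLASS_CACHE: RefCell<HashMap<String, HashSet<String>>> =
        RefCell::new(HashMap::new());
}

fn parse_classes(class_attr: &str) -> HashSet<String> {
    CLASS_CACHE.with(|cache| {
        cache
            .borrow_mut()
            .entry(class_attr.to_owned())
            .or_insert_with(|| {
                class_attr.split_whitespace().map(String::from).collect()
            })
            // A hit still clones the whole set, so each element pays
            // for one allocation per class string in the set.
            .clone()
    })
}
```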
Note: you can just write `use fnv::{FnvHashMap as HashMap, FnvHashSet as HashSet};` as a drop-in replacement, which keeps the diff minimal.
> Now that I think about it, most programs bottlenecked by parsing will likely be parsing many HTML documents. I wonder if having a thread-local `RefCell<HashMap<String, HashSet<String>>>` where the `HashSet`s are cloned while parsing would be a good idea? This would still incur allocation, but it would mean that every other part of `HashSet` construction would only occur a few times per execution.
I implemented this locally and it was only a very small improvement. Making a non-trivial allocation for every element is just slow...
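For what it's worth, one way to avoid the per-element clone (a sketch under the assumption that elements only need single-threaded access, since `Rc` is not `Send`; names again hypothetical) would be to store the cached set behind an `Rc`, so a cache hit is just a reference-count bump:

```rust
use std::cell::RefCell;
use std::collections::{HashMap, HashSet};
use std::rc::Rc;

thread_local! {
    // Hypothetical cache: raw `class` attribute -> shared, parsed class set.
    static CLASS_CACHE: RefCell<HashMap<String, Rc<HashSet<String>>>> =
        RefCell::new(HashMap::new());
}

fn parse_classes_shared(class_attr: &str) -> Rc<HashSet<String>> {
    CLASS_CACHE.with(|cache| {
        Rc::clone(
            cache
                .borrow_mut()
                .entry(class_attr.to_owned())
                .or_insert_with(|| {
                    // Only the first element with this exact attribute
                    // value pays for parsing and allocation.
                    Rc::new(class_attr.split_whitespace().map(String::from).collect())
                }),
        )
    })
}
```

The trade-off is that `Element` would then hold an `Rc`, which is a breaking change to the public field type and rules out sending elements across threads.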