hdt
add web assembly target?
Would anyone need this?
Investigate whether that would be possible with reasonable efforts.
cargo build --target wasm32-unknown-unknown does not work
error: could not compile `regex-syntax` (lib) due to 990 previous errors
Yes, this is exactly what I wanted to ask you about: whether it is possible.
Cool! Can you tell me more about what you want to build and which functionality you need the most?
Hi!
I would like to be able to load and query an HDT file in a browser. I managed to get this working for the FST crate, so I have a very nice finite state automaton running in the browser. The code for that is below.
use fst::automaton;
use fst::{Automaton, IntoStreamer, Streamer};
use regex_automata::dense;
use wasm_bindgen::prelude::*;

#[wasm_bindgen]
pub struct Searcher {
    // Raw bytes of a serialized fst::Map, passed in from JavaScript.
    data: Vec<u8>,
}

#[wasm_bindgen]
impl Searcher {
    #[wasm_bindgen(constructor)]
    pub fn new(data: Vec<u8>) -> Searcher {
        Searcher { data }
    }

    // Returns the values of all keys that start with `query`.
    pub fn prefix_search(&mut self, query: &str) -> Result<js_sys::Array, JsError> {
        let map = fst::Map::new(&self.data)?;
        let prefix = automaton::Str::new(query).starts_with();
        let mut stream = map.search(prefix).into_stream();
        let results = js_sys::Array::new();
        while let Some((_k, v)) = stream.next() {
            results.push(&wasm_bindgen::JsValue::from(v));
        }
        Ok(results)
    }

    // Returns the values of all keys that match the regular expression `query`.
    pub fn regex_search(&mut self, query: &str) -> Result<js_sys::Array, JsError> {
        let map = fst::Map::new(&self.data)?;
        // Report an invalid pattern to JavaScript as an error instead of panicking.
        let dfa = dense::Builder::new()
            .anchored(true)
            .build(query)
            .map_err(|e| JsError::new(&e.to_string()))?;
        let mut stream = map.search(&dfa).into_stream();
        let results = js_sys::Array::new();
        while let Some((_k, v)) = stream.next() {
            results.push(&wasm_bindgen::JsValue::from(v.to_string()));
        }
        Ok(results)
    }

    // Returns the values of all keys within edit distance 1 of `query`.
    pub fn levenstein_1_search(&mut self, query: &str) -> Result<js_sys::Array, JsError> {
        let map = fst::Map::new(&self.data)?;
        let levenstein = automaton::Levenshtein::new(query, 1)?;
        let mut stream = map.search(levenstein).into_stream();
        let results = js_sys::Array::new();
        while let Some((_k, v)) = stream.next() {
            results.push(&wasm_bindgen::JsValue::from(v.to_string()));
        }
        Ok(results)
    }

    // Returns the values of all keys within edit distance 2 of `query`.
    pub fn levenstein_2_search(&mut self, query: &str) -> Result<js_sys::Array, JsError> {
        let map = fst::Map::new(&self.data)?;
        let levenstein = automaton::Levenshtein::new(query, 2)?;
        let mut stream = map.search(levenstein).into_stream();
        let results = js_sys::Array::new();
        while let Some((_k, v)) = stream.next() {
            results.push(&wasm_bindgen::JsValue::from(v.to_string()));
        }
        Ok(results)
    }
}

// wasm-pack build --release --target web
// wasm-pack build --release --target nodejs
WebAssembly only seems to work with 32-bit targets at the moment, but the sucds library that hdt depends on requires a 64-bit pointer width. https://crates.io/crates/sucds
hdt$ cargo build --target wasm32-unknown-unknown
Compiling eyre v0.6.8
Compiling crc v3.0.1
Compiling ntriple v0.1.1
Compiling sucds v0.7.0
error: `target_pointer_width` must be 64
--> /home/konrad/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sucds-0.7.0/src/lib.rs:51:1
|
51 | compile_error!("`target_pointer_width` must be 64");
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
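To illustrate why pointer width matters here, below is a minimal sketch of how a crate could guard against unsupported targets and avoid silently truncating 64-bit positions on wasm32. It is not code from sucds or hdt; the names `pointer_width_bits` and `to_index` are hypothetical and exist only for this example.

```rust
use std::convert::TryFrom;

// Hypothetical guard: accept 32-bit and 64-bit targets and reject anything else
// at compile time. sucds 0.7.0 instead rejects every target that is not 64-bit.
#[cfg(not(any(target_pointer_width = "32", target_pointer_width = "64")))]
compile_error!("`target_pointer_width` must be 32 or 64");

/// Pointer width of the compilation target in bits
/// (32 when building for wasm32-unknown-unknown).
pub const fn pointer_width_bits() -> u32 {
    usize::BITS
}

/// Converts a 64-bit position into an index.
/// Returns `None` instead of truncating when the position does not fit
/// into `usize`, which can only happen on targets narrower than 64 bits.
pub fn to_index(pos: u64) -> Option<usize> {
    usize::try_from(pos).ok()
}
```

On a 64-bit host every `u64` position converts successfully, while on wasm32 any position above `u32::MAX` yields `None`, so code that mixes `u64` offsets with `usize` indices needs this kind of check before it can be ported.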
Right, I also saw that...
I found the crate https://github.com/rossanoventurini/qwt, which implements wavelet trees but does not depend on sucds.
Unfortunately, WebAssembly isn't high enough on my priority list right now for me to justify taking the time to make big changes to the code just for that, like switching out all the code that uses compressed structures such as the wavelet tree. I also suspect that this would be just the first in a long line of necessary adaptations. But if you want to try it yourself, feel free to create a pull request, and I can merge it if it doesn't compromise performance.
I did an experiment swapping out (most of) sucds for qwt which marginally improved performance and was not too large:
Cargo.toml | 1 +
src/triples.rs | 34 ++++++++++++++--------------------
src/triples/object_iter.rs | 3 ++-
src/triples/predicate_iter.rs | 9 +++++----
src/triples/predicate_object_iter.rs | 3 ++-
src/triples/subject_iter.rs | 3 ++-
6 files changed, 26 insertions(+), 27 deletions(-)
This would be a bit harder to land, as rsdict and the packed int array structures would now need replacing as well, and I'm not quite clear on which parts of these are essential without diving further into the logic of the library.
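For anyone picking this up, here is a rough sketch of what a pointer-width-independent packed integer array could look like. It is an assumption-laden illustration, not the structure used by hdt or sucds: the type `PackedInts` and its methods are hypothetical, and it stores entries in `u64` words so the bit layout is the same on 32-bit and 64-bit targets.

```rust
/// Hypothetical fixed-width packed integer array (not the hdt or sucds type).
/// Entries are `bits` wide and stored back to back in `u64` words.
pub struct PackedInts {
    words: Vec<u64>,
    bits: u32, // bits per entry, 1..=64
    len: usize,
}

impl PackedInts {
    /// Packs `values` using `bits` bits per entry.
    /// Panics if `bits` is out of range or a value does not fit.
    pub fn from_slice(values: &[u64], bits: u32) -> Self {
        assert!((1..=64).contains(&bits), "bits must be in 1..=64");
        let total_bits = values.len() as u64 * bits as u64;
        let mut words = vec![0u64; ((total_bits + 63) / 64) as usize];
        for (i, &v) in values.iter().enumerate() {
            assert!(bits == 64 || v >> bits == 0, "value does not fit");
            let bit_pos = i as u64 * bits as u64;
            let word = (bit_pos / 64) as usize;
            let offset = (bit_pos % 64) as u32;
            words[word] |= v << offset;
            // The entry straddles a word boundary: store the high part in the next word.
            if offset + bits > 64 {
                words[word + 1] |= v >> (64 - offset);
            }
        }
        Self { words, bits, len: values.len() }
    }

    /// Returns the entry at index `i`, or `None` if `i` is out of bounds.
    pub fn get(&self, i: usize) -> Option<u64> {
        if i >= self.len {
            return None;
        }
        let bit_pos = i as u64 * self.bits as u64;
        let word = (bit_pos / 64) as usize;
        let offset = (bit_pos % 64) as u32;
        let mut v = self.words[word] >> offset;
        // Fetch the high part from the next word when the entry straddles a boundary.
        if offset + self.bits > 64 {
            v |= self.words[word + 1] << (64 - offset);
        }
        let mask = if self.bits == 64 { u64::MAX } else { (1u64 << self.bits) - 1 };
        Some(v & mask)
    }

    /// Number of entries.
    pub fn len(&self) -> usize {
        self.len
    }
}
```

A call such as `PackedInts::from_slice(&[1, 8191, 0, 4242, 8000, 77], 13)` packs six 13-bit entries into two words. Whether a replacement along these lines would match the performance of the current structures would have to be measured.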
Hi there 👋
I'm just checking in to hear if someone managed to get this library working in WASM? Because there's interest in querying over HDT files in browser and Node.js environments using the Comunica engine.
I haven't tried it again. I will take another look soon, but I don't have high hopes, as it seemed to require a large number of changes. Maybe if enough people are interested we could make it a group effort, where everyone focuses on a different dependency and creates a pull request there to make it WASM-compatible, and I could focus on the HDT Rust library itself.