Add WebAssembly target?

Open KonradHoeffner opened this issue 2 years ago • 13 comments

Would anyone need this?

Investigate whether that would be possible with reasonable effort.

KonradHoeffner avatar Apr 26 '23 13:04 KonradHoeffner

cargo build --target wasm32-unknown-unknown does not work

error: could not compile `regex-syntax` (lib) due to 990 previous errors

KonradHoeffner avatar Apr 26 '23 13:04 KonradHoeffner

Yes, this is exactly what I wanted to ask you about: whether it is possible.

claudius108 avatar Jun 18 '23 19:06 claudius108

Cool! Can you tell me more about what you want to build and which functionality you need the most?

KonradHoeffner avatar Jun 19 '23 13:06 KonradHoeffner

Hi!

I would like to be able to load and query an HDT file in a browser. I managed to get this working for the FST crate, so I now have a very nice finite state automaton running in the browser. The code for that is below.

use fst::automaton;
use fst::{Automaton, IntoStreamer, Streamer};
use regex_automata::dense;
use wasm_bindgen::prelude::*;

#[wasm_bindgen]
pub struct Searcher {
    data: Vec<u8>,
}

#[wasm_bindgen]
impl Searcher {
    #[wasm_bindgen(constructor)]
    pub fn new(data: Vec<u8>) -> Searcher {
        Searcher { data }
    }

    pub fn prefix_search(&mut self, query: &str) -> Result<js_sys::Array, JsError> {
        let map = fst::Map::new(&self.data)?;

        let prefix = automaton::Str::new(query).starts_with();

        let mut stream = map.search(prefix).into_stream();

        let results = js_sys::Array::new();
        while let Some((_k, v)) = stream.next() {
            results.push(&wasm_bindgen::JsValue::from(v));
        }

        Ok(results)
    }

    pub fn regex_search(&mut self, query: &str) -> Result<js_sys::Array, JsError> {
        let map = fst::Map::new(&self.data)?;

        // Build an anchored dense DFA directly from the query; a malformed
        // pattern is returned to JavaScript as an error instead of panicking.
        let dfa = dense::Builder::new()
            .anchored(true)
            .build(query)?;

        let mut stream = map.search(&dfa).into_stream();

        let results = js_sys::Array::new();
        while let Some((_k, v)) = stream.next() {
            results.push(&wasm_bindgen::JsValue::from(v.to_string()));
        }

        Ok(results)
    }

    pub fn levenstein_1_search(&mut self, query: &str) -> Result<js_sys::Array, JsError> {
        let map = fst::Map::new(&self.data)?;

        let levenstein = automaton::Levenshtein::new(query, 1)?;

        let mut stream = map.search(levenstein).into_stream();

        let results = js_sys::Array::new();
        while let Some((_k, v)) = stream.next() {
            results.push(&wasm_bindgen::JsValue::from(v.to_string()));
        }

        Ok(results)
    }

    pub fn levenstein_2_search(&mut self, query: &str) -> Result<js_sys::Array, JsError> {
        let map = fst::Map::new(&self.data)?;

        let levenstein = automaton::Levenshtein::new(query, 2)?;

        let mut stream = map.search(levenstein).into_stream();

        let results = js_sys::Array::new();
        while let Some((_k, v)) = stream.next() {
            results.push(&wasm_bindgen::JsValue::from(v.to_string()));
        }

        Ok(results)
    }
}

// wasm-pack build --release --target web
// wasm-pack build --release --target nodejs

claudius108 avatar Jun 19 '23 15:06 claudius108

WebAssembly currently seems to support only 32-bit targets, but the sucds library that hdt depends on requires a 64-bit pointer width. https://crates.io/crates/sucds

hdt$ cargo build --target wasm32-unknown-unknown
   Compiling eyre v0.6.8
   Compiling crc v3.0.1
   Compiling ntriple v0.1.1
   Compiling sucds v0.7.0
error: `target_pointer_width` must be 64
  --> /home/konrad/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sucds-0.7.0/src/lib.rs:51:1
   |
51 | compile_error!("`target_pointer_width` must be 64");
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
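
The error above comes from a compile-time guard. As a rough illustration only (a minimal sketch, not code taken from sucds or hdt; the helper names `pointer_width_bits` and `checked_index` are hypothetical), this is how such a guard is written, and how a 64-bit offset could instead be converted to `usize` with an explicit check so that the code still compiles on 32-bit targets such as wasm32-unknown-unknown:

```rust
use std::convert::TryFrom;

// A guard of the kind quoted in the error output. On any target whose
// pointers are not 64 bits wide, compilation stops at this line.
// It is commented out here so the sketch also builds on 32-bit targets.
// #[cfg(not(target_pointer_width = "64"))]
// compile_error!("`target_pointer_width` must be 64");

/// Width of `usize` in bits on the target being compiled for
/// (32 on wasm32-unknown-unknown, 64 on typical desktop targets).
pub fn pointer_width_bits() -> u32 {
    usize::BITS
}

/// Hypothetical helper: converts a 64-bit offset into an index,
/// returning `None` when the value does not fit into `usize`.
pub fn checked_index(offset: u64) -> Option<usize> {
    usize::try_from(offset).ok()
}
```

Whether sucds could be relaxed this way depends on how it uses `usize` internally, which this sketch does not attempt to answer.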

KonradHoeffner avatar Jun 19 '23 16:06 KonradHoeffner

Right, I also saw that...

claudius108 avatar Jun 19 '23 17:06 claudius108

I found the crate https://github.com/rossanoventurini/qwt, which implements wavelet trees but does not depend on sucds.

claudius108 avatar Jun 20 '23 09:06 claudius108

Unfortunately, WebAssembly isn't high enough on my priority list right now to justify taking the time to make big changes to the code just for that, such as replacing all the code that uses compressed structures like the wavelet tree. I also suspect that this would be just the first in a long line of necessary adaptations. But if you want to try it yourself, feel free to create a pull request, and I can merge it if it doesn't compromise performance.

KonradHoeffner avatar Jun 20 '23 09:06 KonradHoeffner

I did an experiment swapping out (most of) sucds for qwt, which marginally improved performance, and the change was not too large:

 Cargo.toml                           |  1 +
 src/triples.rs                       | 34 ++++++++++++++--------------------
 src/triples/object_iter.rs           |  3 ++-
 src/triples/predicate_iter.rs        |  9 +++++----
 src/triples/predicate_object_iter.rs |  3 ++-
 src/triples/subject_iter.rs          |  3 ++-
 6 files changed, 26 insertions(+), 27 deletions(-)

This would be a bit harder to land, as rsdict and the packed int array structures would now need replacing as well, and I'm not quite clear on which parts of these are essential, or how, without diving further into the logic of the library.
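
For readers unfamiliar with the term, a packed integer array stores every value in a fixed number of bits rather than in a full machine word. The following is a minimal sketch of the idea only, not the sucds or hdt implementation; the type `PackedInts` and its methods are hypothetical. Bit positions are computed in `u64`, so the layout logic itself does not assume a 64-bit pointer width:

```rust
/// Hypothetical fixed-width packed integer array (illustration only).
pub struct PackedInts {
    words: Vec<u64>, // backing storage, 64 bits per word
    bits: u32,       // bits per entry, between 1 and 64
    len: usize,      // number of entries
}

impl PackedInts {
    /// Packs `values`, using `bits` bits for each entry.
    /// Panics if `bits` is out of range or a value does not fit.
    pub fn from_slice(values: &[u64], bits: u32) -> Self {
        assert!((1..=64).contains(&bits), "bits must be in 1..=64");
        let total_bits = values.len() as u64 * bits as u64;
        let mut words = vec![0u64; ((total_bits + 63) / 64) as usize];
        for (i, &v) in values.iter().enumerate() {
            assert!(bits == 64 || v >> bits == 0, "value does not fit");
            let pos = i as u64 * bits as u64;
            let (w, off) = ((pos / 64) as usize, (pos % 64) as u32);
            words[w] |= v << off;
            // The entry spills into the next word when it crosses a boundary.
            if off + bits > 64 {
                words[w + 1] |= v >> (64 - off);
            }
        }
        Self { words, bits, len: values.len() }
    }

    /// Returns entry `i`, or `None` if `i` is out of bounds.
    pub fn get(&self, i: usize) -> Option<u64> {
        if i >= self.len {
            return None;
        }
        let pos = i as u64 * self.bits as u64;
        let (w, off) = ((pos / 64) as usize, (pos % 64) as u32);
        let mut v = self.words[w] >> off;
        if off + self.bits > 64 {
            v |= self.words[w + 1] << (64 - off);
        }
        let mask = if self.bits == 64 { u64::MAX } else { (1u64 << self.bits) - 1 };
        Some(v & mask)
    }
}
```

A real replacement would also have to match the on-disk layout that HDT files use for these sequences, which this sketch does not attempt.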

bz2 avatar Apr 23 '24 11:04 bz2

Hi there 👋

I'm just checking in to ask whether anyone has managed to get this library working in WASM, because there's interest in querying HDT files in browser and Node.js environments using the Comunica engine.

rubensworks avatar Oct 15 '24 11:10 rubensworks

I haven't tried it again but will take another look soon. I don't have high hopes, though, as it seemed to require a large number of changes. Maybe if enough people are interested we could make it a group effort, where everyone focuses on a different dependency and creates a pull request there to make it WASM-compatible, and I could focus on the HDT Rust library itself.

KonradHoeffner avatar Oct 18 '24 09:10 KonradHoeffner