View filtered by Vector content

Open h4-h opened this issue 11 months ago • 0 comments

Hi! BonsaiDB seems like a really cool project. However, I’m having trouble finding examples of how to filter a View by the contents of a Vec<T> field. The book has an example that filters by a single string, but that doesn’t fit my use case. I want to retrieve the posts in my collection that carry a specific tag (or several tags). I don’t want to reduce the results to a count or any other aggregate; I just want the matching posts themselves.

SQL example:

SELECT DISTINCT p.post_id, p.text_content, GROUP_CONCAT(t.tag_name) AS tags
FROM posts p
JOIN post_tags pt ON p.post_id = pt.post_id
JOIN tags t ON pt.tag_id = t.tag_id
WHERE t.tag_name IN ('Rust')
GROUP BY p.post_id, p.text_content;

Rust filtering example:

// <-- ... code ... -->

struct Post {
  // <-- ... code ... -->
  tags: Vec<String>,
  // <-- ... code ... -->
}

// <-- ... code ... -->

let rust_posts = Post::all(&db)
            .query()?
            .into_iter()
            .filter(|post| post.contents.tags.contains(&"Rust".to_string()))
            .collect::<Vec<_>>();

// <-- ... code ... -->

Working code example without a view:

use bonsaidb::{
    core::schema::{Collection, Schema, SerializedCollection},
    local::{
        config::{Builder, StorageConfiguration},
        Database,
    },
};
use serde::{Deserialize, Serialize};

const DB_PATH: &str = "./data.bdb";

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let configuration = StorageConfiguration::new(DB_PATH);
    let database = Database::open::<DataStore>(configuration)?;

    let _first = Post::new("Async in Rust", vec!["Rust", "Learning"]).push_into(&database)?;
    let _second = Post::new("What does zero cost mean in Rust", vec!["Rust", "Learning"]).push_into(&database)?;
    let _third = Post::new("Go vs Rust", vec!["Go", "Rust"]).push_into(&database)?;
    let _fourth = Post::new("Goroutines in a nutshell", vec!["Go"]).push_into(&database)?;

    { // Filter all by tags.
        let all_rust_posts = Post::all(&database)
            .query()?
            .into_iter()
            .filter(|post| post.contents.tags.contains(&"Rust".to_string()))
            .collect::<Vec<_>>();

        println!("rusty posts:");
        all_rust_posts.iter().for_each(|post| println!("  {:?}", post.contents));

        // rusty posts:
        //   Post { content: "Async in Rust", tags: ["Rust", "Learning"] }
        //   Post { content: "What does zero cost mean in Rust", tags: ["Rust", "Learning"] }
        //   Post { content: "Go vs Rust", tags: ["Go", "Rust"] }
    }

    std::fs::remove_dir_all(DB_PATH).map_err(Into::into)
}

#[derive(Debug, Schema)]
#[schema(name = "data", collections = [Post])]
struct DataStore;

#[derive(Debug, Serialize, Deserialize, Collection)]
#[collection(name = "posts")]
struct Post {
    pub content: String,
    pub tags: Vec<String>,
}

impl Post {
    pub fn new(
        content: impl Into<String>,
        tags: Vec<impl Into<String>>,
    ) -> Self {
        Self {
            content: content.into(),
            tags: tags.into_iter().map(Into::into).collect(),
        }
    }
}
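
To make the goal concrete, here is a minimal std-only sketch of the lookup I would expect a tag-keyed view to provide: one index entry per tag, each pointing at the ids of the posts that carry it. This is plain Rust, not BonsaiDB code; `TagIndex`, `with_tag`, `with_all_tags`, and `build_example` are hypothetical names used only for illustration, and the numeric ids are made up.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, Clone, PartialEq)]
struct Post {
    content: String,
    tags: Vec<String>,
}

/// Hypothetical stand-in for a view keyed by tag (not a BonsaiDB type):
/// every tag maps to the set of post ids that carry it.
#[derive(Default)]
struct TagIndex {
    by_tag: BTreeMap<String, BTreeSet<u64>>,
}

impl TagIndex {
    /// Adds one index entry per tag of the post.
    fn insert(&mut self, id: u64, post: &Post) {
        for tag in &post.tags {
            self.by_tag.entry(tag.clone()).or_default().insert(id);
        }
    }

    /// Ids of posts that carry `tag`, in ascending order.
    fn with_tag(&self, tag: &str) -> Vec<u64> {
        self.by_tag
            .get(tag)
            .map(|ids| ids.iter().copied().collect())
            .unwrap_or_default()
    }

    /// Ids of posts that carry every tag in `tags` (set intersection).
    fn with_all_tags(&self, tags: &[&str]) -> Vec<u64> {
        let mut result: Option<BTreeSet<u64>> = None;
        for tag in tags {
            let ids = match self.by_tag.get(*tag) {
                Some(ids) => ids,
                None => return Vec::new(), // an unknown tag matches nothing
            };
            result = Some(match result {
                None => ids.clone(),
                Some(mut acc) => {
                    acc.retain(|id| ids.contains(id));
                    acc
                }
            });
        }
        result.map(|ids| ids.into_iter().collect()).unwrap_or_default()
    }
}

/// Builds the four posts from the example above with made-up ids 1..=4.
fn build_example() -> (BTreeMap<u64, Post>, TagIndex) {
    let raw = [
        ("Async in Rust", vec!["Rust", "Learning"]),
        ("What does zero cost mean in Rust", vec!["Rust", "Learning"]),
        ("Go vs Rust", vec!["Go", "Rust"]),
        ("Goroutines in a nutshell", vec!["Go"]),
    ];
    let mut posts = BTreeMap::new();
    let mut index = TagIndex::default();
    for (i, (content, tags)) in raw.into_iter().enumerate() {
        let post = Post {
            content: content.to_string(),
            tags: tags.into_iter().map(String::from).collect(),
        };
        let id = i as u64 + 1;
        index.insert(id, &post);
        posts.insert(id, post);
    }
    (posts, index)
}

fn main() {
    let (posts, index) = build_example();
    // Look up ids through the index, then fetch only the matching posts.
    for id in index.with_tag("Rust") {
        println!("{id}: {:?}", posts[&id]);
    }
}
```

The point of the sketch is that the query touches only the matching ids instead of deserializing every post and filtering afterwards, which is what my current code does.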

P.S.

I'm trying to figure out how to go through the items in a Collection one at a time without loading everything at once. Is there a way to get an Iter directly, instead of using all() to load everything and then converting it to Iter?
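
To illustrate what I mean by not loading everything at once, here is a small std-only sketch that walks a store in fixed-size pages, resuming after the last id seen. A `BTreeMap` stands in for the collection; `next_page` and `visit_all` are hypothetical helpers, and I don't know whether BonsaiDB exposes an equivalent.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Returns up to `page_size` entries whose id is strictly greater than `after`.
fn next_page<'a>(
    store: &'a BTreeMap<u64, String>,
    after: Option<u64>,
    page_size: usize,
) -> Vec<(u64, &'a str)> {
    let lower = match after {
        Some(id) => Bound::Excluded(id),
        None => Bound::Unbounded,
    };
    let bounds: (Bound<u64>, Bound<u64>) = (lower, Bound::Unbounded);
    store
        .range(bounds)
        .take(page_size)
        .map(|(id, content)| (*id, content.as_str()))
        .collect()
}

/// Visits every entry while holding at most `page_size` of them at a time.
/// Returns the number of non-empty pages that were fetched.
fn visit_all(
    store: &BTreeMap<u64, String>,
    page_size: usize,
    mut visit: impl FnMut(u64, &str),
) -> usize {
    let mut cursor = None;
    let mut pages = 0;
    loop {
        let page = next_page(store, cursor, page_size);
        if page.is_empty() {
            break;
        }
        pages += 1;
        // Remember where this page ended so the next one starts after it.
        cursor = page.last().map(|(id, _)| *id);
        for (id, content) in page {
            visit(id, content);
        }
    }
    pages
}

fn main() {
    let mut store = BTreeMap::new();
    for (id, title) in [(1, "Async in Rust"), (2, "Go vs Rust"), (3, "Goroutines")] {
        store.insert(id, title.to_string());
    }
    let pages = visit_all(&store, 2, |id, content| println!("{id}: {content}"));
    println!("fetched {pages} pages");
}
```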

h4-h · Feb 03 '25 04:02