
Scraping function works on web, but doesn't on mobile

darkwriterr opened this issue 11 months ago · 1 comment

Problem

I am trying to create a mobile app that scrapes a website and then shows the data in the app. When I run it on web, it works fine, but when I run it on mobile it doesn't. I tried adding various prints around the code, but when the component that handles the scraping is called, none of the prints produce any output.

Steps To Reproduce

Steps to reproduce the behavior:

  • create a new project using $ dx new app, choosing fullstack with the router
  • copy the code provided
  • run it using $ dx serve --platform android
  • wait for the app to start
  • observe that the data is not shown

Expected behavior

The scraped data is displayed.

Here's the code I use:

ui/src/logic/scrape.rs:

use reqwest::Client;
use scraper::{Html, Selector};
use std::collections::HashMap;

#[cfg(target_arch = "wasm32")]
use lol_alloc::{AssumeSingleThreaded, FreeListAllocator};

// Set up the global allocator for WebAssembly
#[cfg(target_arch = "wasm32")]
#[global_allocator]
static ALLOCATOR: AssumeSingleThreaded<FreeListAllocator> =
    unsafe { AssumeSingleThreaded::new(FreeListAllocator::new()) };

pub async fn scrape(url: &str) -> Result<Vec<HashMap<String, String>>, String> {
    let client = Client::new();

    // Use allorigins as a CORS proxy
    let proxy_base = "https://api.allorigins.win/raw?url=";
    let target_url = if url.is_empty() {
        format!("{}https://website.com/", proxy_base)
    } else {
        format!("{}https://website.com/{}", proxy_base, url)
    };

    println!("Scraping: {}", target_url);

    // Set a size limit for the response
    let res = match client.get(&target_url)
        .send()
        .await
    {
        Ok(response) => {
            // Check response size before processing
            let content_length = response.content_length().unwrap_or(0);
            if content_length > 5_000_000 { // 5MB limit
                return Err("Response too large".to_string());
            }
            response
        },
        Err(e) => return Err(format!("Failed to send request: {}", e))
    };

    let body = match res.text().await {
        Ok(text) => text,
        Err(e) => return Err(format!("Failed to get response body: {}", e))
    };

    // Process in chunks if document is large
    let document = Html::parse_document(&body);

    // More specific selector to reduce memory usage
    let links_selector = match Selector::parse("div.content > a") {
        Ok(selector) => selector,
        Err(e) => return Err(format!("Invalid selector: {}", e))
    };

    // Limit result size
    let mut results: Vec<HashMap<String, String>> = Vec::with_capacity(100);

    // Process limited number of links
    for (i, link) in document.select(&links_selector).enumerate() {
        if i >= 100 { // Limit to 100 results max
            break;
        }

        let mut item = HashMap::new();

        // Extract only what's needed
        if let Some(href) = link.value().attr("href") {
            item.insert("href".to_string(), href.to_string());
        }

        // Avoid collecting all text nodes into a vector first
        let text = link.text().collect::<String>().trim().to_string();
        item.insert("title".to_string(), text);

        results.push(item);
    }

    Ok(results)
}

ui/src/favorites.rs:

use dioxus::prelude::*;
use serde_json;
use serde::Deserialize;
use std::fs::File;
use std::io::{BufReader, Read};
use crate::logic::scrape::scrape;
use std::collections::HashMap;
use wasm_bindgen_futures::spawn_local;

const FAVORITES_CSS: Asset = asset!("./assets/styling/favorites.css");

#[derive(Deserialize, PartialEq)]
struct Course {
    title: String,
    favorite: bool,
}

#[derive(Deserialize, PartialEq)]
struct CourseData {
    courses: Vec<Course>,
}

#[component]
pub fn FavoriteList() -> Element {
    let mut scraped_data = use_signal(|| Vec::<HashMap<String, String>>::new());

    use_effect(move || {
        to_owned![scraped_data];

        spawn_local(async move {
            match scrape("").await {
                Ok(data) => {
                    scraped_data.set(data);
                },
                Err(err) => {
                    scraped_data.set(vec![{
                        let mut map = HashMap::new();
                        map.insert("title".to_string(), err);
                        map
                    }]);
                }
            }
        });
    });

    rsx! {
        document::Link { rel: "stylesheet", href: FAVORITES_CSS }
        meta { name: "viewport", content: "width=device-width, initial-scale=1" }

        div {
            h2 {
                class: "favorites-title",
                "Favorites"
            }
            if !scraped_data.read().is_empty() {
                div {
                    class: "favorites-courses-list",
                    h3 { "Favorites:" }
                    for (index, item) in scraped_data.read().iter().enumerate() {
                        if let Some(title) = item.get("title") {
                            div {
                                class: "favorites-course",
                                svg {
                                    xmlns: "http://www.w3.org/2000/svg",
                                    view_box: "0 0 24 24",
                                    width: "24",
                                    height: "24",
                                    fill: "#FF0000",
                                    stroke: "none",
                                    path {
                                        d: "M12 21.35l-1.45-1.32C5.4 15.36 2 12.28 2 8.5 2 5.42 4.42 3 7.5 3c1.74 0 3.41.81 4.5 2.09C13.09 3.81 14.76 3 16.5 3 19.58 3 22 5.42 22 8.5c0 3.78-3.4 6.86-8.55 11.54L12 21.35z"
                                    }
                                }
                                p { "{title}" }
                                p {
                                    class: "favorites-clickable",
                                    ">"
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
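One detail worth flagging in the component above: `wasm_bindgen_futures::spawn_local` only exists on wasm targets, so this file cannot behave the same way when compiled for Android. A minimal sketch of a platform-agnostic version of the effect, assuming Dioxus 0.6's `spawn` from the prelude (which schedules the future on whichever async runtime the current platform uses):

```rust
use dioxus::prelude::*;
use std::collections::HashMap;
use crate::logic::scrape::scrape;

#[component]
pub fn FavoriteList() -> Element {
    let mut scraped_data = use_signal(Vec::<HashMap<String, String>>::new);

    use_effect(move || {
        // `spawn` from the Dioxus prelude works on web, desktop, and mobile,
        // unlike wasm_bindgen_futures::spawn_local, which is wasm-only.
        spawn(async move {
            match scrape("").await {
                Ok(data) => scraped_data.set(data),
                Err(err) => {
                    // Surface the error message in place of the results.
                    let mut map = HashMap::new();
                    map.insert("title".to_string(), err);
                    scraped_data.set(vec![map]);
                }
            }
        });
    });

    // ... rsx! body unchanged ...
    rsx! { div { "FavoriteList" } }
}
```

This is a sketch, not a confirmed fix for the issue; `use_resource` would be the more idiomatic Dioxus hook for a one-shot async fetch like this.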

mobile/src/main.rs:

use dioxus::html::completions::CompleteWithBraces::progress;
use dioxus::prelude::*;
use ui::{ProgressTracker, search_bar::search_bar, FavoriteList};

#[component]
pub fn Home() -> Element {
    rsx! {
        search_bar {},
        ProgressTracker {id: 1, progress: 5},
        FavoriteList {}
    }
}

Screenshots

Phone:

Image

Web:

Image

Environment:

  • Dioxus version: 0.6.3
  • Rust version: 1.82.0
  • OS info: Android 15 (Emulator), Windows 11 Pro 24H2
  • App platform: web, android

darkwriterr avatar Mar 25 '25 22:03 darkwriterr

I tried adding various prints around the code, but when the component that will handle the scraping is called, none of the prints do anything

println! will not work on mobile or wasm. You need to use tracing::info! to log in Dioxus. None of the scraping logic is related to Dioxus, so if you think that is what is failing here, it doesn't sound like a bug in Dioxus.
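For reference, a minimal sketch of what that looks like, assuming Dioxus 0.6's bundled `dioxus::logger` module (on older versions the standalone `dioxus_logger` crate works similarly):

```rust
use dioxus::prelude::*;
// `dioxus::logger` forwards `tracing` events to the right sink for each
// platform (e.g. logcat on Android, the browser console on web).
use dioxus::logger::tracing::{info, Level};

fn main() {
    // Initialize logging once, before launching the app.
    dioxus::logger::init(Level::INFO).expect("failed to init logger");
    launch(App);
}

#[component]
fn App() -> Element {
    // Use tracing macros instead of println!, whose output is not
    // visible on mobile or wasm targets.
    info!("app started");
    rsx! { div { "hello" } }
}
```

With that in place, log output from the scraping code should show up in `dx serve`'s console or in `adb logcat`, which makes it possible to see where the Android build actually fails.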

ealmloff avatar Mar 26 '25 13:03 ealmloff