rust-oracle icon indicating copy to clipboard operation
rust-oracle copied to clipboard

Any reason SqlValues are .dup_by_handle()'d on retrieving a row?

Open christoph-heiss opened this issue 2 years ago • 3 comments

Hi,

as the title says. During profiling of retrieving large sets of data from Oracle databases, I noticed that <Row as RowValue>::get() prominently stands out if there are lots of small rows.

For each row, the value handles are duplicated - is there any specific reason for this? If there is, than the following is probably wrong and I'd have to accept the sizeable performance hit. I created a small patch which shares the values across row, which worked fine so far.

Thanks, Christoph

christoph-heiss avatar Jun 28 '22 13:06 christoph-heiss

For each row, the value handles are duplicated - is there any specific reason for this? If there is, than the following is probably wrong and I'd have to accept the sizeable performance hit.

That's because if handle isn't duplicated, fetching a next row changes the previously fetched rows.

For example:

use oracle::Connection;

fn main() -> std::result::Result<(), Box<dyn std::error::Error>> {
    let conn = Connection::connect("odpic", "welcome", "")?;
    let mut stmt = conn
        .statement("select level from dual connect by level <= 4")
        .build()?;
    let mut result_set = stmt.query(&[])?;
    println!("fetch first row");
    let first_row = result_set.next().unwrap()?;
    println!("  content of first row: {}", first_row.get::<usize, i32>(0)?);

    println!("fetch second row");
    let second_row = result_set.next().unwrap()?;
    println!("  content of first row: {}", first_row.get::<usize, i32>(0)?);
    println!("  content of second row: {}", second_row.get::<usize, i32>(0)?);

    Ok(())
}

The above prints:

fetch first row
  content of first row: 1
fetch second row
  content of first row: 1
  content of second row: 2

When your patch is applied, the first row is changed by the second fetch as follows:

fetch first row
  content of first row: 1
fetch second row
  content of first row: 2
  content of second row: 2

kubo avatar Jul 07 '22 15:07 kubo

As for performance, could you use query methods with _as suffix instead of ones without _as? See this document.

I updated it just before to recommend the RowValue derive macro to fetch rows by names.

kubo avatar Jul 07 '22 15:07 kubo

Thank you for the explanation, that makes sense! Unfortunately, the number of columns and their types are unknown (at compile time), thus using _as methods or derving RowValue is

Simplified, my use case is to read all selected rows one by one, without ever touching a previous row again. (Creating a Statement, calling .into_result_set(), then iterating over it, thereby calling Row::get() to retrieve each column individually).

This also explains why - although completely wrong - my hacked solution worked.

That probably means I have to keep using the current patched version for now.

Edit: After some experimenting and coming up with a "workaround" using the seq-macro crate, I discovered some other things why using .get_as() would not work for my case:

  • The RowValue trait is limited to 50 columns, but I can't assume that this is the maximum.
  • The performance was - funnily enough - about the same.

christoph-heiss avatar Jul 07 '22 15:07 christoph-heiss