Oracle.jl icon indicating copy to clipboard operation
Oracle.jl copied to clipboard

Priorities

Open felipenoris opened this issue 6 years ago • 51 comments

  • [x] Try fix GC errors on Julia v0.6
  • [x] Fix LOB reference handling with addRef
  • [x] Fix TIMESTAMP nanosecond precision data loss when converting to DateTime
  • [x] Support TIMESTAMP with time zones
  • [x] Utility function to execute a script file
  • [x] ResultSet API
  • [x] Make Variables and OracleValues 1-index-based
  • [x] remove bangs ! from API methods.
  • [ ] preserve precision for numeric values
  • [ ] Support ROWID data type
  • [x] add execute_many helper method that operates directly on a statement
  • [x] Docstrings
  • [x] Doc website
  • [ ] Incremental write Lob API
  • [x] Try fix handling of RAW fields
  • [ ] Base.show methods for data structures
  • [ ] Use BinaryBuilder to build deps
  • [x] Tables.jl integration

felipenoris avatar Feb 05 '19 13:02 felipenoris

This looks great. Would you consider doing a BinaryBuilder build for ODPI-C, so that user do not have to compile it themselves?

aviks avatar Feb 12 '19 10:02 aviks

Some integration with Tables.jl would be nice. @quinnj might be able to assit.

aviks avatar Feb 12 '19 11:02 aviks

Yes, these are all good ideas.

felipenoris avatar Feb 12 '19 11:02 felipenoris

Yes, Tables.jl integration is something I should have done before, as in with XLSX.jl.

felipenoris avatar Feb 12 '19 11:02 felipenoris

Yep, happy to help however I can. If someone wants to take a stab at Tables.jl, I'm happy to review, or if you give me some pointers around the code, I could take a stab at doing the integration myself.

quinnj avatar Feb 12 '19 18:02 quinnj

What about support for DECIMAL and NUMERIC types? (up to a point, IEEE decimal floating point formats could be used, using DecFP.jl could support up to 34 digits (with Dec128), even though numbers with 35-38 digits could not be represented exactly. (For those, I do wish we had a package like Java's BigDecimal, even if it would be much slower than the IEEE formats [which in my testing of DecFP.jl, were faster in Julia than BigFloat types, because they are immutable types, and no finalizers etc. need to be set up for every number allocated, as is the case for BigFloat)

Note: I've found code that describes how to convert to/from Oracle's internal type to a string, that could be used with DecFP.jl to create the Dec128 values or create one of Oracle's numeric type to set values.

ScottPJones avatar Feb 18 '19 19:02 ScottPJones

Hi @ScottPJones !

This is something I've been thinking about. Current implementation gets values as Float64, converted from C double type.

Oracle documentation describes NUMBER columns with a precision and scale, up to 38 digits of precision. I can even get this information while querying data from the database.

One possibility I had in mind was to return values as fixed decimals, using FixedPointDecimals.jl. The common use case will be a monetary value, with 2 decimals. I think if I use DecFP.jl, the precision of the value (I mean, the information that a monetary value has exactly 2 decimals) will be lost during the process, I guess.

What do you think about it?

felipenoris avatar Feb 18 '19 22:02 felipenoris

@felipenoris ping @anthony-tuininga (I can forward email, if that's easier - see my profile) since he has done some work in this area.

cjbj avatar Feb 18 '19 23:02 cjbj

@cjbj , @anthony-tuininga, thanks for the input! Yes, whatever the solution is, we need to understand how the numeric internal representation is defined. I could spend some time to infer it based on my understanding of floating number representation, but it is helpful if there is some code already to show how to parse the bytes to a number.

felipenoris avatar Feb 19 '19 02:02 felipenoris

For Oracle NUMBER, look at https://github.com/oracle/odpi/blob/master/src/dpiData.c#L114 There are pros & cons to doing this on the client side.

cjbj avatar Feb 19 '19 03:02 cjbj

@anthony-tuininga, how does cx_Oracle deal with numeric type precision?

felipenoris avatar Feb 20 '19 13:02 felipenoris

@cjbj , @anthony-tuininga, based on this code, whenever I ask a numeric value as bytes, the byte representation is actually a text: https://github.com/oracle/odpi/blob/e95f30cc3180e86d23d343348be80b1a781c58cc/src/dpiVar.c#L728

Is there a way to get the raw byte representation?

Digging in the code, it looks like I have to create a patch to create my own version for dpiUtils__parseOracleNumber. This will avoid the overhead of converting to->from string on the client.

Also, maybe a new native type enum value could be defined to ask the driver to get numeric values in a raw representation (maybe a struct with the result of parseOracleNumber).

felipenoris avatar Feb 20 '19 13:02 felipenoris

Oracle internal number types encode the value so that they can be compared numerically, simply by doing string (byte) comparisons (this is a common technique, that I've used in the past also). The values themselves do not store the precision, that information is only in the metadata you get back from Oracle about the column types. They are a variable length, from 1 - 21 bytes (stored radix 100 with offsets, a leading exponent indicator, and an optional trailing negative indicator. That code you referenced does get the internal raw byte representation, and just converts it to text, because it doesn't expect anyone to directly use their internal raw format for NUMBER, even though that would be much more efficient.

ScottPJones avatar Feb 20 '19 14:02 ScottPJones

You could make a Julia type that directly stores the value (maybe as a 24 or 32 byte bitstype, which would probably be the most compact way of storing the full value in Julia), and then have methods that convert to/from strings (with no loss), or for performing operations, into a Dec128 (which might have to round off the last 1-4 digits of a 35-38 digit number) (since Julia doesn't have a BigDecimal type like Java that could handle these numbers directly.

ScottPJones avatar Feb 20 '19 14:02 ScottPJones

@ScottPJones Yes! That's exactly the idea! This is very much like the timestamp case, where there's a special struct for it dpiTimestamp, and I made a julia type OraTimestamp that gets converted to a Timestamp which is defined using julia types Date and Time, to preserve nanosecond precision.

In the case of NUMBER, I could have a struct OraNumber with exponent, mantissa information, and that could have an algebra defined on it, or could be converted to a julia type that preserves its data.

I'm just not sure if there is a julia type for it. I mean, Dec128 will cause information loss. That's why I've been looking at FixedPointDecimals, but I don't have experience with it. But the idea that the precision of the value gets encoded in the julia type is just a good fit for this use case. The type will have a variable size, and will be defined at runtime. No need to define it with a fixed size of 21 bytes.

felipenoris avatar Feb 20 '19 14:02 felipenoris

@anthony-tuininga, how does cx_Oracle deal with numeric type precision?

In cx_Oracle and node-oracledb there is no attempt to use the Oracle Number format directly. Instead, it is converted to double, integer or string depending on the needs of the driver. That is because the Oracle Number format is not directly usable and conversion to/from that format is not simple. If you really want to deal with it yourself directly I could add a "native type" which simply passes the value through to you directly. If that is the case, please add an enhancement request to ODPI-C for that purpose.

anthony-tuininga avatar Feb 20 '19 14:02 anthony-tuininga

The reason a fixed size might be better, is that using something like Vector will use a lot more memory than simply using a fixed size of 24 bytes (you need one byte to encode the "length", followed by the exponent byte, and 1-20 extra bytes (and 2 padding bytes to make it 3 64-bit words). Having a really immutable bits type means that it can be stored in registers, on the stack, etc. (which is one reason that DecFP.jl is faster than using BigFloat, and takes a lot less memory)

FixedPointDecimals don't allow for exponents, so you could easily have information loss there as well.

ScottPJones avatar Feb 20 '19 14:02 ScottPJones

If you really want to deal with it yourself directly I could add a "native type" which simply passes the value through to you directly.

That would be great to have in ODPI-C, IMO.

ScottPJones avatar Feb 20 '19 14:02 ScottPJones

@anthony-tuininga , nice!!! Let me study oracle's format, then I'll open an issue with detailed information.

felipenoris avatar Feb 20 '19 14:02 felipenoris

@ScottPJones , I see. So there is a tradeoff between memory consumption and performance here.

felipenoris avatar Feb 20 '19 14:02 felipenoris

Actually, both memory consumption and performance are better, if you can use a fixed size immutable or bitstype. Julians often seem to forget the huge effect memory consumption can have on overall performance of large systems, doing small benchmarks that don't create enough items to use GC really doesn't show that impact.

ScottPJones avatar Feb 20 '19 14:02 ScottPJones

Think of a vector of these things, the vector itself might be n * pointer size, but then you will need around 64 bytes for each element in the vector (and will be doing indirections through the pointers every time you access one of them).

ScottPJones avatar Feb 20 '19 15:02 ScottPJones

@ScottPJones , I see. But say I have an immutable parametric type like struct OraNumber{T} <: Real. The parameter of the type can affect the size of the struct.

So, a vector of values of type OraNumber{x} , for a fixed value x , will have memory and performance issues you're pointing out? Since there's a fixed size, there will be no padding, and also it could be stack-allocated. Or not?

I might be wrong in my understanding. You're much more experienced in this area! :) I'm trying to explore if there's a way to avoid using a 24bytes type for all numeric values, since the common use case could use much less memory.

felipenoris avatar Feb 20 '19 15:02 felipenoris

You could probably define an 8, 16, and 24 byte sizes, depending on whether the column is declared with just NUMBER (or DECIMAL), in which case it would need the larger size to support arbitrary Oracle numbers, 8 bytes would be able to support 8 - 1 len - 1 scale - 1 neg => 10 decimal digits max, 16 would support 23 digits, however, I think for cases where the precision is <= 7 digits, or <= 15 digits, that using Dec32 and Dec64 would be better, as well as using Dec128 <= 34 digits. I also just noticed that DECIMAL on Oracle is not the same as the NUMBER type, as on most DBMS that I am aware of, the DECIMAL type seems to correspond to the IEEE 64-bit type (i.e. either Dec64, which is in a binary form, or the other packed decimal form (used on IBM hardware), and the SHORTDECIMAL type corresponds to the 32-bit type, so you'd probably need to add support for DecFP.jl anyway. For columns with numbers with > 34 digits, the only option would be an OraNumber type, but if it's only used for those very large numbers, it could always be a 24 byte bits type.

ScottPJones avatar Feb 20 '19 16:02 ScottPJones

Note: if we did want the (hardware supported on IBM platforms, nice since Julia runs on the LE Power), I could restart my attempt to get the decimal C library ported or wrapped to Julia, which supports the packed decimal format (it uses 10-bit triplets to efficiently pack 3 digits at a time, and make it easy for hardware to deal with).

ScottPJones avatar Feb 20 '19 17:02 ScottPJones

So, a vector of values of type OraNumber{x} , for a fixed value x , will have memory and performance issues you're pointing out? Since there's a fixed size, there will be no padding, and also it could be stack-allocated. Or not?

No, if OraNumber{x} were implemented as OraNumber{UInt64}, OraNumber{UInt128}, and OraNumber{UInt192} (where UInt192 is a 24-byte bitstype, just for this purpose), it would have much less memory performance issues than trying to use variable length fields.

I'm trying to explore if there's a way to avoid using a 24bytes type for all numeric values, since the common use case could use much less memory.

Precisely, you'd simply need to look at the precision value that Oracle gives you, to see which type to use for that particular column.

ScottPJones avatar Feb 20 '19 17:02 ScottPJones

Oh, I just thought of another thing. Maybe this numeric support should not be put into this package, but rather, a package that can be used with ODBC.jl, and another other database wrapper that needs to correctly support database DECIMAL and NUMERIC types.

ScottPJones avatar Feb 20 '19 17:02 ScottPJones

@ScottPJones I think we agree on the concepts. I just need to study a bit more to choose a solution.

About DECIMAL/NUMERIC, does Oracle DB has them? I mean, this page lists NUMBER and regular float and double. I don't see DECIMAL there. Isn't that for JavaDB? (I never heard of it, just googled).

felipenoris avatar Feb 20 '19 18:02 felipenoris

Ah, I see where I got confused: Oracle OLAP DML has DECIMAL and SHORTDECIMAL that match IEEE, Oracle DBMS has DECIMAL and NUMBER (which is basically the same as NUMERIC in the standard)

ScottPJones avatar Feb 20 '19 23:02 ScottPJones

Here is a good description of the ANSI SQL numeric types: https://academy.vertabelo.com/blog/understanding-numerical-data-types-sql/ In the standard there is a small difference between DECIMAL and NUMERIC, which has to do whether the precision is exact or not (DECIMAL numbers can have >= the specified precision)

ScottPJones avatar Feb 21 '19 00:02 ScottPJones