Priorities
- [x] Try fix GC errors on Julia v0.6
- [x] Fix LOB reference handling with
addRef - [x] Fix TIMESTAMP nanosecond precision data loss when converting to
DateTime - [x] Support TIMESTAMP with time zones
- [x] Utility function to execute a script file
- [x] ResultSet API
- [x] Make Variables and OracleValues 1-index-based
- [x] remove bangs
!from API methods. - [ ] preserve precision for numeric values
- [ ] Support
ROWIDdata type - [x] add
execute_manyhelper method that operates directly on a statement - [x] Docstrings
- [x] Doc website
- [ ] Incremental write Lob API
- [x] Try fix handling of RAW fields
- [ ]
Base.showmethods for data structures - [ ] Use BinaryBuilder to build deps
- [x] Tables.jl integration
This looks great. Would you consider doing a BinaryBuilder build for ODPI-C, so that user do not have to compile it themselves?
Some integration with Tables.jl would be nice. @quinnj might be able to assit.
Yes, these are all good ideas.
Yes, Tables.jl integration is something I should have done before, as in with XLSX.jl.
Yep, happy to help however I can. If someone wants to take a stab at Tables.jl, I'm happy to review, or if you give me some pointers around the code, I could take a stab at doing the integration myself.
What about support for DECIMAL and NUMERIC types? (up to a point, IEEE decimal floating point formats could be used, using DecFP.jl could support up to 34 digits (with Dec128), even though numbers with 35-38 digits could not be represented exactly.
(For those, I do wish we had a package like Java's BigDecimal, even if it would be much slower than the IEEE formats [which in my testing of DecFP.jl, were faster in Julia than BigFloat types, because they are immutable types, and no finalizers etc. need to be set up for every number allocated, as is the case for BigFloat)
Note: I've found code that describes how to convert to/from Oracle's internal type to a string, that could be used with DecFP.jl to create the Dec128 values or create one of Oracle's numeric type to set values.
Hi @ScottPJones !
This is something I've been thinking about. Current implementation gets values as Float64, converted from C double type.
Oracle documentation describes NUMBER columns with a precision and scale, up to 38 digits of precision. I can even get this information while querying data from the database.
One possibility I had in mind was to return values as fixed decimals, using FixedPointDecimals.jl. The common use case will be a monetary value, with 2 decimals. I think if I use DecFP.jl, the precision of the value (I mean, the information that a monetary value has exactly 2 decimals) will be lost during the process, I guess.
What do you think about it?
@felipenoris ping @anthony-tuininga (I can forward email, if that's easier - see my profile) since he has done some work in this area.
@cjbj , @anthony-tuininga, thanks for the input! Yes, whatever the solution is, we need to understand how the numeric internal representation is defined. I could spend some time to infer it based on my understanding of floating number representation, but it is helpful if there is some code already to show how to parse the bytes to a number.
For Oracle NUMBER, look at https://github.com/oracle/odpi/blob/master/src/dpiData.c#L114 There are pros & cons to doing this on the client side.
@anthony-tuininga, how does cx_Oracle deal with numeric type precision?
@cjbj , @anthony-tuininga, based on this code, whenever I ask a numeric value as bytes, the byte representation is actually a text: https://github.com/oracle/odpi/blob/e95f30cc3180e86d23d343348be80b1a781c58cc/src/dpiVar.c#L728
Is there a way to get the raw byte representation?
Digging in the code, it looks like I have to create a patch to create my own version for dpiUtils__parseOracleNumber. This will avoid the overhead of converting to->from string on the client.
Also, maybe a new native type enum value could be defined to ask the driver to get numeric values in a raw representation (maybe a struct with the result of parseOracleNumber).
Oracle internal number types encode the value so that they can be compared numerically, simply by doing string (byte) comparisons (this is a common technique, that I've used in the past also). The values themselves do not store the precision, that information is only in the metadata you get back from Oracle about the column types. They are a variable length, from 1 - 21 bytes (stored radix 100 with offsets, a leading exponent indicator, and an optional trailing negative indicator. That code you referenced does get the internal raw byte representation, and just converts it to text, because it doesn't expect anyone to directly use their internal raw format for NUMBER, even though that would be much more efficient.
You could make a Julia type that directly stores the value (maybe as a 24 or 32 byte bitstype, which would probably be the most compact way of storing the full value in Julia), and then have methods that convert to/from strings (with no loss), or for performing operations, into a Dec128 (which might have to round off the last 1-4 digits of a 35-38 digit number) (since Julia doesn't have a BigDecimal type like Java that could handle these numbers directly.
@ScottPJones Yes! That's exactly the idea! This is very much like the timestamp case, where there's a special struct for it dpiTimestamp, and I made a julia type OraTimestamp that gets converted to a Timestamp which is defined using julia types Date and Time, to preserve nanosecond precision.
In the case of NUMBER, I could have a struct OraNumber with exponent, mantissa information, and that could have an algebra defined on it, or could be converted to a julia type that preserves its data.
I'm just not sure if there is a julia type for it. I mean, Dec128 will cause information loss. That's why I've been looking at FixedPointDecimals, but I don't have experience with it. But the idea that the precision of the value gets encoded in the julia type is just a good fit for this use case. The type will have a variable size, and will be defined at runtime. No need to define it with a fixed size of 21 bytes.
@anthony-tuininga, how does
cx_Oracledeal with numeric type precision?
In cx_Oracle and node-oracledb there is no attempt to use the Oracle Number format directly. Instead, it is converted to double, integer or string depending on the needs of the driver. That is because the Oracle Number format is not directly usable and conversion to/from that format is not simple. If you really want to deal with it yourself directly I could add a "native type" which simply passes the value through to you directly. If that is the case, please add an enhancement request to ODPI-C for that purpose.
The reason a fixed size might be better, is that using something like Vector will use a lot more memory than simply using a fixed size of 24 bytes (you need one byte to encode the "length", followed by the exponent byte, and 1-20 extra bytes (and 2 padding bytes to make it 3 64-bit words). Having a really immutable bits type means that it can be stored in registers, on the stack, etc. (which is one reason that DecFP.jl is faster than using BigFloat, and takes a lot less memory)
FixedPointDecimals don't allow for exponents, so you could easily have information loss there as well.
If you really want to deal with it yourself directly I could add a "native type" which simply passes the value through to you directly.
That would be great to have in ODPI-C, IMO.
@anthony-tuininga , nice!!! Let me study oracle's format, then I'll open an issue with detailed information.
@ScottPJones , I see. So there is a tradeoff between memory consumption and performance here.
Actually, both memory consumption and performance are better, if you can use a fixed size immutable or bitstype. Julians often seem to forget the huge effect memory consumption can have on overall performance of large systems, doing small benchmarks that don't create enough items to use GC really doesn't show that impact.
Think of a vector of these things, the vector itself might be n * pointer size, but then you will need around 64 bytes for each element in the vector (and will be doing indirections through the pointers every time you access one of them).
@ScottPJones , I see. But say I have an immutable parametric type like struct OraNumber{T} <: Real. The parameter of the type can affect the size of the struct.
So, a vector of values of type OraNumber{x} , for a fixed value x , will have memory and performance issues you're pointing out? Since there's a fixed size, there will be no padding, and also it could be stack-allocated. Or not?
I might be wrong in my understanding. You're much more experienced in this area! :) I'm trying to explore if there's a way to avoid using a 24bytes type for all numeric values, since the common use case could use much less memory.
You could probably define an 8, 16, and 24 byte sizes, depending on whether the column is declared with just NUMBER (or DECIMAL), in which case it would need the larger size to support arbitrary Oracle numbers,
8 bytes would be able to support 8 - 1 len - 1 scale - 1 neg => 10 decimal digits max, 16 would support 23 digits, however, I think for cases where the precision is <= 7 digits, or <= 15 digits, that using Dec32 and Dec64 would be better, as well as using Dec128 <= 34 digits. I also just noticed that DECIMAL on Oracle is not the same as the NUMBER type, as on most DBMS that I am aware of, the DECIMAL type seems to correspond to the IEEE 64-bit type (i.e. either Dec64, which is in a binary form, or the other packed decimal form (used on IBM hardware), and the SHORTDECIMAL type corresponds to the 32-bit type, so you'd probably need to add support for DecFP.jl anyway.
For columns with numbers with > 34 digits, the only option would be an OraNumber type, but if it's only used for those very large numbers, it could always be a 24 byte bits type.
Note: if we did want the (hardware supported on IBM platforms, nice since Julia runs on the LE Power), I could restart my attempt to get the decimal C library ported or wrapped to Julia, which supports the packed decimal format (it uses 10-bit triplets to efficiently pack 3 digits at a time, and make it easy for hardware to deal with).
So, a vector of values of type
OraNumber{x}, for a fixed valuex, will have memory and performance issues you're pointing out? Since there's a fixed size, there will be no padding, and also it could be stack-allocated. Or not?
No, if OraNumber{x} were implemented as OraNumber{UInt64}, OraNumber{UInt128}, and OraNumber{UInt192} (where UInt192 is a 24-byte bitstype, just for this purpose), it would have much less memory performance issues than trying to use variable length fields.
I'm trying to explore if there's a way to avoid using a 24bytes type for all numeric values, since the common use case could use much less memory.
Precisely, you'd simply need to look at the precision value that Oracle gives you, to see which type to use for that particular column.
Oh, I just thought of another thing. Maybe this numeric support should not be put into this package, but rather, a package that can be used with ODBC.jl, and another other database wrapper that needs to correctly support database DECIMAL and NUMERIC types.
@ScottPJones I think we agree on the concepts. I just need to study a bit more to choose a solution.
About DECIMAL/NUMERIC, does Oracle DB has them? I mean, this page lists NUMBER and regular float and double. I don't see DECIMAL there. Isn't that for JavaDB? (I never heard of it, just googled).
Ah, I see where I got confused: Oracle OLAP DML has DECIMAL and SHORTDECIMAL that match IEEE,
Oracle DBMS has DECIMAL and NUMBER (which is basically the same as NUMERIC in the standard)
Here is a good description of the ANSI SQL numeric types:
https://academy.vertabelo.com/blog/understanding-numerical-data-types-sql/
In the standard there is a small difference between DECIMAL and NUMERIC, which has to do whether the precision is exact or not (DECIMAL numbers can have >= the specified precision)