jaydebeapi
Reading is really slow
I've been having trouble getting decent read times. I've looked into the code and the Java driver classes, and I'm pretty sure this comes down to the type conversion calls, which seem to be really slow when Java methods are called from Python (through JPype).
I'm using this to migrate data from a lot of different technologies (Oracle, Sybase, MySQL, PostgreSQL), with some tables exceeding a billion rows. In my experience, reading is about 2 to 3 times slower than writing. As a comparison, on MySQL using mysql-connector, reading is about 4 times faster than with JayDeBeApi.
A good way of speeding this up is to reduce the number of cells to convert. For a table of about 230,000 rows and 4 columns of types int and str, here are the read times when SQL aggregation is used to reduce the number of cells returned:
Method | Reading |
---|---|
Classic (SELECT *) | 11.7s |
CONCAT on all columns | 5.44s |
GROUP_CONCAT on all rows | 0.26s |
Then, I did the type conversion in Python in ~500 ms. Including that step, GROUP_CONCAT speeds up reading by a factor of about 15, but it is only available in MySQL, and the improvised Python type conversion will become really complicated once more obscure, technology-dependent types start to appear.
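A minimal sketch of that Python-side conversion, assuming the aggregated result comes back as a single string with `;` between rows and `,` between columns. The separators, the `parse_group_concat` name and the converter list are illustrative assumptions, not part of JayDeBeApi or MySQL. The sketch breaks as soon as a value contains one of the separators, which is exactly where this approach starts to get complicated:

```python
from typing import Callable, List, Sequence, Tuple


def parse_group_concat(
    blob: str,
    converters: Sequence[Callable[[str], object]],
    row_sep: str = ";",   # assumed row separator
    col_sep: str = ",",   # assumed column separator
) -> List[Tuple[object, ...]]:
    """Split one aggregated string back into a list of typed rows."""
    if not blob:
        return []
    rows = []
    for line in blob.split(row_sep):
        cells = line.split(col_sep)
        if len(cells) != len(converters):
            raise ValueError(
                f"expected {len(converters)} cells, got {len(cells)}: {line!r}"
            )
        # One plain Python call per cell, with no Java round trip involved.
        rows.append(tuple(conv(cell) for conv, cell in zip(converters, cells)))
    return rows


# Hypothetical aggregated value for a two-column (int, str) table.
rows = parse_group_concat("1,alice;2,bob", [int, str])
```

Keep in mind that MySQL truncates GROUP_CONCAT output at `group_concat_max_len` (1024 bytes by default), so that variable has to be raised for a table of any real size.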
The following points don't speed up the process:
- fetchone(), fetchall() and fetchmany() are equivalent when all the data has to be retrieved (which makes sense given how they are implemented)
- arraysize, inputsize and outputsize have no influence (the last two are not implemented yet)
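To illustrate the first point, here is a simplified stand-in cursor. It is not JayDeBeApi's actual source: it only assumes that fetchmany() and fetchall() are built on top of fetchone(), and the `SketchCursor` and `count_conversions` names are made up for the example. Under that assumption, the number of per-cell conversions is the same whichever method drains the result set:

```python
class SketchCursor:
    """Hypothetical stand-in cursor; not JayDeBeApi's actual code."""

    def __init__(self, raw_rows, arraysize=1):
        self._rows = iter(raw_rows)
        self.arraysize = arraysize
        self.conversions = 0  # number of cells converted so far

    def _convert(self, cell):
        # Stands in for the per-cell Java-to-Python conversion.
        self.conversions += 1
        return cell

    def fetchone(self):
        raw = next(self._rows, None)
        if raw is None:
            return None
        return tuple(self._convert(c) for c in raw)

    def fetchmany(self, size=None):
        size = self.arraysize if size is None else size
        rows = []
        for _ in range(size):
            row = self.fetchone()
            if row is None:
                break
            rows.append(row)
        return rows

    def fetchall(self):
        rows = []
        while True:
            row = self.fetchone()
            if row is None:
                break
            rows.append(row)
        return rows


def count_conversions(strategy, n_rows=1000, n_cols=4):
    """Drain a fresh cursor with one strategy and count cell conversions."""
    cur = SketchCursor([(i,) * n_cols for i in range(n_rows)])
    if strategy == "fetchone":
        while cur.fetchone() is not None:
            pass
    elif strategy == "fetchmany":
        while cur.fetchmany(100):
            pass
    else:
        cur.fetchall()
    return cur.conversions
```

With this structure, batching only changes how many Python-level calls are made, not how many cells cross the conversion path.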
I then tried to play with the fetchone() method to speed up the conversion and got the following results (this time the table has 100,000 rows and 4 columns, all of type INT):
Method | Reading |
---|---|
Classic fetchone | 12.7s |
fetchone without conversion (getObject directly) | 10.1s |
fetchone with direct conversion (getInt directly) | 3.1s |
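For reference, the "direct conversion" variant can be sketched roughly as below. It assumes you already hold a JDBC-style result set object exposing `next()` and `getInt(columnIndex)`, as `java.sql.ResultSet` does; how that object is obtained from a JayDeBeApi cursor is left out, and `fetch_all_ints` and `FakeResultSet` are hypothetical names used only for illustration. Note that JDBC's `getInt` returns 0 for SQL NULL, so this only works as written for columns without NULLs:

```python
def fetch_all_ints(rs, n_cols):
    """Read every remaining row of a JDBC-style result set as tuples of ints.

    `rs` is assumed to expose next() and getInt(column_index);
    JDBC column indexes start at 1.
    """
    get_int = rs.getInt              # bind once, outside the loop
    columns = range(1, n_cols + 1)
    rows = []
    while rs.next():
        # Typed getter: no generic getObject() call, no type lookup per cell.
        rows.append(tuple(int(get_int(i)) for i in columns))
    return rows


class FakeResultSet:
    """Pure-Python stand-in so the sketch can be run without a JVM."""

    def __init__(self, rows):
        self._rows = rows
        self._pos = -1

    def next(self):
        self._pos += 1
        return self._pos < len(self._rows)

    def getInt(self, col):
        return self._rows[self._pos][col - 1]


demo = fetch_all_ints(FakeResultSet([(1, 2, 3, 4), (5, 6, 7, 8)]), 4)
```

The catch is that the column types have to be known up front, which is the hard part for anything other than a table of plain integers.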
As we can see, the type "detection" in Java plays a big part in the reading speed. Unfortunately, it is really hard to detect types directly from Python.
I've run out of ideas on how to deal with this low reading speed. I've heard that Jython should give better performance, but it hasn't been updated to Python 3 yet.
So if any of the developers here have an idea of what's going on and how this could be solved, I'm more than willing to help!
Here are some more details on my environment:
- Windows 64
- Python 3.7.3 through Anaconda
- JayDeBeApi 1.1.1