ocilib icon indicating copy to clipboard operation
ocilib copied to clipboard

Performance issues with C++ API in multi-threaded environments

Open jalegido opened this issue 2 years ago • 18 comments

Dear vrogier,

We are using this library in an application with a thread pool. We are using different connections to do each task in this thread pool. This means that we have several connections at the same time working in the same database and tables. I would like to use as many connections as parallel tasks we have as the same time. Is this possible? We ensure that each connection is used only in one task at the same moment; but we are seeing that sometimes in release mode application crash. And we don't know if connection is thread safe when we don't use ocilib::Environment::EnvironmentFlags::Type::Threaded.

Best regards

jalegido avatar Mar 21 '22 09:03 jalegido

Hi,

Having multiple connections working on same db and tables is not an issue. In multithreaded applications, you must use ocilib::Environment::EnvironmentFlags::Type::Threaded. This ensure that some internal containers (like the container holding the list of connection objects) are safely accessed from different threads.

Regards,

Vincent

vrogier avatar Mar 21 '22 10:03 vrogier

Hi,

Yes, I saw this flag. But when we activate this flag (threaded), library performance decrease dramatically (x2). I don't know if another solution could be possible.

Best regards

jalegido avatar Mar 21 '22 10:03 jalegido

Hi,

Can you provide more information about the performance decrease due to using OCILIB/OCI in multithreaded mode ? Of course, mutexing some calls have indeed an impact but it should not be x2. Can you provide more info about the connection/db usage by your app ? And also can you provide more information about crashes you mentioned when not using it ? I cannot caution using OCILIB nor OCI library in multithreaded apps without using the threaded flag.

If you want to keep trying not using multithreaded mode, you could try at least to provide your own mechanisms to ensure that connection objects creation and deletion are performed inside a global lock (as OCILIB keeps track of all connections in a list that must be update safely). Might work but I can't provide any guaranties.

Vincent

vrogier avatar Mar 21 '22 10:03 vrogier

ocilib-img1 Hi, I append my profiler result. As image describes, we are calling many times to column destructor. This destructor is cleaning some handleHolder to release the information it has, and this release is calling a mutex section (with a non empty overhead). If this handlheHolder is associated to the resultset and the resultset is always used in the same thread; this overhead is not necesary,

Best regards

jalegido avatar Mar 21 '22 10:03 jalegido

The API has no way to find out if the user code restrains it self to make calls from same thread or not. The bottleneck you see is not in OCILIB C API but rather correlated to the OCILIB C++ API. The C++ API, in order to provide the ability of using objects values (instead or using dynamic allocations) has to record internally allocated handles by the C API and thus, needs a thread safe handle store. To do so, it uses threading primitives provided by the OCILIB C API (it self based on Oracle client mutex primitives). These might not be have the best performance, depending of the platform. At the time OCILIB C++ API was built, C++ 11 was not very popular yet. Thus, I decided at that time, to rely on the threading C primitives. I could make a change in order to use C++11 threading primitive when the compiler supports it. But I need to know if it worths it and will have to check if using C++11 locking primitives for the ConcurrentList and ConcurrentMap has a benefit. Can you provide sample code that show the overhead ? Then, il will be able to see if it worth it.

Vincent

vrogier avatar Mar 21 '22 13:03 vrogier

All that I do is a select with multiple rows (200K rows) and 6 int columns. I understand that ocilib must retaint the used dynamic memory. I propose that dynamic memory should be retained or controlled by different objects (not only enviroment). For example, in this case all this dynamic memory created when you read a column should be managed by Resultset. So this memory will be available while resultset exits. User could read this resultset in different threads, but it must ensure that its application only use resultset in one thread each time. So, in this case is not necesary to lock/unlock in this object array. I think that with this behaviour we could ensure that resultset is thread safety, while user uses it in the same thread at the same time.

Best regards

jalegido avatar Mar 21 '22 13:03 jalegido

While there is no real need, in this case, to have the handles for the columns to be in a global shared pool (and also columns handles are not created on the fly but rather when resultset is created within the C API), there is currently a generic mechanism handled in templated parent class (HandleHolder<>) to handle that (map of with C handles to c++ smart pointers). I will check if can can easily split the global pool of handles into local ones (columns within resultset, statement within connection, connections/pools within environment object, ....) Let me think about it. I will let you know asap.

Vincent

vrogier avatar Mar 21 '22 14:03 vrogier

Thank you very much. I think, this change could be a dramatically performance improvement in the library.

jalegido avatar Mar 21 '22 14:03 jalegido

Hi,

From the profiling on your use case, what was the most expensive locking ? The overhead of the C++ API (over the C one) is the mecanism I implemented 10 years ago for ensuring that a given object value (let's say a ocilib::Column) was tied to the life time of its parent. A concrete example is:

  • execute a statement
  • get a resultset
  • get a colum from that resultset,
  • at this point the column is a valid object
  • then I re-execute the statement
  • at this point the column object sill point to a C OCI_Column handle that, in a meanwhile, became a dandling pointer.
  • So to avoid such situation, there is a internal mechanism that keeps tracks of handles retrieved from the C API and also sub handles retrieved from these handles.
  • In the previous example, when the statement is ree-executed, the internal mechanism is invalidating the resulset object that invalides the column object (e.g., resets C handles, avoiding dandling pointers).
  • And I made this mechanism generic as base classes are templates classes and thus made it mulithreaded friendly.
  • And it logically added a penalty fault when using transient C++ object within same threads.

There are two sides in this mechanism:

  • Global C Handle store that uses locks
  • Tracking logic where C++ objects that have a list of child smart handles (wrappers on C Handles) with parent information that also uses locks

Is the performance fault you're encountering only due to unnecessary locking in the context of object used into a single thread at a given time or also the global store synchronization?

I will try to setup test cases for identifying the penalty fault of these 2 elements.

If it is only the locking of the tracking system, I could add a public method in the base class that allow to configure the locking mode for an object and its children .

Something like :

    resultset.SetLockingMode(ocilib::lockingMode::Off);

    for (int i = 1, n = resultset.GetColumnCount(); i <= n; i++)
    {
        Column col = table.GetColumn(i);

        /*...*/
    }

    resultset.SetLockingMode(ocilib::lockingMode::On);

Avoiding the global handle store locking would require to split in different stores:

  • a global store for object created by applications (connections, pools, lobs, ...)
  • a store per object for its own local children (e.g. columns for resultsets, statements for connections, ....)

This is way more trickier and requires deep change in internal mechanisms.

Vincent

vrogier avatar Sep 14 '22 12:09 vrogier

Hi,

I saw that penalty was in using sinchronization primitives. I don't see the problem in C++ overhead, I think the problem is that we could avoid to use sinchronization API if we store some objects in current resultsets or connections; instead in global store. I assume that one connection only can be used by one thread at the same time. I detected several problems in other database drivers when I used same connection in different threads at the same time. So, if we use the connection in a "thread safe manner" we could store many objects (as columns) in this connection without sinchronization. Because, user must use it in only one thread.

I don't see useful the proposal of SetLockingMode. Because, for me resultset must be used only by one thread at the same time.

Best regards

jalegido avatar Sep 14 '22 14:09 jalegido

Hi,

I'll commit something that you could test in v.4.7.5 branch before the end of the week.

Regards,

Vincent

vrogier avatar Sep 14 '22 16:09 vrogier

Hi,

I made a draft implementation that I committed in v4.7.5 branch. Instead of having a global handle store that is protected when using threaded envs, there are multiple handle stores ad some do use use sync primitives

Let me know if things get better in your use case.

Regards,

Vincent

vrogier avatar Sep 15 '22 12:09 vrogier

the C Handles wrapping and tracking mechanism put in place 10 years, while working well, is not very well designed (kind of clumsy). And also, at that time, pre C++11 support was mandatory. I am planning to rewrite it for v4.8 and drop pre C++11 support. Il will be replaced by a much simpler model based on shared_ptr and more modern implementation.

vrogier avatar Sep 15 '22 13:09 vrogier

Hi,

I have to change file detail/core/SmartHandle.hpp with this include to use as I'm using 4.7.4 version.

#pragma once

#include "ocilibcpp/core.hpp"
#include "ocilibcpp/detail/support/HandleStoreResolver.hpp"

namespace ocilib

Best regards

jalegido avatar Sep 15 '22 14:09 jalegido

Hi,

From my use case I have a 20% dramatically performance improvement. I'm reading from a table with 200K records without where statement. This table has six fields: three integer fields, and three float fields. All fields are read using ocilib.

Thank you very much for your effort

Best regards

jalegido avatar Sep 15 '22 14:09 jalegido

Hi,

Good news :) I will keep up with my testing.

I pushed another commit on the subject.

Regards,

Vincent

vrogier avatar Sep 15 '22 19:09 vrogier

Hi,

Would it be possible for you to provide a sample test code I could use for testing ?

Regards,

Vincent

vrogier avatar Sep 16 '22 11:09 vrogier

Hi, I append next simple example. When I execute with 4.7.5 I see a dramatically performate increase that when I use 4.7.4. Numbers: -------------------- 4.7.4 4.7.5 Time in milliseconds 5590 1857

My test environment is: Windows 10 x64 with Visual Studio 2019 Oracle 12.1.0.2.0

#include <iostream>
#include <memory>
#include <chrono>
#include <ocilib.hpp>

const char* server = "";//ORACLE SERVER 
const char* user = ""; //FILL
const char* pwd = ""; //FILL

//CREATE TABLE sample
//(col_i_0 NUMBER(10, 0),
//	col_i_1 NUMBER(10, 0),
//	col_i_2 NUMBER(10, 0),
//	col_f_3 FLOAT(126),
//	col_f_4 FLOAT(126),
//	col_f_5 FLOAT(126),
//	col_f_6 FLOAT(126)
//	)
//Fill 200K records with random values


int main()
{
    const char* query = "SELECT col_i_0, col_i_1, col_i_2, col_f_3, col_f_4, col_f_5, col_f_6 FROM sample";

    ocilib::Environment::Initialize(ocilib::Environment::EnvironmentFlags::Type::Default | ocilib::Environment::EnvironmentFlags::Type::Threaded);

    std::unique_ptr<ocilib::Connection> conn = std::make_unique<ocilib::Connection>(server, user, pwd);

    auto now1 = std::chrono::steady_clock::now();

    ocilib::Statement stmt(*conn);
    stmt.SetFetchMode(ocilib::Statement::FetchModeValues::FetchForward);
    stmt.SetPrefetchSize(10000);
    stmt.Prepare(query);
    stmt.ExecutePrepared();
    ocilib::Resultset rs = stmt.GetResultset();

    while (rs.Next())
    {
        const ocilib::Column& col0 = rs.GetColumn(1);
        const ocilib::Column& col1 = rs.GetColumn(2);
        const ocilib::Column& col2 = rs.GetColumn(3);
        const ocilib::Column& col3 = rs.GetColumn(4);
        const ocilib::Column& col4 = rs.GetColumn(5);
        const ocilib::Column& col5 = rs.GetColumn(6);
        const ocilib::Column& col6 = rs.GetColumn(7);
    }

    auto now2 = std::chrono::steady_clock::now();

    std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(now2 - now1).count() << std::endl;

    conn->Close();
}

jalegido avatar Sep 16 '22 11:09 jalegido