
Record link and match pair stat / half stat persistence is unreasonably slow

Open MrCsabaToth opened this issue 11 years ago • 0 comments

SOEMPI can import a 10K record set in about 5 seconds, and that includes reading the data from the flat file, string tokenization, constructing Person objects, and persisting them. For some reason the persistence of record links is 10-100 times slower (persisting half a million record pairs takes about an hour), even though the record pair data is smaller than a person. What happened so far:

  • From system monitoring I can rule out CPU or IO saturation.
  • SOEMPI has long gathered read and write operations into batches. The batch size is determined by Constants.PAGE_SIZE. This minimizes Hibernate flush calls: flush is called only once per PAGE_SIZE operations.
  • Enhanced the system so that it doesn't use the sequence generator when doing mass persistence. In case of mass persistence operations (dataset import, match pair stat / half stat persistence, record link persistence) SOEMPI assigns the ids using a simple counter. This can possibly avoid a DB-internal select call for the next sequence number. This affects all persistence though (Person and link too) and didn't bring a notable speed change.
  • Changed the textual vector information in PersonLink/person_link from the old "text" type to varchar(65536). This was a schema-only change and didn't bring a notable improvement.
  • In case of a CBF/RBF match there's only one field to match, so the binary and continuous vector textual information is redundant, since the weight (double) field already has the info. So in this case I don't generate and persist those.
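
For reference, the paging and counter-based id assignment described in the list above can be sketched roughly as follows. This is only an illustration under assumptions, not SOEMPI's actual code: `LinkRow`, `PagedPersister`, the field names and the flush callback are hypothetical, and the `pageSize` argument stands in for Constants.PAGE_SIZE.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for a record link row; not SOEMPI's PersonLink class.
final class LinkRow {
    final long id;
    final long leftPersonId;
    final long rightPersonId;
    final double weight;

    LinkRow(long id, long leftPersonId, long rightPersonId, double weight) {
        this.id = id;
        this.leftPersonId = leftPersonId;
        this.rightPersonId = rightPersonId;
        this.weight = weight;
    }
}

// Buffers rows and hands each full page to a flush callback (hypothetical helper).
final class PagedPersister {
    private final int pageSize;
    private final Consumer<List<LinkRow>> flusher;
    private final List<LinkRow> page = new ArrayList<>();
    private long nextId;

    PagedPersister(int pageSize, long firstId, Consumer<List<LinkRow>> flusher) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.pageSize = pageSize;
        this.nextId = firstId;
        this.flusher = flusher;
    }

    // The id comes from a local counter, so no sequence lookup is needed per row.
    void add(long leftPersonId, long rightPersonId, double weight) {
        page.add(new LinkRow(nextId++, leftPersonId, rightPersonId, weight));
        if (page.size() >= pageSize) {
            flush();
        }
    }

    // Flushes whatever is buffered; call once at the end for the partial last page.
    void flush() {
        if (page.isEmpty()) {
            return;
        }
        flusher.accept(new ArrayList<>(page));
        page.clear();
    }
}
```

With Hibernate, the callback would presumably save the rows and then call `session.flush()` and `session.clear()` once per page, so the number of flushes is roughly the row count divided by the page size.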

The main question: why is Person persistence so much faster than link persistence?

Things to try:

  • Multi-statement insert operations. Currently the PAGE_SIZE batching minimizes the number of Hibernate flushes. Some articles on the internet talk about actually unifying many INSERT calls into one big statement. This is worth a try but I'm sceptical. http://sensiblerationalization.blogspot.com/2011/03/quick-tip-on-hibernate-batch-operation.html
  • Profile the code
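
To make the first item concrete, here is a minimal sketch of what "unifying many INSERT calls into one big statement" could look like at the SQL level. It only builds the statement text with bind placeholders; the `MultiRowInsert` helper and the column names in the usage are hypothetical, and only the person_link table name comes from the notes above.

```java
import java.util.List;

// Hypothetical helper that builds one multi-row INSERT with "?" placeholders.
final class MultiRowInsert {
    private MultiRowInsert() {}

    static String build(String table, List<String> columns, int rowCount) {
        if (columns.isEmpty() || rowCount <= 0) {
            throw new IllegalArgumentException("need at least one column and one row");
        }
        // One "(?, ?, ...)" group per row, with one placeholder per column.
        StringBuilder group = new StringBuilder("(");
        for (int i = 0; i < columns.size(); i++) {
            if (i > 0) {
                group.append(", ");
            }
            group.append('?');
        }
        group.append(')');

        StringBuilder sql = new StringBuilder("INSERT INTO ")
                .append(table)
                .append(" (")
                .append(String.join(", ", columns))
                .append(") VALUES ");
        for (int r = 0; r < rowCount; r++) {
            if (r > 0) {
                sql.append(", ");
            }
            sql.append(group);
        }
        return sql.toString();
    }
}
```

For example, `MultiRowInsert.build("person_link", List.of("id", "weight"), 2)` yields `INSERT INTO person_link (id, weight) VALUES (?, ?), (?, ?)`. The values would then be bound row by row through a JDBC PreparedStatement. Multi-row VALUES lists are accepted by PostgreSQL and MySQL, among others, but whether this beats the existing batching is exactly what would need measuring.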

MrCsabaToth, Jun 30 '13 20:06