PowerSystemDataModel

Concurrency issue in CsvSink

Open ckittl opened this issue 4 years ago • 0 comments

There seems to be some kind of concurrency issue in the CsvSink. I observe the following issues when passing models to CsvSink#persistAllIgnoreNested(Collection<C>).

Observation 1

When passing in several objects of the same type, a warning is raised that the content will be appended to an already existing file with the targeted name. However, no such file existed beforehand.

Possible explanation: The whole collection of elements is processed concurrently (cf. here). For each element, it is checked whether a writer is already available for the respective class. If not, one is initialized. However, it can happen that the first element doesn't find a matching writer and starts to initialize one, while at the same time a second element of the same class doesn't see the writer either (initialization has started but not finished, yet the file has already been created) and therefore initializes a second writer. As the writers are registered in a map, the second registration replaces the first, and the first writer is abandoned after writing the first element.
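To make the suspected race concrete, here is a minimal sketch of the check-then-act pattern described above, next to an atomic lookup via `ConcurrentHashMap#computeIfAbsent`. It is not the actual CsvSink code: `WriterRaceSketch`, `FakeWriter`, `racyLookup`, `atomicLookup` and `creationsWithAtomicLookup` are hypothetical names made up for illustration. The atomic lookup is also a different mitigation than the grouping proposed in this issue; it only shows that the duplicate initialization disappears once "check" and "create" happen as one step.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

// Hypothetical sketch, not the actual CsvSink implementation.
class WriterRaceSketch {

  /** Hypothetical stand-in for a per-class file writer. */
  static final class FakeWriter {}

  /**
   * Racy check-then-act: two threads may both see null, both create a
   * writer, and the later put() silently replaces the earlier one.
   */
  static FakeWriter racyLookup(
      Map<Class<?>, FakeWriter> writers, Class<?> clz, AtomicInteger created) {
    FakeWriter writer = writers.get(clz);
    if (writer == null) {
      created.incrementAndGet();
      writer = new FakeWriter();
      writers.put(clz, writer);
    }
    return writer;
  }

  /** Atomic lookup: the factory lambda runs at most once per key. */
  static FakeWriter atomicLookup(
      ConcurrentHashMap<Class<?>, FakeWriter> writers, Class<?> clz, AtomicInteger created) {
    return writers.computeIfAbsent(
        clz,
        c -> {
          created.incrementAndGet();
          return new FakeWriter();
        });
  }

  /** Requests the writer for one class from many threads and counts creations. */
  static int creationsWithAtomicLookup(int elements) {
    ConcurrentHashMap<Class<?>, FakeWriter> writers = new ConcurrentHashMap<>();
    AtomicInteger created = new AtomicInteger();
    IntStream.range(0, elements)
        .parallel()
        .forEach(i -> atomicLookup(writers, String.class, created));
    return created.get();
  }

  public static void main(String[] args) {
    System.out.println("writers created: " + creationsWithAtomicLookup(1_000));
  }
}
```

With `racyLookup`, the number of created writers under parallel access is not deterministic, which would match the sporadic "file already exists" warning.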

Proposed solution: On calling CsvSink#persistAllIgnoreNested(Collection<C>), split the given collection into sub-collections that each hold elements of only one class. Those sub-collections can be processed concurrently (but not the elements within them). The writers are the place where concurrency ends anyway. Concurrent attempts to write to the same file would therefore be avoided at a very early stage. @johanneshiry: What's your opinion on that? I think the sink was your concept.
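A rough sketch of the proposed grouping, under stated assumptions: `GroupedPersistSketch`, `groupByClass` and `persistGrouped` are hypothetical names, and an in-memory list of lines per class stands in for the real CSV file. The groups are processed in parallel, while the elements inside one group are handled sequentially by a single thread.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the proposal, not the actual CsvSink implementation.
class GroupedPersistSketch {

  /** Splits the collection into sub-collections holding one class each. */
  static <C> Map<Class<?>, List<C>> groupByClass(Collection<C> entities) {
    Map<Class<?>, List<C>> groups = new LinkedHashMap<>();
    for (C entity : entities) {
      groups.computeIfAbsent(entity.getClass(), k -> new ArrayList<>()).add(entity);
    }
    return groups;
  }

  /**
   * Processes the groups concurrently, but the elements of one group
   * sequentially. The returned map stands in for "one file per class".
   */
  static <C> Map<Class<?>, List<String>> persistGrouped(Collection<C> entities) {
    Map<Class<?>, List<String>> files = new ConcurrentHashMap<>();
    groupByClass(entities)
        .entrySet()
        .parallelStream()
        .forEach(
            entry -> {
              // Only one thread ever touches the "file" of a given class.
              List<String> lines = new ArrayList<>();
              for (C entity : entry.getValue()) {
                lines.add(String.valueOf(entity));
              }
              files.put(entry.getKey(), lines);
            });
    return files;
  }
}
```

Since each class is handled by exactly one task, at most one writer per class would be requested, so the check-then-act race on the writer map cannot occur for that class.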

Observation 2

Now and then, not all passed elements actually appear in the CSV files. I don't have a clue yet why this is happening. Maybe it's a flushing issue?

Proposed next steps / Questions to answer

  • Try to evaluate the problem a bit better.
  • Might the calling thread exit before the flushing has finished?
  • Consider whether the concept proposed under Observation 1 could help to mitigate the problem.
    • We flush() after each line, which subverts the concept of a buffered writer. With the above concept, flush() could be called once after all elements of the same class have been written.
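The flushing idea from the last bullet could look roughly like this. It is a sketch only: `BatchFlushSketch`, `CountingWriter` and `writeBatch` are hypothetical names, and a `StringWriter` that counts `flush()` calls stands in for the real file target. It does not answer the question of whether the calling thread exits too early.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.List;

// Hypothetical sketch, not the actual CsvSink implementation.
class BatchFlushSketch {

  /** In-memory target that counts how often flush() reaches it. */
  static final class CountingWriter extends StringWriter {
    int flushes = 0;

    @Override
    public void flush() {
      flushes++;
      super.flush();
    }
  }

  /** Writes all lines of one class, then flushes once at the end. */
  static void writeBatch(BufferedWriter writer, List<String> lines) {
    try {
      for (String line : lines) {
        writer.write(line);
        writer.newLine(); // no flush() here, the buffer does its job
      }
      writer.flush(); // single flush after the whole batch
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Flushing once per batch keeps the benefit of the buffer and still makes sure that everything has been handed to the underlying target before the batch is reported as done.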

ckittl, Jul 31 '20 14:07