alfresco-bulk-import

Parallelise folder import phase

Open pmonks opened this issue 6 years ago • 0 comments

Currently the tool imports folders serially, in a first phase of the import. This allows files to be batched efficiently in the second phase without having to worry about parent folders, which is a significant performance improvement over earlier schemes that processed imports folder-by-folder.

Unfortunately, because folders are interdependent (i.e. you can't import a child folder until its ancestor tree has been imported), parallelising this phase is more difficult than the file case, and it was punted in v2.0 of the tool.

Requiring BulkImportSources to scan directories breadth-first would make some level of parallelisation possible during the folder import phase: the first-level folders would be imported serially, then each of those folders' sub-folder trees would be imported in parallel.
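A rough sketch of that scheme, for illustration only and not the tool's actual code: `importFolder` is a hypothetical stand-in for whatever call creates a folder in the repository, and plain `java.nio.file` directory listing stands in for a `BulkImportSource`. First-level folders are imported serially, then each one's subtree is walked breadth-first on its own pool thread, so a parent is always imported before its children.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ParallelFolderImport {
    // Hypothetical: records import order instead of talking to a repository.
    static final List<Path> imported = Collections.synchronizedList(new ArrayList<>());

    static void importFolder(Path folder) {
        imported.add(folder);
    }

    // Immediate child directories of a folder, in a stable order.
    static List<Path> childFolders(Path parent) throws IOException {
        try (Stream<Path> entries = Files.list(parent)) {
            return entries.filter(Files::isDirectory).sorted().collect(Collectors.toList());
        }
    }

    // Breadth-first import of everything below root (root itself is already imported).
    static void importSubtree(Path root) throws IOException {
        Deque<Path> queue = new ArrayDeque<>(childFolders(root));
        while (!queue.isEmpty()) {
            Path folder = queue.removeFirst();
            importFolder(folder);
            queue.addAll(childFolders(folder));
        }
    }

    static void importAll(Path sourceRoot, int threads) throws Exception {
        List<Path> firstLevel = childFolders(sourceRoot);
        for (Path folder : firstLevel) {
            importFolder(folder);            // phase 1a: first level, serially
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Object>> pending = new ArrayList<>();
            for (Path folder : firstLevel) { // phase 1b: one task per subtree
                pending.add(pool.submit(() -> { importSubtree(folder); return null; }));
            }
            for (Future<Object> task : pending) {
                task.get();                  // propagate any failure from a worker
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

Subtrees are independent of each other once their first-level root exists, which is why no cross-thread coordination is needed beyond waiting for all tasks to finish.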

There are worst-case scenarios that need some thought (e.g. when there are fewer first-level folders than the optimal number of threads in the thread pool, which would leave some threads idle), but in general this should markedly speed up the folder import phase for large folder trees.
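One possible way to soften the "too few first-level folders" case, offered purely as an assumption and not something settled here, is to keep importing whole levels serially until a level is at least as wide as the thread pool, and only then fan out. The sketch below uses a hypothetical in-memory `children` map in place of a real source scan.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class SerialFrontier {
    // Returns the folders to import serially, in breadth-first order.
    // `frontier` receives the folders whose subtrees would be handed to worker threads.
    static List<String> serialPrefix(Map<String, List<String>> children, String root,
                                     int threads, List<String> frontier) {
        List<String> serial = new ArrayList<>();
        List<String> level = children.getOrDefault(root, Collections.emptyList());
        serial.addAll(level);
        // Descend one level at a time while the current level is too narrow.
        while (level.size() < threads) {
            List<String> next = new ArrayList<>();
            for (String folder : level) {
                next.addAll(children.getOrDefault(folder, Collections.emptyList()));
            }
            if (next.isEmpty()) {
                break;                // tree exhausted before it got wide enough
            }
            serial.addAll(next);
            level = next;
        }
        frontier.addAll(level);
        return serial;
    }
}
```

This trades a longer serial prefix for better thread utilisation afterwards; a tree that never widens (e.g. one long chain) still ends up fully serial.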

pmonks, Mar 15 '18 17:03