alfresco-bulk-import
Parallelise folder import phase
Currently the tool imports folders serially, in a first phase of the import (this allows files to be efficiently batched in the second phase, without having to worry about parent folders - a significant performance improvement over earlier schemes that processed imports folder-by-folder).
Unfortunately, because folders are inter-dependent (i.e. you can't import a child folder until all of its ancestors have been imported), parallelising this phase is more difficult than the file case, and was punted in v2.0 of the tool.
By requiring BulkImportSource implementations to scan directories breadth-first, some level of parallelisation would become possible during the folder import phase: the first-level folders would be imported serially, then each of those folders' sub-folder trees would be imported in parallel.
There are worst-case scenarios that need some thought (e.g. when there are fewer first-level folders than the optimal number of threads in the thread pool), but in general this should markedly speed up the folder import phase for large folder trees.
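
To make the proposal above concrete, here is a minimal sketch of the two-step folder phase. It is not code from the tool: `Folder`, `importFolder`, `importSubtree` and `importAll` are hypothetical names, the "repository" is just an in-memory set of paths, and the real BulkImportSource API is not modelled. It only illustrates the ordering constraint (a parent is always imported before its children) and where the parallelism would come from.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelFolderImportSketch {

    /** Hypothetical in-memory stand-in for a source folder. */
    static final class Folder {
        final String name;
        final List<Folder> children = new ArrayList<>();

        Folder(String name, Folder... kids) {
            this.name = name;
            for (Folder kid : kids) {
                children.add(kid);
            }
        }
    }

    /** Stand-in for the repository write; fails if the parent is missing. */
    static String importFolder(Set<String> imported, String parentPath, Folder folder) {
        if (!parentPath.isEmpty() && !imported.contains(parentPath)) {
            throw new IllegalStateException("Parent not yet imported: " + parentPath);
        }
        String path = parentPath + "/" + folder.name;
        imported.add(path);
        return path;
    }

    /** Imports every descendant of an already-imported folder, parents first. */
    static void importSubtree(Set<String> imported, String folderPath, Folder folder) {
        for (Folder child : folder.children) {
            String childPath = importFolder(imported, folderPath, child);
            importSubtree(imported, childPath, child);
        }
    }

    static Set<String> importAll(List<Folder> firstLevel, int threads) throws Exception {
        Set<String> imported = ConcurrentHashMap.newKeySet();

        // Step 1: import the first-level folders serially.
        List<String> firstLevelPaths = new ArrayList<>();
        for (Folder folder : firstLevel) {
            firstLevelPaths.add(importFolder(imported, "", folder));
        }

        // Step 2: one task per first-level folder; its sub-tree is independent
        // of every other first-level sub-tree, so the tasks can run in parallel.
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < firstLevel.size(); i++) {
                final Folder folder = firstLevel.get(i);
                final String path = firstLevelPaths.get(i);
                futures.add(pool.submit(() -> importSubtree(imported, path, folder)));
            }
            for (Future<?> future : futures) {
                future.get(); // rethrows any failure from a worker thread
            }
        } finally {
            pool.shutdown();
        }
        return imported;
    }
}
```

Because this sketch submits exactly one task per first-level folder, a source with two first-level folders and a pool of four threads would leave two threads idle, which is the corner case noted above. Splitting the work at a deeper level of the tree would be one possible way around that, at the cost of a longer serial step.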