templating
Excluded files are still scanned during template instantiation
The template engine takes a very long time to scan all the files, which seems unnecessary because I have explicitly excluded them in my template.json.
In my local environment, the template folder includes a 'node_modules' folder that contains a large number of files (possibly more than 100K).
Below is my template config:
"sources": [
  {
    "exclude": ["**/node_modules/**", "**/setup.*"]
  }
]
@jianyexi can you give more info on what you mean by scan? Is this during install or when instantiating the template?
I mean that it opens/stats each file while instantiating the template.
@vlada-shubina & @JanKrivanek this will show up in the "to be triaged" list, as David was removed as assignee.
There are multiple reasons why processing a folder with a large number of files to skip takes nontrivial time; in short, the best advice is to avoid such a scenario.
There are 3 main classes of issues leading to slow processing of templates with a large number of files to be skipped. We should fix the first one: this can bring up to a 3-fold speedup (more realistically, it will halve the time). The other 2 might need some prioritization.
Main classes of reasons why processing takes time:
1) Physically accessing the files that should be skipped. This is a bug and should be fixed. Fixing it should be quite easy. Each file is being opened and checked 4 times:

This is caused by the enumeration of file system entries within FileRenameGenerator.AugmentFileRenames, which under the hood checks whether each entry is a directory; that call needs to open the file. The directory check is called twice for each file entry (within the enumeration itself and then when wrapping into FileSystemInfo), and the whole AugmentFileRenames is called twice: once just to emulate the changes and a second time to perform them. Hence the 4 calls to open each file (including those that should be skipped).
How to fix: Enumeration in FileRenameGenerator should perform filtering (similar to, e.g., Orchestrator), and only entries of interest should be wrapped into custom objects (avoiding the need to check whether an entry is a directory).
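The idea behind this fix can be sketched roughly as follows. This is a Python illustration, not the engine's C# code: `is_excluded` and `entries_to_rename` are hypothetical names, and `fnmatch` only approximates the engine's glob semantics (here `*` also matches `/`). The point is that the name-only filter runs before any call that touches the file.

```python
import fnmatch
import os

# Exclusion globs taken from the template config quoted in this thread.
EXCLUDE = ["**/node_modules/**", "**/setup.*"]


def is_excluded(rel_path, patterns=EXCLUDE):
    """Name-only check: never opens or stats the file."""
    # A leading "/" lets "**/x" patterns also match entries at the root.
    candidate = "/" + rel_path.replace(os.sep, "/")
    return any(fnmatch.fnmatchcase(candidate, p) for p in patterns)


def entries_to_rename(rel_paths, stat=os.stat):
    """Filter by name first, then stat only the surviving entries."""
    kept = []
    for rel in rel_paths:
        if is_excluded(rel):
            continue  # skipped without any filesystem access
        kept.append((rel, stat(rel)))
    return kept
```

With this ordering, the number of filesystem calls scales with the files that are actually processed rather than with the size of the excluded folders.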
2) Need to evaluate exclusion criteria for each entry.
The templating engine allows rich exclusion/inclusion semantics, which leads to a need to evaluate every single file/directory against the filters to determine whether it should be processed. This logic can be seen, e.g., in Orchestrator.
Only filenames are checked; files are not accessed. Still, thousands of records can slow down processing.
How to fix: Skipping groups of files would require more complicated and involved enumeration logic (one that traverses the file system recursively and decides for each directory whether it should be skipped altogether or visited).
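A minimal sketch of such a pruning traversal, again in Python rather than the engine's C#. It assumes the simplest possible rule (skip any directory with a given name); real include/exclude globs would need a richer per-directory decision. `walk_pruned`, `SKIP_DIRS` and `demo` are hypothetical names.

```python
import os
import tempfile

# Assumption: directories are skipped purely by name.
SKIP_DIRS = {"node_modules"}


def walk_pruned(root, skip_dirs=SKIP_DIRS):
    """Yield relative file paths, never descending into skipped directories."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place tells os.walk not to visit those subtrees.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            yield os.path.relpath(os.path.join(dirpath, name), root)


def demo():
    """Build a tiny tree and list what the pruned walk still visits."""
    with tempfile.TemporaryDirectory() as root:
        for rel in ("src/app.js", "node_modules/pkg/index.js", "src/node_modules/x.js"):
            path = os.path.join(root, *rel.split("/"))
            os.makedirs(os.path.dirname(path), exist_ok=True)
            open(path, "w").close()
        return sorted(p.replace(os.sep, "/") for p in walk_pruned(root))
```

Because a whole subtree is dropped with a single decision, the entries below an excluded directory are never enumerated or matched against the filters.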
3) Duplicated processing due to separate emulation of changes and actual performing of changes.
The template engine uses 2-step processing. In the first step (RunnableProjectGenerator.GetCreationEffectsAsync) it performs all of the processing but only records the results in memory, in order to detect possible destructive changes. If no destructive changes are detected (or if they are allowed with the --force flag), all processing is performed again (RunnableProjectGenerator.CreateAsync) and this time written to the destination.
How to fix: This leads to a lot of duplicated effort (reading from the filesystem, running macros, etc.). There are more effective techniques for performing rollback-enabled filesystem changes (e.g. performing the changes in a temporary location first and then copying them over to the requested destination). This would, however, require major changes in template engine processing.
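The temporary-location technique mentioned above might look roughly like this. It is a Python sketch under simplifying assumptions: the "template output" is just a dict of relative path to text, a destructive change means overwriting an existing file, and `generate_with_staging` is a hypothetical name unrelated to the engine's API.

```python
import os
import shutil
import tempfile


def generate_with_staging(destination, files, force=False):
    """Write `files` (relative path -> text) via a staging directory.

    Returns the list of conflicting paths; nothing is written to
    `destination` when conflicts exist and `force` is False.
    """
    with tempfile.TemporaryDirectory() as staging:
        # 1) Run the expensive processing exactly once, into staging.
        for rel, content in files.items():
            target = os.path.join(staging, rel)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with open(target, "w", encoding="utf-8") as fh:
                fh.write(content)

        # 2) Detect destructive changes by comparing names only.
        conflicts = sorted(
            rel for rel in files if os.path.exists(os.path.join(destination, rel))
        )
        if conflicts and not force:
            return conflicts

        # 3) Copy the already generated output over to the destination.
        shutil.copytree(staging, destination, dirs_exist_ok=True)
        return []
```

The content is generated once, and the conflict check is applied to the staged result instead of triggering a second full run.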
Performance impact of the 3 categories: Based on naive testing with a repro case taking ~180 seconds to complete (a couple of thousand files, while only 3 are to be processed):
- Renaming processing (AugmentFileRenames) - takes ~61.7% of the time
- Filtering of files to process (GetFileChangesInternal and RunInternal) - takes ~36% of the time
- Double processing - each of those functions is executed twice, with each execution taking a similar amount of time.
We estimated only item 1 from above.
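As a rough sanity check of the "up to 3-fold" figure, the measured shares can be plugged into a simple proportional model. This is only arithmetic on the numbers quoted above; it assumes, optimistically, that a fix removes the entire cost of a phase, which a real fix would not quite achieve.

```python
# Figures quoted from the naive measurement in this thread.
TOTAL_SECONDS = 180.0
RENAME_SHARE = 0.617   # AugmentFileRenames
FILTER_SHARE = 0.36    # GetFileChangesInternal and RunInternal


def speedup_if_removed(share):
    """Upper bound on overall speedup if a phase's cost dropped to zero."""
    return 1.0 / (1.0 - share)


rename_seconds = TOTAL_SECONDS * RENAME_SHARE
filter_seconds = TOTAL_SECONDS * FILTER_SHARE
best_case = speedup_if_removed(RENAME_SHARE)

print(f"renaming: ~{rename_seconds:.0f} s, filtering: ~{filter_seconds:.0f} s")
print(f"best-case speedup from fixing renaming alone: ~{best_case:.1f}x")
```

Under that assumption the bound is about 2.6x, consistent with "up to 3-fold", while halving the time corresponds to removing only part of the renaming cost.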
After discussion, it seems that this issue appears only during template authoring. The template package should not include files to be excluded, therefore it unfortunately doesn't meet the bar to be addressed in .NET 7.
Please comment if you feel blocked by this issue.