I am very willing to have fast tests. This is because if you have tests that take less than a second to run, it is possible to have them run in watch mode, and save your changes and see immediately that your tests are still passing.

With the time however, the tests are running in a bit more than 1 second in watch mode (on my computer, it takes 1940ms on average).

I think this could be improved upon.

First things I would like to do is to measure things. There are separate steps for running the tests :

Load all docx fixtures into nodejs buffers
Unzip the docx with pizzip
Do some templating with docxtemplater
Have some expectations on the output.

Since 1. is probably cached by the "page cache" : https://en.wikipedia.org/wiki/Page_cache

And I expect that 3. should be fast, I'm guessing that 2 && 4 could be improved upon.

Measuring is really the first step, the ideas should not be implemented before doing proper measuring and reporting the results in this issue.

Idea 1 (cached zip)

One of my ideas would be that for tests, we don't use Pizzip to unzip the docx for each test run (which takes time), but instead, we could create a mock that would have a "cached" version of a zip, which would be stored as a folder. For example, if you have in : examples/foo.docx , a docx document is a zipfile which you can unzip. If you unzip that, you get a folder like this :

foo
├── [Content_Types].xml
├── docProps
│   ├── app.xml
│   └── core.xml
├── _rels
└── word
    ├── document.xml
    ├── fontTable.xml
    ├── numbering.xml
    ├── _rels
    │   └── document.xml.rels
    ├── settings.xml
    ├── stylesWithEffects.xml
    ├── styles.xml
    ├── theme
    │   └── theme1.xml
    └── webSettings.xml

We could store this folder instead of foo.docx, and by creating a good mock of Pizzip, we could avoid to unzip the docx for each test run but instead read the files that we want from disk.

Idea 2 (parallel testing)

As far as I can tell, mocha currently runs each test suite sequentially (on one CPU only). Since most computers nowadays have at least 4 or 8 CPUs, it would make sense to run the tests in multiple processes.

May 13 '20 20:05 edi9999

For the change of zip loading, the code that is called a lot is the load function :

function load(name, content, obj) {
         // name is the full file name for example :   "name": "tag-example.docx"
         // content is the content of that filename, which is a binary string
        // obj is a cache of each docxtemplater instance that was created using load.
	const zip = new PizZip(content);
	obj[name] = new Docxtemplater();
	obj[name].loadZip(zip);
	obj[name].loadedName = name;
	obj[name].loadedContent = content;
	return obj[name];
}

Since this load function calls new PizZip(content), if new PizZip is slow, then by changing PizZip to something faster would make the tests run faster.

It would be interesting to see, for a test run, how much time is spent running new Pizzip(content) here.

May 21 '20 15:05 edi9999

If the measurements make it clear that PizZip is slow, it would make sense to write basically :

const zip = new MockedZip(name);

and the MockedZip would "behave" exactly as the PizZip library, except that underneath, it uses a folder which is already unzipped.

May 21 '20 15:05 edi9999

The two ideas I had were just a starting point, probably by using a profiler we could find which spots cause the slowlessness.

May 21 '20 15:05 edi9999

You can now to use npm run test:watch (after pulling since I just pushed this).

May 21 '20 15:05 edi9999

Benchmark Results

method for this process.hrtime();

load function mentioned in this discussion is called total 402 times for complete test using command npm test. And it has taken an average of 0.612358 seconds for almost 10 times running it. It varies from 0.742381 seconds to 0.49462005 seconds for my computer which i5 third generation. Benchmark Results Pic1

git diff was not showing the console.log part of code so included screenshot showing the comparing. Left side window has old code and right side window have new code to benchmark allload function calls. benchmark-code-changes Benchmark Results Pic2

let loadFunctionCallFrequency = 0;
let totalCallTimeInNanoSeconds = 0;
function load(name, content, obj) {
	let start = process.hrtime();
	loadFunctionCallFrequency++;
	const zip = new PizZip(content);
	obj[name] = new Docxtemplater();
	obj[name].loadZip(zip);
	obj[name].loadedName = name;
	obj[name].loadedContent = content;
	start = process.hrtime(start);
	totalCallTimeInNanoSeconds += start[1];
	console.log(
		`In calling all ${loadFunctionCallFrequency} times load function total time taken in seconds:`,
		totalCallTimeInNanoSeconds / 1e9
	);
	console.log(
		"Total number of load function calls:",
		loadFunctionCallFrequency
	);
	return obj[name];
}

Benchmark Results Code Block1

My Environment: NodeV12.16.1 NPMv6.13.4

For the change of zip loading, the code that is called a lot is the load function :
function load(name, content, obj) {
         // name is the full file name for example :   "name": "tag-example.docx"
         // content is the content of that filename, which is a binary string
        // obj is a cache of each docxtemplater instance that was created using load.
	const zip = new PizZip(content);
	obj[name] = new Docxtemplater();
	obj[name].loadZip(zip);
	obj[name].loadedName = name;
	obj[name].loadedContent = content;
	return obj[name];
}
Since this load function calls new PizZip(content), if new PizZip is slow, then by changing PizZip to something faster would make the tests run faster.

It would be interesting to see, for a test run, how much time is spent running new Pizzip(content) here.

May 21 '20 20:05 shadab14meb346

Can you do the same measurements with npm run test:es6:fast ?

Here I suspect that you are running all tests (even the ones we know take more than 2 or 3 seconds).

When I run npm run test:es6:fast it runs 288 tests.

Can you do the same measurement just for new PizZip, ie not include the new Docxtemplater ?

May 21 '20 20:05 edi9999

Can you do the same measurements with npm run test:es6:fast ?

Here I suspect that you are running all tests (even the ones we know take more than 2 or 3 seconds).

When I run npm run test:es6:fast it runs 288 tests.

Can you do the same measurement just for new PizZip, eg not include the new Docxtemplater ?

Benchmark Results

by using test command npm run test:es6:fast. For both new PizZip and new Docxtemplater

Average time taken 0.5015457105 seconds for 10 test runs.

IndividualTimesInSeconds:0.493444716,0.532960591,0.518716382,0.506935504,0.497533393,0.491868178,0.47420252,0.523161315,0.497780322,0.478854184

Code block for this test is exactly same as this reply Benchmark Results Code Block1
Kazam_screenshot_00005 2. As asked by you

Can you do the same measurement just for new PizZip, eg not include the new Docxtemplater ?

By this what I am able to understand is two things.

2.1 Do benchmarking for load function just with new PizZip. If this is the case below code block I have used to benchmark it. But as expected it is failing the test cases, it is giving error TypeError: obj[name].loadZip is not a function and I am also not getting any time console.

let loadFunctionCallFrequency = 0;
let totalCallTimeInNanoSeconds = 0;
function load(name, content, obj) {
	let start = process.hrtime();
	loadFunctionCallFrequency++;
	const zip = new PizZip(content);
	start = process.hrtime(start);
	totalCallTimeInNanoSeconds += start[1];
	console.log(
		`In calling all ${loadFunctionCallFrequency} times load function with only PizZip total time taken in seconds:`,
		totalCallTimeInNanoSeconds / 1e9
	);
	console.log(
		"Total number of load function calls with only PizZip:",
		loadFunctionCallFrequency
	);
	// obj[name] = new Docxtemplater();
	// obj[name].loadZip(zip);
	// obj[name].loadedName = name;
	// obj[name].loadedContent = content;
	return obj[name];
}

Benchmark Results Code Block2

2.2 And case two that I should benchmark new PizZip standalone by taking a file as an example benchmarkPizZip.js file and require PizZip in that and run node benchmarkPizZip.js as a normal node process. The expected code for benchmarkPizZip.js is below. I tried this way and for the file cyrillic.docx in the examples folder the average timings in seconds is 0.0144781315 for running 5 times.

const PizZip = require("pizzip");
const fs = require("fs");
const path = require("path");
//Load the docx file as a binary
const content = fs.readFileSync(
	path.resolve(__dirname, "../../examples/cyrillic.docx"),
	"binary"
);
let totalCallTimeInNanoSeconds = 0;
let start = process.hrtime();
const zip = new PizZip(content);
start = process.hrtime(start);
totalCallTimeInNanoSeconds = start[1];
console.log(
	`In calling PizZip time taken in seconds:`,
	totalCallTimeInNanoSeconds / 1e9
);

Benchmark Results Code Block3

May 21 '20 22:05 shadab14meb346

I meant measuring the time of new Pizzip, but still keep everything as is. So almost 2.1, but without the comments :

let loadFunctionCallFrequency = 0;
let totalCallTimeInNanoSeconds = 0;
function load(name, content, obj) {
	let start = process.hrtime();
	loadFunctionCallFrequency++;
	const zip = new PizZip(content);
	start = process.hrtime(start);
	totalCallTimeInNanoSeconds += start[1];
	console.log(
		`In calling all ${loadFunctionCallFrequency} times load function with only PizZip total time taken in seconds:`,
		totalCallTimeInNanoSeconds / 1e9
	);
	console.log(
		"Total number of load function calls with only PizZip:",
		loadFunctionCallFrequency
	);
	obj[name] = new Docxtemplater();
	obj[name].loadZip(zip);
	obj[name].loadedName = name;
	obj[name].loadedContent = content;
	return obj[name];
}

May 22 '20 06:05 edi9999

I meant measuring the time of new Pizzip, but still keep everything as is. So almost 2.1, but without the comments :

let loadFunctionCallFrequency = 0;
let totalCallTimeInNanoSeconds = 0;
function load(name, content, obj) {
	let start = process.hrtime();
	loadFunctionCallFrequency++;
	const zip = new PizZip(content);
	start = process.hrtime(start);
	totalCallTimeInNanoSeconds += start[1];
	console.log(
		`In calling all ${loadFunctionCallFrequency} times load function with only PizZip total time taken in seconds:`,
		totalCallTimeInNanoSeconds / 1e9
	);
	console.log(
		"Total number of load function calls with only PizZip:",
		loadFunctionCallFrequency
	);
	obj[name] = new Docxtemplater();
	obj[name].loadZip(zip);
	obj[name].loadedName = name;
	obj[name].loadedContent = content;
	return obj[name];
}

Ya sure now I understood. Here are the results just for PizZip. I ran it 5 times. test command npm run test:es6:fast Average time:0.2619314742 seconds Kazam_screenshot_00006

May 22 '20 06:05 shadab14meb346

Ok, so it means 0.26 seconds of improvement for a total run time of about 2 seconds.

I think in this case, it doesn't make much sense to try to optimize it, since it won't be possible to do an improvement of more than 10%.

Can you try to profile the test code to see if there are some bottlenecks that could be made faster ?

Else, I think it would maybe be faster by parallelizing, I'm not sure how this is doable with mocha.

May 22 '20 14:05 edi9999

For my computer the test with this command npm run test:es6:fast is taking total 3 seconds. Currently I am looking into how to profile test cases code. Later I will look into parallel testing in mocha.

May 22 '20 14:05 shadab14meb346

I was just giving an initial try for parallel testing idea and I installed mocha-parallel-tests npm module and tried running our test cases but it is taking even more time as compared to npm run test:es6:fast. So some issue is there with the mocha-parallel-tests module. Continuing my search for test cases code profiling but could not something useful yet.

May 22 '20 18:05 shadab14meb346

When you run mocha-parallel-tests, was it with the FAST=true environment variable ?

The command run by npm run test:es6:fast is :

"test:es6:fast": "FAST=true mocha --full-trace --check-leaks es6/tests/index.js",

If you use mocha-parallel-tests, you should use the same env to be able to compare.

May 22 '20 19:05 edi9999

Yes I am using the same env to compare both.

"parallel-test": "FAST=true mocha-parallel-tests --full-trace --check-leaks es6/tests/index.js"
 "test:es6:fast": "FAST=true mocha --full-trace --check-leaks es6/tests/index.js"

May 22 '20 19:05 shadab14meb346

After using nodejs inspector with this command FAST=true ./node_modules/.bin/mocha --inspect-brk --full-trace --check-leaks es6/tests/index.js. And doing a CPU profiling. Below results I deduced. Functions taking more than 500ms time

LoadFile file-path://docxtemplater/es6/tests/utils.js
asText in PizZip file-path://docxtemplater/node_modules/pizzip/js/object.js
nodebuffer file-path://docxtemplater/node_modules/pizzip/js/utils.js
load file-path://docxtemplater/es6/tests/utils.js
render file-path://docxtemplater/es6/docxtemplater.js
ZipEntries in PizZip file-path://docxtemplater/node_modules/pizzip/js/zipEntries.js
dataToString in PizZip file-path://docxtemplater/node_modules/pizzip/js/object.js
compile file-path://docxtemplater/es6/docxtemplater.js
makeDocx file-path://docxtemplater/es6/tests/utils.js
module.exports in PizZip file-path://docxtemplater/node_modules/pizzip/js/load.js
PizZip file-path://docxtemplater/node_modules/pizzip/js/index.js
load file-path://docxtemplater/node_modules/pizzip/js/zipEntries.js
readLocalFiles in PizZip file-path://docxtemplater/node_modules/pizzip/js/zipEntries.js
module.exports file-path://docxtemplater/node_modules/pizzip/js/nodeBuffer.js
exports.TransformTo file-path://docxtemplater/node_modules/pizzip/js/utils.js

As you can see a lot of functions which are taking more time are associated with PizZip and I guess if we can moc PizZip as discussed by you here maybe we can increase the test speed.

May 30 '20 00:05 shadab14meb346

Yes, I think you're right, lets give it a go then.

May 30 '20 07:05 edi9999

I will start looking into it how to write a mock for PizZip. Please do share any suggestions if you feel so.

May 30 '20 07:05 shadab14meb346

Hello @shadab14meb346 ,

the idea is to mock PizZip, meaning that it should behave in the same way as the PizZip library but be as fast as possible.

For example, one should be able to do :

var zip = new MockedPizzip(filepath);
zip.file(/./).map(({ name }) => {
    // this should print all filenames of the faked zip
    console.log(name);
});

const content = zip.files["word/document.xml"].asText();
// This should log the content of word/document.xml
console.log(content);

const newContent = "foobar";
// This should change the content of word/document.xml
zip.file("word/document.xml", newContent, { createFolders: true });

May 31 '20 15:05 edi9999

Does this make things more clear ?

Jun 02 '20 07:06 edi9999

Hi. Ya, sure thanks. it is helping me. So currently I am trying to understand PizZip working. For this, 1. I made a testZip.zip zip file using PizZip. The folder unzips as below

testZip
├── Hello.txt
├── documents
        ├── users.docx

2. Now I am trying to use the step onetestZip.zipzip file and read the file names by using this code block

const PizZip = require("pizzip");
const path = require("path");
const zippedFilePath = path.join(__dirname, "/testZip.zip");
const zip = new PizZip(zippedFilePath);
zip.file(/./).map(({ name }) => {
    // this should print all filenames of the faked zip
    console.log(name);
});

But this is giving me the below error

docxtemplater/node_modules/pizzip/js/zipEntries.js:191
throw new Error("Can't find end of central directory : is this a zip file ?");
        ^
Error: Can't find end of central directory : is this a zip file ?```
Can you suggest something, please?

Jun 02 '20 08:06 shadab14meb346

I think it is because when using a real pizzip instance, you need to use the file contents, not the file path.

IE :

const zippedFilePath = path.join(__dirname, "/testZip.zip");
const content = fs.readFileSync(zippedFilePath).toString()
const zip = new PizZip(content);

Jun 02 '20 13:06 edi9999

I think it is because when using a real pizzip instance, you need to use the file contents, not the file path.

IE :
const zippedFilePath = path.join(__dirname, "/testZip.zip");
const content = fs.readFileSync(zippedFilePath).toString()
const zip = new PizZip(content);

It got worked but in the place of const content = fs.readFileSync(zippedFilePath).toString() line it should be like const content = fs.readFileSync(zippedFilePath). So the final working code block will be like below

const zippedFilePath = path.join(__dirname, "/testZip.zip");
const content = fs.readFileSync(zippedFilePath);
const zip = new PizZip(content);

Jun 02 '20 15:06 shadab14meb346

I am very willing to have fast tests. This is because if you have tests that take less than a second to run, it is possible to have them run in watch mode, and save your changes and see immediately that your tests are still passing.

With the time however, the tests are running in a bit more than 1 second in watch mode (on my computer, it takes 1940ms in average).

I think this could be improved upon.

First things I would like to do is to measure things. They are separate steps for running the tests :

Load all docx fixtures into nodejs buffers

Unzip the docx with pizzip

Do some templating with docxtemplater

Have some expectations on the output.

Since 1. is probably cached by the "page cache" : https://en.wikipedia.org/wiki/Page_cache

And I expect that 3. should be fast, I'm guessing that 2 && 4 could be improved upon.

Measuring is really the first step, the ideas should not be implemented before doing proper measuring and reporting the results in this issue.

Idea 1 (cached zip)

One of my ideas would be that for tests, we don't use Pizzip to unzip the docx for each test run (which takes time), but instead, we could create a mock that would have a "cached" version of a zip, which would be stored as a folder. For example, if you have in : examples/foo.docx , a docx document is a zipfile which you can unzip. If you unzip that, you get a folder like this :
foo
├── [Content_Types].xml
├── docProps
│   ├── app.xml
│   └── core.xml
├── _rels
└── word
    ├── document.xml
    ├── fontTable.xml
    ├── numbering.xml
    ├── _rels
    │   └── document.xml.rels
    ├── settings.xml
    ├── stylesWithEffects.xml
    ├── styles.xml
    ├── theme
    │   └── theme1.xml
    └── webSettings.xml
We could store this folder instead of foo.docx, and by creating a good mock of Pizzip, we could avoid to unzip the docx for each test run but instead read the files that we want from disk.

Idea 2 (parallel testing)

As far as I can tell, mocha currently runs each test suite sequentially (on one CPU only). Since most computers nowadays have at least 4 or 8 CPUs, it would make sense to run the tests in multiple processes.

I tried to understand but I am facing a little difficulty so can you please help me In the Idea 1 (cached zip) as you mentioned

For example, if you have in : examples/foo.docx , a docx document is a zipfile which you can unzip. If you unzip that, you get a folder like this :

Can you please point me out the folder and line number where the unzipping of docx file is happening for the test cases. And the folder and line number where PizZip is using the unzipped files to do all sort of things like below.

var zip = new MockedPizzip(filepath);
zip.file(/./).map(({ name }) => {
    // this should print all filenames of the faked zip
    console.log(name);
});

const content = zip.files["word/document.xml"].asText();
// This should log the content of word/document.xml
console.log(content);

const newContent = "foobar";
// This should change the content of word/document.xml
zip.file("word/document.xml", newContent, { createFolders: true });

Jun 11 '20 16:06 shadab14meb346

The zip creation is done here :

 docxtemplater/es6/tests/utils.js line 403
	const zip = new PizZip(content);

The usage of the zip is done at several places, for example :

docxtemplater/es6/collect-content-types.js line 15 
docxtemplater/es6/docxtemplater.js line 45
docxtemplater/es6/docxtemplater.js line 242
docxtemplater/es6/docxtemplater.js line 342
docxtemplater/es6/docxtemplater.js line 352
docxtemplater/es6/docxtemplater.js line 363

In all these cases, the zip instance created in docxtemplater/es6/tests/utils.js line 403 is the one that is going to be used in all other places, what I mean by that is that it is the exact same object.

Jun 11 '20 19:06 edi9999

@shadab14meb346 are you alright ? I think I will tackle this issue soon if I don't get a response in the next few days.

Jul 30 '20 20:07 edi9999

docxtemplater
docxtemplater copied to clipboard

Make tests run faster

Idea 1 (cached zip)

Idea 2 (parallel testing)

Benchmark Results

Benchmark Results

Idea 1 (cached zip)

Idea 2 (parallel testing)

docxtemplater docxtemplater copied to clipboard

Make tests run faster

Idea 1 (cached zip)

Idea 2 (parallel testing)

Benchmark Results

Benchmark Results

Idea 1 (cached zip)

Idea 2 (parallel testing)

docxtemplater
docxtemplater copied to clipboard