Long build times

Open basz opened this issue 10 years ago • 18 comments

Hi,

I haven't used Box much, but I did try to package some Zend Framework applications with it. Running it in verbose mode, I noticed that each file took a little longer to add as more files were included, and by the end it was at crawling speed. It took about 1.5 hours to build a medium-sized project (a few thousand files).

I also noticed that the resulting phar file kept changing size while the build was running. Is this a known issue with a workaround?

Could it be related to https://scrutinizer-ci.com/blog/composer-gc-performance-who-is-affected-too ?

Bas

basz avatar Jan 17 '15 10:01 basz

Observed exactly the same behaviour. This is too slow to be usable in practice. This is despite disabling compression and compacting. Running with php -d zend.enable_gc=0 did not help markedly if at all.

Bilge avatar Feb 11 '15 17:02 Bilge

Only ever used it on small items, like CLI apps. Was file size growing constantly, or reducing and then growing back up to slightly bigger each time?

padraic avatar Feb 22 '15 01:02 padraic

Hi @padraic, the latter. The file size goes from zero to bigger, over and over; it looks like the phar is being rewritten every time a file is added. Box is unusable for me on this kind of project, so for now I'm rolling my own phar build, which still takes 6-8 minutes to complete.

basz avatar Feb 22 '15 15:02 basz

When you run box build, the entire archive is rebuilt from scratch.

Unfortunately, incremental building isn't supported in this version of box, but it is planned for the next version (#84).

kherge avatar Mar 04 '15 16:03 kherge

Incremental building does not sound like it tackles the core problem here even though it may improve performance in a different way.

Bilge avatar Mar 04 '15 20:03 Bilge

I am not doing an incremental build here... Shall I create a short movie demonstrating the issue?

basz avatar Mar 04 '15 21:03 basz

Sure, but I think I might understand the issue. As you build the *.phar file, you notice the file size fluctuating as the build is running, is that correct? I think zip does the same thing. I'm guessing this is just how the phar extension is choosing to build the phar file, which I have no control over. Box itself only removes and recreates the phar once in the very beginning of the process.

kherge avatar Mar 04 '15 21:03 kherge

Yes, correct, that's the issue... However, I don't think zip behaves in a similar fashion (why would it?), as it is perfectly possible to just append data to files. It feels like an incompatible configuration option somewhere, because I can't be the first one to try this with large libraries such as ZF2 or Symfony, can I? Anyway, I was hoping someone would recognize a quick solution. It's not really a problem for me, as at this time I was only doing unpaid experiments. Thanks.

basz avatar Mar 04 '15 21:03 basz

Same problem here: a mid-size ZF2 + Doctrine project takes about 1.5 hours to build. I really love the simplicity of Box, but this makes it useless for me.

It seems that Phing's PharTask had the same problem (https://www.phing.info/trac/ticket/782). They solved it by using Phar::buildFromIterator(), which should massively speed up phar generation.

arjanvdbos avatar Apr 05 '15 09:04 arjanvdbos

@arjanvdbos That seems like some pertinent information. I hope @kherge is taking note.

Bilge avatar Apr 05 '15 13:04 Bilge

The bug ticket mentions that they switched from a one-by-one approach to buildFromIterator(). Box needs the one-by-one approach in order to process each individual file. While we would likely see a boost in performance, we wouldn't be able to make any changes to those files.

I can probably make this work, but you may only see the performance improvement when none of the file processors available to Box are used.

kherge avatar Apr 06 '15 15:04 kherge

I'm not familiar with the steps Box takes, but it might work, and still be a lot faster, if you first copy and process the files into a tmp dir and then import them with buildFromIterator().

basz avatar Apr 06 '15 15:04 basz

Hah, I like your idea much better!

The current process is something like this:

for each file listed in box.json
    read its contents into memory
    process the contents
    add it to the phar using addFromString()
    move on to next file

The new one will look like this:

for each file listed in box.json
    copy it to a temporary directory
    process the contents
    move on to next file

add the temporary directory using buildFromIterator()
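
A minimal PHP sketch of the second flow, for illustration only. It is not Box's actual code: `stageProcessedFiles()` and `buildPharFromStaging()` are hypothetical helper names, and the `$process` callback stands in for whatever file processors are configured. Writing a phar also assumes PHP runs with `phar.readonly=0`.

```php
<?php
declare(strict_types=1);

/**
 * Copy each source file into a staging directory, running its contents
 * through $process on the way. $files maps "path inside the phar" to the
 * real path on disk. (Hypothetical helper; not part of Box.)
 */
function stageProcessedFiles(array $files, callable $process, string $stagingDir): string
{
    foreach ($files as $localPath => $realPath) {
        $target = $stagingDir . '/' . $localPath;
        $targetDir = dirname($target);
        if (!is_dir($targetDir) && !mkdir($targetDir, 0777, true) && !is_dir($targetDir)) {
            throw new RuntimeException("Cannot create directory: $targetDir");
        }
        $contents = file_get_contents($realPath);
        if ($contents === false) {
            throw new RuntimeException("Cannot read: $realPath");
        }
        file_put_contents($target, $process($contents, $localPath));
    }
    return $stagingDir;
}

/**
 * Add the whole staging directory to a new phar in a single call instead of
 * one addFromString() call per file. Needs phar.readonly=0, for example:
 * php -d phar.readonly=0 build.php
 */
function buildPharFromStaging(string $pharPath, string $stagingDir): void
{
    $phar = new Phar($pharPath);
    $phar->buildFromIterator(
        new RecursiveIteratorIterator(
            new RecursiveDirectoryIterator($stagingDir, FilesystemIterator::SKIP_DOTS)
        ),
        $stagingDir // stripped from each path to get the name inside the phar
    );
}
```

The staging step costs extra disk I/O, but it keeps the per-file processing while handing the phar extension all files at once.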

kherge avatar Apr 06 '15 15:04 kherge

Good to see this issue is reopened!

I didn't look at the internal workings of Box, but in my opinion it is not necessary to copy and process the files first and then generate the phar. You could also implement a collection class that implements Iterator with a callback. In the callback you can handle all the processing.
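
One way to sketch this idea in PHP is a generator instead of a full Iterator class. This is an assumption-laden illustration, not Box code: `processedFileStreams()` is a hypothetical name, and it relies on Phar::buildFromIterator() accepting stream resources as values, as the example in the PHP manual shows for file handles.

```php
<?php
declare(strict_types=1);

/**
 * Lazily yield "path inside phar" => stream resource pairs, running each
 * file's contents through $process first. $files maps the path inside the
 * phar to the real path on disk. (Hypothetical helper, not Box code.)
 */
function processedFileStreams(array $files, callable $process): Generator
{
    foreach ($files as $localPath => $realPath) {
        $contents = file_get_contents($realPath);
        if ($contents === false) {
            throw new RuntimeException("Cannot read: $realPath");
        }
        // php://temp keeps small files in memory and spills larger ones to disk.
        $stream = fopen('php://temp', 'w+b');
        fwrite($stream, $process($contents, $localPath));
        rewind($stream); // the consumer reads from the current position
        yield $localPath => $stream;
    }
}

// Usage sketch (needs phar.readonly=0):
// $phar = new Phar('app.phar');
// $phar->buildFromIterator(processedFileStreams($files, $process));
```

This avoids the temporary directory, at the price of doing all processing inside the iteration that feeds the phar.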

arjanvdbos avatar Apr 06 '15 18:04 arjanvdbos

Hi,

I wanted to reduce my build times from 325 seconds a bit :) So I built two branches for both box2 and box2-lib

https://github.com/webdevvie/box2/tree/faster-adding-with-excluded-regexp https://github.com/webdevvie/box2-lib/tree/faster-adding-with-excluded-regexp

It adds a config value to the box.json file for box2: "exclude-from-value-replace":[ "/^vendor(.*)/i", "/^src(.*)/i" ]

and uses an ArrayIterator to add the files from a file queue to the phar file.

This reduced the build time for my project to 10 seconds, which is more than acceptable for me.

I hope this is useful to you. I ran the unit tests for box2, and they are green. The box2-lib unit tests throw a warning about trying to fopen(/does/not/exist), but they are still green.

Hope this helps out.

My branches: https://github.com/webdevvie/box2/tree/faster-adding-with-excluded-regexp https://github.com/webdevvie/box2-lib/tree/faster-adding-with-excluded-regexp
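
How the fork applies these patterns internally is not shown in this thread. As a rough PHP sketch under that caveat, the "exclude-from-value-replace" patterns could be used to split the file list into files that still need processing and files that can be queued untouched. `shouldReplaceValues()` and `partitionFiles()` are hypothetical names, not taken from the branches.

```php
<?php
declare(strict_types=1);

/**
 * Decide whether placeholder replacement should run for a file.
 * $excludePatterns are PCRE patterns matched against the path relative to
 * the project root. (Hypothetical helper; not from the fork.)
 */
function shouldReplaceValues(string $relativePath, array $excludePatterns): bool
{
    foreach ($excludePatterns as $pattern) {
        if (preg_match($pattern, $relativePath) === 1) {
            return false; // excluded: add the file as-is
        }
    }
    return true;
}

/**
 * Split a file list into files that need processing and files that can be
 * handed to the phar unchanged.
 */
function partitionFiles(array $relativePaths, array $excludePatterns): array
{
    $process = [];
    $passThrough = [];
    foreach ($relativePaths as $path) {
        if (shouldReplaceValues($path, $excludePatterns)) {
            $process[] = $path;
        } else {
            $passThrough[] = $path;
        }
    }
    return ['process' => $process, 'passThrough' => $passThrough];
}
```

With the two patterns from the config above, everything under vendor and src would land in the pass-through list, which is usually the bulk of a project.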

webdevvie avatar Jun 06 '16 12:06 webdevvie

Is anyone working on getting this feature into the main build? @webdevvie How can I use your version?

kalehrishi avatar Nov 14 '17 13:11 kalehrishi

@kalehrishi I added the specific fork and branches to my composer.json.

I added this to my require array:

        "kherge/box": "dev-faster-adding-with-excluded-regexp as 2.7.2",
        "herrera-io/box": "dev-faster-adding-with-excluded-regexp as 1.6",

then added this:

    "repositories": [
        {
            "type": "vcs",
            "url": "[email protected]:webdevvie/box2.git"
        },
        {
            "type": "vcs",
            "url": "[email protected]:webdevvie/box2-lib.git"
        }
    ],

I'll keep it this way until the branches are merged or some other solution is added.

There have been several open pull requests on both projects for over a year now, so I think this project is in the phantom zone or something.

webdevvie avatar Nov 14 '17 15:11 webdevvie

@webdevvie Thanks a lot.

kalehrishi avatar Nov 15 '17 04:11 kalehrishi