Large html size

Open d7919 opened this issue 6 years ago • 5 comments

This is a followup to #205.

Whilst removing the list of modules, files etc. has helped reduce the overall size, the generated documentation still remains relatively large.

I'm not very familiar with html development, so it may just be that my expectations are unrealistic, but for a project with approximately 100k lines the generated documentation is around 80MB with graphs = false, source = false and search = false. Most of the space seems to be taken by the files in proc (around 3000 procedures, ~30MB) and sourcefiles (151 files, ~30MB).

I was able to shrink things by around 10% with htmlmin (by running for f in $(find ./html -name '*.html') ; do htmlmin ${f} ${f} ; done) although this was quite slow.
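The loop above starts one htmlmin process per file, one after another, which is probably where much of the time goes (an assumption, not something measured in this thread). A minimal sketch of running the same per-file command in parallel follows. The helper name `minify_tree` and its parameters are hypothetical; the only htmlmin usage assumed is the `htmlmin <input> <output>` call shape from the loop above.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def minify_tree(root, command=("htmlmin",), workers=4):
    """Run `command <file> <file>` on every .html file under root, in parallel.

    Returns (number of successful runs, number of files found).
    """
    files = sorted(Path(root).rglob("*.html"))

    def run_one(path):
        # Same call shape as the shell loop: input and output are the same path.
        result = subprocess.run([*command, str(path), str(path)])
        return result.returncode

    # Threads are enough here: each worker only waits on an external process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return_codes = list(pool.map(run_one, files))

    succeeded = sum(1 for code in return_codes if code == 0)
    return succeeded, len(files)
```

Calling `minify_tree("./html", workers=8)` would process the same files as the shell loop. Unlike `for f in $(find ...)`, it also copes with file names that contain spaces.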

Are there any other strategies that could be used to reduce space requirements?

d7919 avatar Feb 13 '19 16:02 d7919

I think we need a "profile" of where space is being used up in your program. If you're on macOS you could use a program like DaisyDisk to see which directories are hogging disk space, and then we can work backwards from there. I also wonder if there are JS and CSS assets that we can minify during document generation, and images that could be compressed.

zbeekman avatar Apr 25 '19 20:04 zbeekman

I got the following:

du -csh docs/html/*
4.0K	docs/html/blockdata
296K	docs/html/css
16K	docs/html/favicon.png
672K	docs/html/fonts
124K	docs/html/index.html
2.6M	docs/html/interface
228K	docs/html/js
772K	docs/html/lists
980K	docs/html/media
7.2M	docs/html/module
12K	docs/html/page
48M	docs/html/proc
596K	docs/html/program
8.0K	docs/html/search.html
29M	docs/html/sourcefile
5.4M	docs/html/src
2.0M	docs/html/type
97M	total

We can see that the proc and sourcefile directories are the largest here by a good margin.

The relevant config settings are

display: public
display: protected
display: private
source: true
search: false
graph: false

We could clearly reduce the size of these two directories by turning off source and some of the display options, but it would be nice to avoid this if possible. In case it's useful, here's how many files are in each directory:

docs/html/css/ 6
docs/html/fonts/ 8
docs/html/interface/ 127
docs/html/js/ 5
docs/html/lists/ 5
docs/html/media/ 11
docs/html/module/ 124
docs/html/page/ 1
docs/html/proc/ 2618
docs/html/program/ 17
docs/html/sourcefile/ 129
docs/html/src/ 129
docs/html/type/ 79
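For anyone without du or DaisyDisk to hand, the size and file-count listings above can be approximated with a short script. This is a sketch only: `profile_tree` and `format_report` are hypothetical names, and it sums apparent file sizes rather than disk blocks, so the numbers will differ slightly from du.

```python
from pathlib import Path


def profile_tree(root):
    """Return {top-level entry name: (file count, total bytes)} for root."""
    report = {}
    for entry in sorted(Path(root).iterdir()):
        if entry.is_dir():
            files = [p for p in entry.rglob("*") if p.is_file()]
        else:
            files = [entry]
        report[entry.name] = (len(files), sum(p.stat().st_size for p in files))
    return report


def format_report(report):
    """Format the report with the largest entries first and sizes in MiB."""
    by_size = sorted(report.items(), key=lambda item: item[1][1], reverse=True)
    return "\n".join(
        f"{size / 2**20:8.1f}M {count:6d} files  {name}"
        for name, (count, size) in by_size
    )
```

Running `print(format_report(profile_tree("docs/html")))` would list each top-level directory with its file count and size, which makes the average page size per directory easy to compare.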

Finally, if I use tar -cvzf to produce a tar.gz of the html folder, the resulting tarball is about 8 MB, though this may not be useful information.
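The tarball figure can be broken down further by measuring how well each file compresses on its own. The sketch below uses only the standard-library gzip module, and `gzip_savings` is a hypothetical helper name. Compressing each file separately is closer to what a web server sending gzip-encoded pages would transfer than to a single tarball, so the ratio may not match the 8 MB figure.

```python
import gzip
from pathlib import Path


def gzip_savings(root, pattern="*.html"):
    """Return (raw bytes, gzipped bytes) summed over files matching pattern."""
    raw = 0
    packed = 0
    for path in Path(root).rglob(pattern):
        data = path.read_bytes()
        raw += len(data)
        # Each file is compressed independently, not as one archive.
        packed += len(gzip.compress(data))
    return raw, packed
```

For example, `gzip_savings("docs/html/proc")` would show how much of the proc directory is redundancy that compression removes. This only estimates transfer size and does not reduce the space used on disk.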

For reference this documentation can be seen at https://gyrokinetics.gitlab.io/gs2/

d7919 avatar Apr 26 '19 07:04 d7919

In the previous issue the main problem seemed to be a large section of code that was duplicated across many of the files. I believe the solution implemented was to remove it from everything, although I believe there's another approach that allows duplicated sections to be included where needed while keeping only one copy of each. I'm not sure if that is applicable here, but it might be (e.g. for the heading bar, side bar etc.).
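Whether shared sections such as the heading bar or side bar account for much of the size could be checked before changing any templates. The sketch below is a crude line-based estimate: it counts bytes in lines that recur in most pages. The name `shared_line_bytes` and the 90% threshold are assumptions chosen for illustration, not anything FORD provides.

```python
from collections import Counter
from pathlib import Path


def shared_line_bytes(root, pattern="*.html", threshold=0.9):
    """Estimate bytes spent on lines present in at least `threshold` of files.

    Returns (total bytes, bytes in widely shared lines, number of files).
    """
    files = list(Path(root).rglob(pattern))
    total = 0
    files_containing = Counter()  # stripped line -> number of files containing it
    for path in files:
        data = path.read_bytes()
        total += len(data)
        # Count each distinct non-blank line once per file.
        files_containing.update(
            {line.strip() for line in data.splitlines() if line.strip()}
        )
    cutoff = threshold * len(files)
    shared = sum(
        len(line) * count
        for line, count in files_containing.items()
        if count >= cutoff
    )
    return total, shared, len(files)
```

If the shared figure for `docs/html/proc` turned out to be a large fraction of the total, that would suggest the per-page boilerplate is worth factoring out. A small figure would point to the page bodies themselves instead.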

d7919 avatar Apr 26 '19 07:04 d7919

Yes, beyond minification and compression, sharing more assets and html between pages should help.

That info is very useful, even the tgz file size, which shows that there is about an order of magnitude in space savings to be had.

Thanks for this profiling.

zbeekman avatar Apr 26 '19 10:04 zbeekman