
Deployment Tips - To other users

PiotrZSL opened this issue 4 years ago • 11 comments

Since I'm using Woboq with a fairly big project, here are some tips from me.

Woboq isn't perfect: it's slow, it lacks many features (global search, template support, ...), and it is no longer actively developed. BUT it works, and it speeds up development and issue resolution on older branches a lot.

Problem - Slow execution

Woboq is a single-threaded application, and don't waste time making it multithreaded (clang itself doesn't like it). What I did, and what works for me: split the compilation database (compile_commands.json) into (number of CPUs) parts and run Woboq on each of them in parallel. Then merge the HTML files (pick the first found), merge the index (merge + sort + uniq), and merge the refs (for every duplicated file, merge content + uniq + sort). This sped up analysis from 9h to 1h (the merge takes some time, though). Everything is done on a RAM disk (/dev/shm) on a server with a sufficient amount of memory (this is important).
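The split-and-merge scheme above can be sketched roughly like this. This is a minimal sketch, not PiotrZSL's actual script: the helper names and the round-robin split are assumptions, and the real setup also merges the HTML and index files.

```python
# Hypothetical sketch of the split/merge scheme: shard the compilation
# database round-robin (one shard per CPU), and merge refs files that were
# produced by several shards with merge + uniq + sort.

def shard_compilation_database(entries, num_shards):
    """Split the compile_commands.json entry list into num_shards parts."""
    shards = [[] for _ in range(num_shards)]
    for i, entry in enumerate(entries):
        shards[i % num_shards].append(entry)
    return shards

def merge_ref_contents(contents):
    """Merge several versions of one refs file: concatenate + uniq + sort."""
    lines = set()
    for text in contents:
        lines.update(text.splitlines())
    return "\n".join(sorted(lines))
```

Each shard would then be written out as its own compile_commands.json and fed to a separate generator process.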

Problem - Lot of small files

The project I work on takes 22GB (in HTML files). Moving this to a server where it can be served to users takes a lot of time (zip/unzip), and I don't have the disk space there anyway to keep all branches. First I tried NFS; while it was fast enough, updating is slow, removing 22GB from NFS takes a lot of time, and there were network issues. Now I'm switching to something different: squashfs. It takes 2 minutes to build a squashfs image from that 22GB folder (in RAM) on my 52-core server, and after compression it takes ~1GB. That is nice because now I can move it to the other server, publish it to users (mount it), and removal is also easy: it's only one file to remove. That matters because I want users to see an up-to-date version every 1h. It's not perfect, though; the problem is the refs folder: it's > 128 MB and cannot be put into squashfs. So in Python I split that folder into smaller ones based on crc32(ref file name) % 1000, and I did the same in JavaScript to update the request URLs. Works perfectly.
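The crc32-based sharding of the refs folder can be sketched as below. The bucket count of 1000 comes from the comment above; the function name is an assumption, and the JavaScript side must compute the same hash when it builds request URLs, or lookups will miss.

```python
# Hypothetical sketch: map a ref file name to one of 1000 sub-folders using
# crc32(name) % 1000, as described above, so no single folder gets too big.
import zlib

def ref_bucket(ref_name, buckets=1000):
    """Return the sub-folder index (0..buckets-1) for a given ref file name."""
    return zlib.crc32(ref_name.encode("utf-8")) % buckets
```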

Problem - No global search

I solved this by deploying Hound (with a patch to support local folders) and updating the JavaScript and a rule in hound.conf. Now, on pressing Enter, I can search in the source files, which I also put (after applying the directory mapping) into the output folder, so in Hound I can just add .html to get a working search. From time to time Hound crashes (when files change), but anyway, this was a one-day job.
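For reference, a Hound config.json along these lines might look like the fragment below. This is a sketch under assumptions: the `url-pattern` keys are standard Hound configuration, but pointing `url` at a local folder depends on the unmerged patch mentioned above, and the `base-url` / `anchor` values (appending `.html` to the path and using a plain line-number anchor) are illustrative, not the exact setup described.

```
{
  "dbpath": "hound-data",
  "repos": {
    "myproject": {
      "url": "/srv/source/myproject",
      "url-pattern": {
        "base-url": "https://codebrowser.example.com/myproject/{path}.html",
        "anchor": "#{line}"
      }
    }
  }
}
```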

Problem - No blame

Because I change the output anyway, I decided to add support for blame. Don't waste time editing the existing HTML, it takes ages; instead, I create a new table in HTML and save it in a separate file, just make sure the style is the same so the rows will line up. Then in JavaScript I load the blame file with AJAX and render it in the browser.
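The sidecar-blame idea could be sketched as below. This is an assumption-heavy sketch, not the original script: it assumes git (the comment doesn't name a VCS), and it only shows parsing `git blame --line-porcelain` output into one (commit, author) pair per source line, which would then be written to a .blame file next to the generated .html.

```python
# Hypothetical sketch: parse `git blame --line-porcelain` output into a list
# of (short commit sha, author) tuples, one per source line. In porcelain
# output, each group starts with a 40-hex-char sha header, header fields like
# "author <name>" appear once per commit, and the source line starts with TAB.
def parse_blame_porcelain(text):
    authors = {}   # full commit sha -> author name
    per_line = []  # one (short sha, author) per source line
    current = None
    for line in text.splitlines():
        if line.startswith("\t"):  # the actual source line ends a group
            per_line.append((current[:8], authors.get(current, "?")))
        elif line.startswith("author "):
            authors[current] = line[len("author "):]
        else:
            tok = line.split(" ", 1)[0]
            if len(tok) == 40 and all(c in "0123456789abcdef" for c in tok):
                current = tok  # sha header for the next source line
    return per_line
```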

Hope this helps...

PiotrZSL avatar Mar 19 '20 17:03 PiotrZSL

Nice tips, thank you! @PiotrZSL

gavinchou avatar May 24 '20 02:05 gavinchou

@PiotrZSL Can you please publish the code for the search and blame enhancements that you made?

kanihal avatar Nov 14 '20 10:11 kanihal

> @PiotrZSL Can you please publish the code for the search and blame enhancements that you made?


Here is my approach to global search (hack the search box):

  1. Woboq proxies the search request from the search box to a search backend service, just like the original engine does.
  2. Implement an HTTP service to handle the search request. It can be straightforward: my HTTP server just runs a command like "grep", "the-silver-searcher", or a shell script (just like searching code with command-line utils) over the source or the compiled HTML files, and "returns"* stdout/stderr to Woboq.
  • *"return" here actually means the browser reloading to the page in the HTTP server's response

Check this out to see more: https://github.com/gavinchou/woboq_codebrowser commit 170e34f.

Please let me know if you need the source code of the HTTP service. However, I bet you may want to write your own.

How the search performs depends on how you implement the HTTP search service.

gavinchou avatar Dec 07 '20 08:12 gavinchou

> @PiotrZSL Can you please publish the code for the search and blame enhancements that you made?

I cannot share the source code, as there are company copyrights involved. Anyway, I did a similar thing to gavinchou. First I deployed Hound (https://github.com/hound-search/hound) as an independent search engine, over the same source code that Woboq was run on. In the Hound configuration (indexing from disk needs a patch, which can be found in the merge requests) you may pass a URL as "web" (= the Woboq URL). This works as a redirect from Hound to Woboq. For Woboq to Hound, I simply changed the search box: typing searches as normal, but pressing Enter redirects to Hound.

As for blame, I have a separate script that generates, next to every .html file, a .blame file with just the author and commit id. Then, in a single .json, I keep the commit ids, dates, and messages for the "tip". On page open, JavaScript loads the blame file (if it exists) and adds a separate "div" with a "table" to show the blames. I later moved some of this from JavaScript to a Lua script on the server.

Unfortunately, I did a lot of this at work, for work, and that's where the copyright problem comes from...

PiotrZSL avatar Dec 07 '20 16:12 PiotrZSL

To address the slow execution (single-threaded) problem, I fixed it with multiprocessing; check this commit for more details: https://github.com/gavinchou/codebrowser/commit/719e7a7982c0b748157c998dc0293274489e420d

gavinchou avatar Aug 25 '21 07:08 gavinchou

One problem with a parallel build is that, since the code generator doesn't use a real database for its output, there could be corruption when several threads or processes try to write to the same files. In order to implement multi-process output, we need to make sure to have file-level locking:

  • in general, hold a filesystem lock when trying to write to the files in refs
  • before starting to process a file (e.g., a header file), the generator currently checks whether the output HTML file already exists for that file. It should be changed to also create an empty file at the same time, so other processes don't try to process the same file (which would result in duplicated uses or other references)
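The two points above can be sketched as follows, assuming POSIX (the helper names are mine, not codebrowser's): an exclusive flock around each refs write, and O_CREAT|O_EXCL to atomically claim an output file before processing it.

```python
# Hypothetical sketch of the file-level locking described above (POSIX only).
import fcntl
import os

def append_ref_locked(path, line):
    """Append one line to a shared refs file under an exclusive flock."""
    with open(path, "a") as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            f.write(line + "\n")
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)

def try_claim_output(path):
    """Atomically create the output file; False means another process owns it."""
    try:
        os.close(os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY))
        return True
    except FileExistsError:
        return False
```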

ogoffart avatar Sep 27 '23 06:09 ogoffart

I tried filesystem locks initially, but they made the generation even slower than a single process (maybe my approach was bad), so I tried another way: each generator process appends a unique suffix to the file before writing to it; the suffix is supplied via an environment variable when invoking the generator. After the generation of all files finishes, we combine the output from all the processes into one, remove duplicates, etc. This has worked somewhat reliably so far, and the performance boost is quite big.
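The suffix scheme could look roughly like this. It is a sketch under assumptions, not the actual runner.py: the merge helper and file layout (refs/<name>.<suffix>, one suffix per worker) are illustrative.

```python
# Hypothetical sketch of the per-process suffix scheme: each worker writes
# refs/<name>.<suffix> (suffix taken from an environment variable), and a
# final pass merges all suffixed parts into one deduplicated, sorted file.
import glob
import os

def merge_suffixed_refs(ref_dir, name):
    """Combine every <name>.<suffix> part into one <name> file, deduped."""
    lines = set()
    for part in sorted(glob.glob(os.path.join(ref_dir, name + ".*"))):
        with open(part) as f:
            lines.update(f.read().splitlines())
        os.remove(part)  # the per-worker part is no longer needed
    with open(os.path.join(ref_dir, name), "w") as f:
        f.write("\n".join(sorted(lines)) + "\n")
```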

Waqar144 avatar Sep 27 '23 10:09 Waqar144

> One problem with a parallel build is that, since the code generator doesn't use a real database for its output, there could be corruption when several threads or processes try to write to the same files. In order to implement multi-process output, we need to make sure to have file-level locking: [...]

@ogoffart You are right about the concurrency issue of the multiprocess approach. Not only stdout/stderr but also some output files may be written multiple times. I haven't fixed it yet. However, it's usable if the concurrency is set to 4 or 8.

gavinchou avatar Jan 01 '24 17:01 gavinchou

> I tried filesystem locks initially, but they made the generation even slower than a single process [...] so I tried another way: each generator process appends a unique suffix to the file before writing to it; the suffix is supplied via an environment variable when invoking the generator. After the generation of all files finishes, we combine the output from all the processes into one, remove duplicates, etc.

@Waqar144 Nice, can you share both of your approaches to solving the multiprocess issue? I am wondering why the lock slows down the generation and how you process the suffix.

gavinchou avatar Jan 01 '24 17:01 gavinchou

It's a script, available here: https://github.com/KDAB/codebrowser/blob/master/scripts/runner.py

The lock version is lost by now.

Waqar144 avatar Jan 01 '24 18:01 Waqar144

> It's a script, available here: https://github.com/KDAB/codebrowser/blob/master/scripts/runner.py
>
> The lock version is lost by now.

@Waqar144 Thank you!

gavinchou avatar Jan 02 '24 13:01 gavinchou