ocaml icon indicating copy to clipboard operation
ocaml copied to clipboard

performance issue in Sys and Unix

Open craff opened this issue 1 year ago • 11 comments

Many function in Sys like Sys.file_exists are entering/exiting the blocking section and copying the string filename because it may move. This seems only useful and a slow network file system. The impact on speed is actually not neglectible.

Also, the functions like Sys.file_exists/Sys.is_directory/Sys.is_regular_file will do 3 system calls if we need them all. One function Sys.file_type returning an element of type file_kind = Inexistant | Regular | Diretory | Other could be more efficient.

Many other functions in Sys and Unix could be concerned by this kind of optimisation.

I see two directions:

  • remove the blocking section entering/exiting (but I do not know if the string copy can be avoided on all Oses ?).
  • providing new functions offering better efficiency ?

craff avatar Nov 19 '24 06:11 craff

Before we embark on this kind of work -- not wrong, but it takes effort from somewhere else -- do we have concrete evidence that it is worth optimizing these functions? Do you have a real-world program whose performance is held back by those?

gasche avatar Nov 19 '24 08:11 gasche

Before we embark on this kind of work -- not wrong, but it takes effort from somewhere else -- do we have concrete evidence that it is worth optimizing these functions? Do you have a real-world program whose performance is held back by those?

Yes, I do, I could get a 30% speed increase in simple_httpd on a 1k static file (getting from slower to faster than apache ;-) by removing the blocking section and grouping the three calls (Sys.file_exists,Sys.is_directory,Sys.is_regular`) in one using a small C stub. I will try to produce a small test to separate the effect of blocking section and the effect of grouping 3 functions in one.

The simplest is probably that I create an experimental P.R., optimizing the 3 functions and providing a new one. I can bench against the normal branch and compare three functions/versus 1.

Then, we can discuss the best strategy.

Does this is OK for you ?

craff avatar Nov 19 '24 19:11 craff

One function Sys.file_type returning an element of type file_kind = Inexistant | Regular | Diretory | Other could be more efficient.

It seems that simple_httpd depends on unix, would it be more interesting to use Unix.stat? This function seems to satisfy what you are asking and it could (intuitively) be faster than using the Sys module.

dinosaure avatar Nov 19 '24 20:11 dinosaure

One function Sys.file_type returning an element of type file_kind = Inexistant | Regular | Diretory | Other could be more efficient.

It seems that simple_httpd depends on unix, would it be more interesting to use Unix.stat? This function seems to satisfy what you are asking and it could (intuitively) be faster than using the Sys module.

I use my own C stub ;-)

craff avatar Nov 19 '24 20:11 craff

I did a first benchmark, a function summing the size of all files in a folder (see below). I removed the blocking section and the eventual path allocation (no other optimization) in opendir, closedir, stat and readdir. I get a speedup of 11% to 12%.

Here is the code all the optimized function are in a Unix.Fast sub-module:

let rec loop2 fname =
  let open Fast in
  try
    let st = stat fname in
    let res = ref st.st_size in
    if st.st_kind = S_DIR then
      begin
        let h = opendir fname in
        try
          while true do
            let name = readdir h in
            if name <> "." && name <> ".." then
              res := !res + loop2 (Filename.concat fname name);
          done;
          assert false
        with End_of_file ->
          closedir h
      end;
    !res
  with
    Unix_error(_,_,_) -> 0

craff avatar Nov 20 '24 08:11 craff

I investigated a bit more for Unix.stat and similar which take 60% of the time in the above example. One possibility would be to introduce a custom block of type raw_stats with reading functions in C stubs and provide a function of type raw_stats -> stats. Hence we would have in Unix:

type raw_stats
type stats = THE CURRENT DEF

val raw_stat : string -> raw_stats 
val stat_st_size : raw_stats -> int
val stat_st_size64 : raw_stats -> int64
val stat_st_kind : raw_stats -> file_kind
...

val raw_stats_to_stat : raw_stats -> stats
val stat : string -> stats (written in OCaml using the above)
...

Functions like Sys.is_directory in the Sys module could be rewritten using raw_stats and allocating less data.

I already have the C stubs for this. The C code looks better to me. So if you think this is a good idea, I could easily propose a P.R. The windows case is very similar in this case, so it should not be too much of a problem.

This would be a first step in my long term goal to try to optimize as much as possible the system call from OCaml.

craff avatar Nov 21 '24 19:11 craff

Functions like Sys.is_directory in the Sys module could be rewritten using raw_stats and allocating less data.

In this respect, you need to be aware that the constraints on the Sys module are not the same as for the Unix module. In this case, bootstrapping and the use of OCaml in a more restricted context such as unikernels means that you need to be fairly careful with your C stub.

A relatively interesting area would be to improve the Unix module and perhaps improve the documentation for the Sys module (and explain that the latter is not necessarily designed for an I/O sensitive application).

However, it is difficult to form an opinion without seeing your C stub :slightly_smiling_face:.

dinosaure avatar Nov 22 '24 10:11 dinosaure

Functions like Sys.is_directory in the Sys module could be rewritten using raw_stats and allocating less data.

In this respect, you need to be aware that the constraints on the Sys module are not the same as for the Unix module. In this case, bootstrapping and the use of OCaml in a more restricted context such as unikernels means that you need to be fairly careful with your C stub.

A relatively interesting area would be to improve the Unix module and perhaps improve the documentation for the Sys module (and explain that the latter is not necessarily designed for an I/O sensitive application).

However, it is difficult to form an opinion without seeing your C stub 🙂.

I will submit a P.R. that only change Unix. In fact, this means my P.R. will not improve the Sys functions. Currently, the Sys module is more efficient than Unix module with respect to stat.

craff avatar Nov 22 '24 16:11 craff

The copy is a conversion on Windows (from UTF-8 to UCS-2) which can't be avoided. If the release of the lock is avoided, then the cost of the copy could be removed on Unix, but the code becomes quite a bit more complicated (I think this approach might have been considered in 4.06 when the Windows Unicode support was added)

dra27 avatar Nov 27 '24 14:11 dra27

(I think this approach might have been considered in 4.06 when the Windows Unicode support was added)

I confirm: at the time it was concluded that the advantages of having a single code path outweighed the potential penalty due to the extra copy on Unix.

nojb avatar Nov 27 '24 14:11 nojb

This issue has been open one year with no activity and without being identified as a bug. Consequently, it is being marked with the "stale" label to see if anyone has comments that provide new information on the issue. Is the issue still reproducible? Did it appear in other contexts? How critical is it? Did you miss this feature in a new setting? etc. Note that the issue will not be closed in the absence of new activity: the "stale" label is only used for triaging reason.

github-actions[bot] avatar Dec 01 '25 04:12 github-actions[bot]