performance issue in Sys and Unix
Many function in Sys like Sys.file_exists are entering/exiting the blocking section and copying the string filename because it may move. This seems only useful and a slow network file system. The impact on speed is actually not neglectible.
Also, the functions like Sys.file_exists/Sys.is_directory/Sys.is_regular_file will do 3 system calls if we need them all.
One function Sys.file_type returning an element of type file_kind = Inexistant | Regular | Diretory | Other could be more efficient.
Many other functions in Sys and Unix could be concerned by this kind of optimisation.
I see two directions:
- remove the blocking section entering/exiting (but I do not know if the string copy can be avoided on all Oses ?).
- providing new functions offering better efficiency ?
Before we embark on this kind of work -- not wrong, but it takes effort from somewhere else -- do we have concrete evidence that it is worth optimizing these functions? Do you have a real-world program whose performance is held back by those?
Before we embark on this kind of work -- not wrong, but it takes effort from somewhere else -- do we have concrete evidence that it is worth optimizing these functions? Do you have a real-world program whose performance is held back by those?
Yes, I do, I could get a 30% speed increase in simple_httpd on a 1k static file (getting from slower to faster than apache ;-) by removing the blocking section and grouping the three calls (Sys.file_exists,Sys.is_directory,Sys.is_regular`) in one using a small C stub. I will try to produce a small test to separate the effect of blocking section and the effect of grouping 3 functions in one.
The simplest is probably that I create an experimental P.R., optimizing the 3 functions and providing a new one. I can bench against the normal branch and compare three functions/versus 1.
Then, we can discuss the best strategy.
Does this is OK for you ?
One function Sys.file_type returning an element of type file_kind = Inexistant | Regular | Diretory | Other could be more efficient.
It seems that simple_httpd depends on unix, would it be more interesting to use Unix.stat? This function seems to satisfy what you are asking and it could (intuitively) be faster than using the Sys module.
One function Sys.file_type returning an element of type file_kind = Inexistant | Regular | Diretory | Other could be more efficient.
It seems that
simple_httpddepends onunix, would it be more interesting to useUnix.stat? This function seems to satisfy what you are asking and it could (intuitively) be faster than using theSysmodule.
I use my own C stub ;-)
I did a first benchmark, a function summing the size of all files in a folder (see below).
I removed the blocking section and the eventual path allocation (no other optimization) in opendir, closedir, stat and readdir. I get a speedup of 11% to 12%.
Here is the code all the optimized function are in a Unix.Fast sub-module:
let rec loop2 fname =
let open Fast in
try
let st = stat fname in
let res = ref st.st_size in
if st.st_kind = S_DIR then
begin
let h = opendir fname in
try
while true do
let name = readdir h in
if name <> "." && name <> ".." then
res := !res + loop2 (Filename.concat fname name);
done;
assert false
with End_of_file ->
closedir h
end;
!res
with
Unix_error(_,_,_) -> 0
I investigated a bit more for Unix.stat and similar which take 60% of the time in the above example. One possibility would be to introduce a custom block of type raw_stats with reading functions in C stubs and provide a function of type raw_stats -> stats. Hence we would have in Unix:
type raw_stats
type stats = THE CURRENT DEF
val raw_stat : string -> raw_stats
val stat_st_size : raw_stats -> int
val stat_st_size64 : raw_stats -> int64
val stat_st_kind : raw_stats -> file_kind
...
val raw_stats_to_stat : raw_stats -> stats
val stat : string -> stats (written in OCaml using the above)
...
Functions like Sys.is_directory in the Sys module could be rewritten using raw_stats and allocating less data.
I already have the C stubs for this. The C code looks better to me. So if you think this is a good idea, I could easily propose a P.R. The windows case is very similar in this case, so it should not be too much of a problem.
This would be a first step in my long term goal to try to optimize as much as possible the system call from OCaml.
Functions like Sys.is_directory in the Sys module could be rewritten using raw_stats and allocating less data.
In this respect, you need to be aware that the constraints on the Sys module are not the same as for the Unix module. In this case, bootstrapping and the use of OCaml in a more restricted context such as unikernels means that you need to be fairly careful with your C stub.
A relatively interesting area would be to improve the Unix module and perhaps improve the documentation for the Sys module (and explain that the latter is not necessarily designed for an I/O sensitive application).
However, it is difficult to form an opinion without seeing your C stub :slightly_smiling_face:.
Functions like Sys.is_directory in the Sys module could be rewritten using raw_stats and allocating less data.
In this respect, you need to be aware that the constraints on the Sys module are not the same as for the Unix module. In this case, bootstrapping and the use of OCaml in a more restricted context such as unikernels means that you need to be fairly careful with your C stub.
A relatively interesting area would be to improve the Unix module and perhaps improve the documentation for the Sys module (and explain that the latter is not necessarily designed for an I/O sensitive application).
However, it is difficult to form an opinion without seeing your C stub 🙂.
I will submit a P.R. that only change Unix. In fact, this means my P.R. will not improve the Sys functions. Currently, the Sys module is more efficient than Unix module with respect to stat.
The copy is a conversion on Windows (from UTF-8 to UCS-2) which can't be avoided. If the release of the lock is avoided, then the cost of the copy could be removed on Unix, but the code becomes quite a bit more complicated (I think this approach might have been considered in 4.06 when the Windows Unicode support was added)
(I think this approach might have been considered in 4.06 when the Windows Unicode support was added)
I confirm: at the time it was concluded that the advantages of having a single code path outweighed the potential penalty due to the extra copy on Unix.
This issue has been open one year with no activity and without being identified as a bug. Consequently, it is being marked with the "stale" label to see if anyone has comments that provide new information on the issue. Is the issue still reproducible? Did it appear in other contexts? How critical is it? Did you miss this feature in a new setting? etc. Note that the issue will not be closed in the absence of new activity: the "stale" label is only used for triaging reason.