steep Large number of files slows down the checking process.

I have read and attempted to steep check a large number of RBS files, over 10000. I found that there is a big difference in execution time compared to when all the contents are in one file. I think this is an incentive to put the RBS files in one file instead of separating them, so the difference in execution time between the two should be smaller.

gen.rb

#! /usr/bin/env ruby

require 'pathname'

split = Pathname.new('split')
split.rmtree rescue nil
split.mkdir
n = ARGV[0].to_i
(n - 1).times.map { |i| "C#{i}" }.each do |c|
  split.join("#{c}.rbs").write(<<~RBS)
    class #{c}s
    end
  RBS
end

How to generate RBS files.

$ ruby gen.rb 2000

How to concat RBS files.

$ cat split/* > concat/concat.rbs

How to measure time.

$ time bundle exec steep check

Measurements were taken by switching directories.

$ cat Steepfile
target :sample do
  check "sample.rb"
  signature "split" # or "concat"
end

8 cpu execution time.

file count	split user	split real	concat user	concat real
2000	38.92	10.774	17.67	5.04
4000	75.43	22.585	22.19	6.612
6000	130.57	43.663	25.55	8.019
8000	204.71	72.76	31.27	9.888
10000	299.07	114.17	35.15	11.57

Oct 17 '22 13:10 ksss

Nice found...

The reason would be because Steep distributes the RBS validation tasks to worker processes by file. If we have 10,000 files and 10 workers, each worker has 100 validation tasks to execute. If we have 10 files and 10 workers, each worker has 1 validation task to execute. I guess this causes the difference, but not confirmed yet.

Nov 21 '22 07:11 soutaro

Looking at the $ htop, it appeared that there were times when tasks were concentrated on only one cpu. I have not been able to find out which task it is...

Nov 22 '22 07:11 ksss