dyon icon indicating copy to clipboard operation
dyon copied to clipboard

Solving parser bottleneck

Open bvssvni opened this issue 7 years ago • 6 comments

Did some profiling with cmr's hprof library on a 582 loc file:

Timing information for root profiler:
  run - 1 * 1.1s = 1.1s @ 0.9hz
    lazy - 1 * 114ns = 114ns (0.0%)
    parse - 1 * 1.1s = 1.1s (98.3%)
    prelude - 1 * 577.6us = 577.6us (0.1%)
    check - 1 * 38.0us = 38.0us (0.0%)
    convert - 1 * 5.9ms = 5.9ms (0.5%)
    join - 1 * 1.6ms = 1.6ms (0.1%)
    ignored - 1 * 94ns = 94ns (0.0%)

Over 98% of the time is spent parsing.

lib.rs:

let parse_guard = enter("parse");
    let mut data = vec![];
    try!(parse_errstr(syntax_rules, &d, &mut data).map_err(
        |err| format!("In `{}:`\n{}", source, err)
    ));
    drop(parse_guard);

Could figure out what's taking so much time in Piston-Meta.

Develop binary format for Piston-Meta

The meta data is all you need, while the syntax is just to make it readable. Sometimes you reuse the same data over and over, but want to load it as fast as possible.

Could develop an efficient binary format for meta data that loads much quicker. This format can also save space by using a string table. Meta data conversion could happen directly from binary data (changes required in Piston-Meta).

The benefit with this direction is that all meta formats automatically gets a binary format.

Add a bin type to Dyon

Also been thinking about compression and privacy (simple encryption, for example for game content). Then it hit me that perhaps Dyon could get a built-in bin type. Could also use it as a bit vector.

bvssvni avatar Jan 12 '17 12:01 bvssvni

Perhaps a better idea: Generate meta data in Rust code.

If you don't need the original source, you might as well include the meta data in the binary executable. This will skip the parsing step completely, while being compatible with all meta data converters in Rust.

bvssvni avatar Jan 15 '17 04:01 bvssvni

Added new function for loading directly from meta data in https://github.com/PistonDevelopers/dyon/pull/432. Included an example that generates meta data in Rust code. It works!

Upside is that the binary becomes self contained, with no external code. Loads very fast, because there is no parsing. Skips lifetime and type checks by assuming the source already has been checked. This technique could be used when packaging apps.

Downside is that build times in release mode are scary: 1167.39 seconds for the n-body example. This should be straight forward code for the Rust compiler. I am wondering whether LLVM does some partial evaluation since it knows the input and this could be the reason it takes so long time.

$ time ./target/debug/examples/test
-0.16907514416856514
-0.16908671301860925

real	0m1.378s
user	0m1.368s
sys	0m0.009s

$ time ./target/release/examples/test
-0.16907514416856514
-0.16908671301860925

real	0m0.077s
user	0m0.069s
sys	0m0.006s

$ time ./target/debug/examples/test_compile
-0.16907514416856514
-0.16908671301860925

real	0m0.888s
user	0m0.827s
sys	0m0.014s

$ time ./target/release/examples/test_compile
-0.16907514416856514
-0.16908671301860925

real	0m0.051s
user	0m0.041s
sys	0m0.006s

bvssvni avatar Jan 15 '17 14:01 bvssvni

Try -Z time-passes flag for rustc to see where time is spent (e.g. export RUSTFLAGS="-Z time-passes"). Feel free to also open an issue on the rustc. I assume that by “meta data” you mean some generated rust code; it might be hitting some super-linear code path somewhere and ideally none would exist; getting rid of it is difficult sometimes.

nagisa avatar Jan 15 '17 17:01 nagisa

@nagisa Thanks! Running it now.

bvssvni avatar Jan 15 '17 18:01 bvssvni

Here is what I got:

$ cargo run --release --example test_compile
   Compiling dyon v0.16.0 (file:///Users/sven/rust/dyon)
time: 0.039	parsing
time: 0.000	recursion limit
time: 0.000	crate injection
time: 0.000	plugin loading
time: 0.000	plugin registration
time: 0.308	expansion
time: 0.000	maybe building test harness
time: 0.001	maybe creating a macro crate
time: 0.000	checking for inline asm in case the target doesn't support it
time: 0.001	complete gated feature checking
time: 0.005	early lint checks
time: 0.001	AST validation
time: 0.013	name resolution
time: 0.007	lowering ast -> hir
time: 0.001	indexing hir
time: 0.001	attribute checking
time: 0.000	language item collection
time: 0.001	lifetime resolution
time: 0.000	looking for entry point
time: 0.000	looking for plugin registrar
time: 0.003	region resolution
time: 0.001	loop checking
time: 0.000	static item recursion checking
time: 0.011	compute_incremental_hashes_map
time: 0.000	load_dep_graph
time: 0.001	stability index
time: 0.002	stability checking
time: 0.001	type collecting
time: 0.000	variance inference
time: 0.000	impl wf inference
time: 0.008	coherence checking
time: 0.000	wf checking
time: 0.001	item-types checking
time: 0.871	item-bodies checking
time: 0.000	drop-impl checking
time: 0.023	const checking
time: 0.002	privacy checking
time: 0.001	intrinsic checking
time: 0.001	effect checking
time: 0.001	match checking
time: 0.003	liveness checking
time: 0.020	rvalue checking
time: 0.498	MIR dump
  time: 0.015	SimplifyCfg
  time: 0.018	QualifyAndPromoteConstants
  time: 0.020	TypeckMir
  time: 0.000	SimplifyBranches
  time: 0.008	SimplifyCfg
time: 0.062	MIR cleanup and validation
time: 0.121	borrow checking
time: 0.000	reachability checking
time: 0.001	death checking
time: 0.000	unused lib feature checking
time: 0.030	lint checking
time: 0.003	resolving dependency formats
  time: 0.000	NoLandingPads
  time: 0.010	SimplifyCfg
  time: 0.010	EraseRegions
  time: 0.003	AddCallGuards
  time: 0.994	ElaborateDrops
  time: 0.000	NoLandingPads
  time: 0.023	SimplifyCfg
  time: 0.005	InstCombine
  time: 0.000	Deaggregator
  time: 0.000	CopyPropagation
  time: 0.006	SimplifyLocals
  time: 0.003	AddCallGuards
  time: 0.000	PreTrans
time: 1.054	MIR optimisations
  time: 0.000	write metadata
  time: 0.208	translation item collection
  time: 0.002	codegen unit partitioning
  time: 0.002	internalize symbols
time: 0.606	translation
time: 0.000	assert dep graph
time: 0.000	serialize dep graph
  time: 34.105	llvm function passes [0]
  time: 1215.066	llvm module passes [0]
  time: 6.071	codegen passes [0]
  time: 0.001	codegen passes [0]
time: 1255.267	LLVM passes
time: 0.000	serialize work products
  time: 0.432	running linker
time: 0.439	linking

The "llvm module passes" part takes most of the time.

One intriguing idea I have: Knowing the input could trigger some optimization in LLVM, where it attempts to evaluate things, thereby essentially doing what the Dyon run-time is supposed to be doing: Executing the program, but at a much slower speed.

bvssvni avatar Jan 15 '17 19:01 bvssvni

Opened https://github.com/rust-lang/rust/issues/39082 with updated to latest nightly.

bvssvni avatar Jan 15 '17 20:01 bvssvni