[BUG] debug locations sometime wrong, affecting sourcemaps
While modifying parse_bytecode.ml for rehp, I might have stumbled upon some evidence that parse_bytecode.ml is responsible for source maps being
difficult to use in practice (they jump all over the place in very unintuitive ways).
Problem: For the following Reason code:
let greeting = "hello world";
print_endline(greeting);
let greeting = "hello world with unicode: Ɋ";
print_endline(greeting); /* THIS IS LINE 4 */
let unicodeLength = String.length(String.make(Random.int(30), 'i'));
print_endline(
"String.length(\"Ɋ\") should be two:" ++ string_of_int(unicodeLength),
);
print_endline(String.make(1, "Ɋ".[0]) ++ String.make(1, "Ɋ".[1]));
We get the following debug data: (Notice the many things claiming to be on line 4 that aren't!
For example _g_ (the number 30))
/*<<rehack_tests/strings/strings.re 2 0>>*/ /*<<rehack_tests/strings/strings.re 2 0>>*/ print_endline(
greeting
);
/*<<rehack_tests/strings/strings.re 4 0>>*/ /*<<rehack_tests/strings/strings.re 4 0>>*/ print_endline(
greeting__0
);
/*<<rehack_tests/strings/strings.re 4 0>>*/ var /*<<rehack_tests/strings/strings.re 4 0>>*/ _f_ =
105,
/*<<rehack_tests/strings/strings.re 4 0>>*/ _g_ = 30,
/*<<rehack_tests/strings/strings.re 6 46>>*/ _h_ =
/*<<rehack_tests/strings/strings.re 6 46>>*/ int__1(_g_),
/*<<rehack_tests/strings/strings.re 6 34>>*/ unicodeLength =
/*<<rehack_tests/strings/strings.re 6 34>>*/ caml_ml_string_length(
/*<<rehack_tests/strings/strings.re 6 34>>*/ make__0(_h_, _f_)
),
/*<<rehack_tests/strings/strings.re 8 44>>*/ _i_ =
/*<<rehack_tests/strings/strings.re 8 44>>*/ string_of_int(unicodeLength),
/*<<rehack_tests/strings/strings.re 8 2>>*/ _k_ =
/*<<rehack_tests/strings/strings.re 8 2>>*/ _a_(_j_, _i_);
/*<<rehack_tests/strings/strings.re 7 0>>*/ /*<<rehack_tests/strings/strings.re 7 0>>*/ print_endline(
_k_
);
/*<<rehack_tests/strings/strings.re 4 0>>*/ var /*<<rehack_tests/strings/strings.re 10 57>>*/ _l_ =
138,
/*<<rehack_tests/strings/strings.re 10 57>>*/ _m_ = 1,
/*<<rehack_tests/strings/strings.re 10 42>>*/ _n_ =
/*<<rehack_tests/strings/strings.re 10 42>>*/ make__0(_m_, _l_),
/*<<rehack_tests/strings/strings.re 10 29>>*/ _o_ = 201,
/*<<rehack_tests/strings/strings.re 10 29>>*/ _p_ = 1,
/*<<rehack_tests/strings/strings.re 10 14>>*/ _q_ =
/*<<rehack_tests/strings/strings.re 10 14>>*/ make__0(_p_, _o_),
/*<<rehack_tests/strings/strings.re 10 14>>*/ _r_ =
/*<<rehack_tests/strings/strings.re 10 14>>*/ _a_(_q_, _n_);
/*<<rehack_tests/strings/strings.re 10 0>>*/ /*<<rehack_tests/strings/strings.re 10 0>>*/ print_endline(
_r_
);
But if you add the following print statements right after this line in parse_bytecode.ml
You will see that there are only two debug events for line 4, (likely the correct line print_endline(greeting))
let _ = print_string ((if after then "<after>" else "") ^ (if before then "<before>" else "") ^ " DEBUG EVENT FOUND AT: " ^ (pos.Lexing.pos_fname) ^ ":" ^ string_of_int(pos.Lexing.pos_lnum) ^ ":" ^ string_of_int (pos.Lexing.pos_cnum - pos.Lexing.pos_bol)) in
let _ = print_string (" " ^ (if after then "<after>" else "") ^ (if before then "<before>" else "") ^ " DEBUG END FOUND AT: " ^ (loc.Location.loc_end.Lexing.pos_fname) ^ ":" ^ string_of_int(loc.Location.loc_end.Lexing.pos_lnum) ^ ":" ^ string_of_int (loc.Location.loc_end.Lexing.pos_cnum - loc.Location.loc_end.Lexing.pos_bol)) in
let _ = print_newline () in
This site is really good for debugging issues visually: http://sokra.github.io/source-map-visualization/#custom
Also, OCaml locations are bytes based where sourcemap seems to expect code-point based.
https://github.com/ocaml/ocaml/issues/11899
https://github.com/terser/terser/issues/960