Python file read twice ?
repro:
lldb lpython -- --show-asr tests/expr8.py
first time it's read is line 128 of lpython.cpp
frame #0: 0x000000010000e794 lpython`(anonymous namespace)::emit_asr(infile="tests/expr8.py", pass_manager=0x000000016fdfe7d8, runtime_library_dir="/Users/brian/Dropbox/Mac/Documents/GitHub/lpython/src/bin/../runtime", with_intrinsic_modules=false, compiler_options=0x000000016fdfe870) at lpython.cpp:200:5
197 std::string input = LCompilers::read_file(infile);
198 lm.init_simple(input);
199 lm.file_ends.push_back(input.size());
-> 200 }
201 LCompilers::Result<LCompilers::LPython::AST::ast_t*> r1 = parse_python_file(
202 al, runtime_library_dir, infile, diagnostics, 0, compiler_options.new_parser);
203 std::cerr << diagnostics.render(lm, compiler_options);
second time it's read is line 125 of parser.cpp
frame #0: 0x00000001000c9f78 lpython`LCompilers::LPython::parse_python_file(al=0x000000016fdfd2b0, (null)="/Users/brian/Dropbox/Mac/Documents/GitHub/lpython/src/bin/../runtime", infile="tests/expr8.py", diagnostics=0x000000016fdfd298, prev_loc=0, new_parser=true) at parser.cpp:126:49
123 new_parser = true;
124 LCOMPILERS_ASSERT(new_parser)
125 std::string input = read_file(infile);
-> 126 Result<LPython::AST::Module_t*> res = parse(al, input, prev_loc, diagnostics);
127 if (res.ok) {
128 ast = (LPython::AST::ast_t*)res.result;
129 } else {
intentional?
I also see a read_file on line 197 of lpython.cpp. I'm still learning what's going on.
The file should only be read once. If it is read twice, that's a bug that we need to fix. There are many file reads, but only one should be executed.
Looks like, during parsing, read_file is called at least twice, once to parse and once to [gs]et_module_path, and maybe more than once for that purpose. Some of the following breaks are redundant, but not all of them are.
I set breakpoints on every one of these lines:
└─(11:13:31 on main ✹ ✭)──> find . "(" -iname "*.cpp" -o -name "*.h" ")" -print0 | xargs -0 -P 0 grep -n read_file ──(Sun,Feb04)─┘
./src/lpython/semantics/python_ast_to_asr.cpp:74: bool status = read_file(file_path, input);
./src/lpython/semantics/python_ast_to_asr.cpp:81: status = read_file(file_path, input);
./src/lpython/semantics/python_ast_to_asr.cpp:90: status = read_file(file_path, input);
./src/lpython/semantics/python_ast_to_asr.cpp:129: std::string input = read_file(infile);
./src/lpython/parser/parser.cpp:125: std::string input = read_file(infile);
./src/bin/lpython.cpp:90: std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:103: std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:140: std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:161: std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:172: std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:197: std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:250: std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:295: std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:346: std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:399: std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:443: std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:491: std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:592: std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:707: std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:762: std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:872: std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:945: std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:1019: std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:1296:// std::string input = read_file(infile);
./src/libasr/utils.h:105:bool read_file(const std::string &filename, std::string &text);
./src/libasr/string_utils.h:29:std::string read_file(const std::string &filename);
./src/libasr/utils2.cpp:31:bool read_file(const std::string &filename, std::string &text)
./src/libasr/diagnostics.cpp:126: read_file(s.filename, input);
./src/libasr/codegen/asr_to_llvm.cpp:368: std::string input = read_file(infile);
./src/libasr/string_utils.cpp:106:std::string read_file(const std::string &filename)
./src/libasr/asr_utils.cpp:455: if (read_file(full_path.string(), modfile)) {
then I ran lpython --show-asr ./tests/expr8.py. The debugger stopped several times on read_file.
If so, we need to fix it.
marking enhancement and good first issue. It likely means some architectural "re-plumbing" of the front-end. Some file metadata or even the whole file itself will have to be cached somewhere.
I write a gdb script to have a dig into the problem. It will record the call stack when read_file is called and print them into log file (gdb.txt by default). You can use gdb -x ./read_file_count.gdb to use this script:
# read_file_count.gdb
set logging enabled on
set pagination off
file ./src/bin/lpython
set args --show-asr ./tests/expr8.py
set $count=0
break LCompilers::read_file
commands
silent
set $count = $count + 1
printf "Called function: %s\n", "your_function_name"
printf "Here is the function input args: "
info args
printf "function call location: "
where 1
list
cont
end
r
set logging enabled off
In logging file, The read_file is indeed called two times:
Called function: your_function_name
Here is the function input args: filename = "./tests/expr8.py"
function call location: #0 LCompilers::read_file (filename="./tests/expr8.py") at /home/zzh/lpython/src/libasr/string_utils.cpp:109
104 }
105
106 std::string read_file(const std::string &filename)
107 {
108 std::ifstream ifs(filename.c_str(), std::ios::in | std::ios::binary
109 | std::ios::ate);
110
111 std::ifstream::pos_type filesize = ifs.tellg();
112 if (filesize < 0) return std::string();
113
Called function: your_function_name
Here is the function input args: filename = "./tests/expr8.py"
function call location: #0 LCompilers::read_file (filename="./tests/expr8.py") at /home/zzh/lpython/src/libasr/string_utils.cpp:109
104 }
105
106 std::string read_file(const std::string &filename)
107 {
108 std::ifstream ifs(filename.c_str(), std::ios::in | std::ios::binary
109 | std::ios::ate);
110
111 std::ifstream::pos_type filesize = ifs.tellg();
112 if (filesize < 0) return std::string();
113
Maybe we can cache the read_file's result in a unordered_map<string,string>, and return the file content if the filename has been cached?
Maybe we can cache the read_file's result in a unordered_map<string,string>, and return the file content if the filename has been cached?
Let's refactor the frontend to only call the read function once. I wonder what is going on.
Maybe we can cache the read_file's result in a unordered_map<string,string>, and return the file content if the filename has been cached?
Let's refactor the frontend to only call the read function once. I wonder what is going on.
Ok, let me explore the frontend's execution flow and try to figure out the root cause