lpython icon indicating copy to clipboard operation
lpython copied to clipboard

Python file read twice ?

Open rebcabin opened this issue 1 year ago • 8 comments

repro:

lldb lpython -- --show-asr tests/expr8.py

first time it's read is line 128 of lpython.cpp

    frame #0: 0x000000010000e794 lpython`(anonymous namespace)::emit_asr(infile="tests/expr8.py", pass_manager=0x000000016fdfe7d8, runtime_library_dir="/Users/brian/Dropbox/Mac/Documents/GitHub/lpython/src/bin/../runtime", with_intrinsic_modules=false, compiler_options=0x000000016fdfe870) at lpython.cpp:200:5
   197 	        std::string input = LCompilers::read_file(infile);
   198 	        lm.init_simple(input);
   199 	        lm.file_ends.push_back(input.size());
-> 200 	    }
   201 	    LCompilers::Result<LCompilers::LPython::AST::ast_t*> r1 = parse_python_file(
   202 	        al, runtime_library_dir, infile, diagnostics, 0, compiler_options.new_parser);
   203 	    std::cerr << diagnostics.render(lm, compiler_options);

second time it's read is line 125 of parser.cpp

    frame #0: 0x00000001000c9f78 lpython`LCompilers::LPython::parse_python_file(al=0x000000016fdfd2b0, (null)="/Users/brian/Dropbox/Mac/Documents/GitHub/lpython/src/bin/../runtime", infile="tests/expr8.py", diagnostics=0x000000016fdfd298, prev_loc=0, new_parser=true) at parser.cpp:126:49
   123 	    new_parser = true;
   124 	    LCOMPILERS_ASSERT(new_parser)
   125 	    std::string input = read_file(infile);
-> 126 	    Result<LPython::AST::Module_t*> res = parse(al, input, prev_loc, diagnostics);
   127 	    if (res.ok) {
   128 	        ast = (LPython::AST::ast_t*)res.result;
   129 	    } else {

intentional?

rebcabin avatar Feb 01 '24 19:02 rebcabin

I also see a read_file on line 197 of lpython.cpp. I'm still learning what's going on.

rebcabin avatar Feb 01 '24 20:02 rebcabin

The file should only be read once. If it is read twice, that's a bug that we need to fix. There are many file reads, but only one should be executed.

certik avatar Feb 01 '24 22:02 certik

Looks like, during parsing, read_file is called at least twice, once to parse and once to [gs]et_module_path, and maybe more than once for that purpose. Some of the following breaks are redundant, but not all of them are.

I set breakpoints on every one of these lines:

└─(11:13:31 on main ✹ ✭)──> find . "(" -iname "*.cpp" -o -name "*.h" ")" -print0 | xargs -0 -P 0 grep -n read_file                       ──(Sun,Feb04)─┘
./src/lpython/semantics/python_ast_to_asr.cpp:74:    bool status = read_file(file_path, input);
./src/lpython/semantics/python_ast_to_asr.cpp:81:            status = read_file(file_path, input);
./src/lpython/semantics/python_ast_to_asr.cpp:90:            status = read_file(file_path, input);
./src/lpython/semantics/python_ast_to_asr.cpp:129:        std::string input = read_file(infile);
./src/lpython/parser/parser.cpp:125:    std::string input = read_file(infile);
./src/bin/lpython.cpp:90:    std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:103:        std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:140:            std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:161:            std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:172:            std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:197:        std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:250:        std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:295:        std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:346:        std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:399:        std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:443:    std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:491:            std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:592:            std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:707:        std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:762:        std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:872:        std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:945:        std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:1019:        std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:1296://     std::string input = read_file(infile);
./src/libasr/utils.h:105:bool read_file(const std::string &filename, std::string &text);
./src/libasr/string_utils.h:29:std::string read_file(const std::string &filename);
./src/libasr/utils2.cpp:31:bool read_file(const std::string &filename, std::string &text)
./src/libasr/diagnostics.cpp:126:    read_file(s.filename, input);
./src/libasr/codegen/asr_to_llvm.cpp:368:        std::string input = read_file(infile);
./src/libasr/string_utils.cpp:106:std::string read_file(const std::string &filename)
./src/libasr/asr_utils.cpp:455:        if (read_file(full_path.string(), modfile)) {

then I ran lpython --show-asr ./tests/expr8.py. The debugger stopped several times on read_file.

image

image

image

image

image

image

image

image

image

image

image

rebcabin avatar Feb 04 '24 19:02 rebcabin

If so, we need to fix it.

certik avatar Feb 05 '24 05:02 certik

marking enhancement and good first issue. It likely means some architectural "re-plumbing" of the front-end. Some file metadata or even the whole file itself will have to be cached somewhere.

rebcabin avatar Feb 07 '24 15:02 rebcabin

I write a gdb script to have a dig into the problem. It will record the call stack when read_file is called and print them into log file (gdb.txt by default). You can use gdb -x ./read_file_count.gdb to use this script:

# read_file_count.gdb
set logging enabled on
set pagination off

file ./src/bin/lpython
set args --show-asr ./tests/expr8.py
set $count=0
break LCompilers::read_file
commands
  silent
  set $count = $count + 1
  printf "Called function: %s\n", "your_function_name"
  printf "Here is the function input args: "
  info args
  printf "function call location: "
  where 1
  list
  cont
  end
r
set logging enabled off

In logging file, The read_file is indeed called two times:

Called function: your_function_name
Here is the function input args: filename = "./tests/expr8.py"
function call location: #0  LCompilers::read_file (filename="./tests/expr8.py") at /home/zzh/lpython/src/libasr/string_utils.cpp:109
104     }
105
106     std::string read_file(const std::string &filename)
107     {
108         std::ifstream ifs(filename.c_str(), std::ios::in | std::ios::binary
109                 | std::ios::ate);
110
111         std::ifstream::pos_type filesize = ifs.tellg();
112         if (filesize < 0) return std::string();
113
Called function: your_function_name
Here is the function input args: filename = "./tests/expr8.py"
function call location: #0  LCompilers::read_file (filename="./tests/expr8.py") at /home/zzh/lpython/src/libasr/string_utils.cpp:109
104     }
105
106     std::string read_file(const std::string &filename)
107     {
108         std::ifstream ifs(filename.c_str(), std::ios::in | std::ios::binary
109                 | std::ios::ate);
110
111         std::ifstream::pos_type filesize = ifs.tellg();
112         if (filesize < 0) return std::string();
113

Maybe we can cache the read_file's result in a unordered_map<string,string>, and return the file content if the filename has been cached?

syheliel avatar Feb 17 '24 04:02 syheliel

Maybe we can cache the read_file's result in a unordered_map<string,string>, and return the file content if the filename has been cached?

Let's refactor the frontend to only call the read function once. I wonder what is going on.

certik avatar Feb 17 '24 05:02 certik

Maybe we can cache the read_file's result in a unordered_map<string,string>, and return the file content if the filename has been cached?

Let's refactor the frontend to only call the read function once. I wonder what is going on.

Ok, let me explore the frontend's execution flow and try to figure out the root cause

syheliel avatar Feb 17 '24 06:02 syheliel