Issues opening large files
Problem
I can't open a file that is 32 MB. When I try opening it with micro, I get a blank screen. With Nano it takes around 10 seconds to open the file, and Nano's memory consumption is pretty low as well, around 70 MB, while micro is at 1.69 GB without ever opening the file.
Specifications
Commit hash: 5dc8fe4
OS: macOS Sierra 10.12.4
Terminal: iTerm2 3.0.15
Thanks for reporting this. I'd like to try to reproduce it. Is there any way you could provide the file or a similar test case? Also, I recommend you try using a nightly binary and see if that works better for you. A number of optimizations have been made since the last release, but there are still a few problems and I haven't had any time to fix them lately.
You can find the files I tried to open in this repo: https://github.com/emanlapponi/norlem-norwegian-lemmatizer under the models folder.
I could not open nno.norlem.json (I made a mistake in my original comment; the file is actually 38 MB).
I also tried opening the file with Vi; it takes around 10 seconds to open, approximately the same as Nano.
I installed the latest binaries and still have the same issue opening large files: huge load on CPU and memory, and the file never opens.

Version: 1.1.5-147
Commit hash: b8debb5
Compiled on April 26, 2017
The problem has to do mostly with the syntax highlighting. Currently there is no column at which highlighting cuts off, so the highlighter tries to highlight the entire file at once.

If you turn highlighting off and have softwrap off (`micro -softwrap off -syntax off ...`), the file loads much faster. There are still large issues with softwrap, though, and even with softwrap off, micro is still slower than it should be.
I'll try to optimize some of this stuff at some point.
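As a purely illustrative sketch of what a cutoff column could look like (this is not micro's code; `highlightCutoff` and `highlightLine` are invented names), the highlighter would only ever see a bounded prefix of each line:

```go
// Illustrative only: cap how many cells of a line the highlighter ever
// touches, so one multi-megabyte line cannot stall the editor.
package main

import "fmt"

const highlightCutoff = 1000 // columns; everything past this stays unstyled

// highlightLine stands in for a real highlighter; it only ever sees the
// prefix of the line up to the cutoff column.
func highlightLine(line string) string {
	runes := []rune(line)
	if len(runes) > highlightCutoff {
		runes = runes[:highlightCutoff]
	}
	// ...run the real pattern matching on runes here...
	return string(runes)
}

func main() {
	long := make([]byte, 1<<20) // a 1 MiB single "line"
	for i := range long {
		long[i] = 'x'
	}
	fmt.Println(len(highlightLine(string(long)))) // 1000, not 1048576
}
```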
@zyedidia It occurs to me, although I have no idea how to implement this, that syntax highlighting should only apply to lines as they come into the view, plus about one view-height "behind" and "ahead" of the visible lines; that is, scarcely more than 300 non-softwrapped lines at a time. In addition, where possible, I think micro should effectively act as a bidirectional pager, similar to `less` and similar tools.

Most importantly, I think buffers should only be kept in RAM for a few minutes and then cached to disk when they haven't been updated recently; this could be worked into an improved autosave system. Caching could be done to the config directory, or possibly to some sort of temporary or swap directory specified in micro's config. I have some other ideas too, and I feel another refactor of the view system should come into play soon, since soft-wrapping is fairly non-functional at present. All of this factors into this issue, which I've been having as well.
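For what it's worth, here's a minimal sketch of that windowing idea, with invented names rather than micro's actual API; it just clamps the highlighted range to the view plus one view-height on either side:

```go
// Invented names, not micro's API: clamp highlighting to the visible
// lines plus one view-height "behind" and "ahead".
package main

import "fmt"

// highlightWindow returns the half-open line range [start, end) that
// should carry highlight styles.
func highlightWindow(topLine, viewHeight, numLines int) (start, end int) {
	start = topLine - viewHeight // one view-height behind
	if start < 0 {
		start = 0
	}
	end = topLine + 2*viewHeight // the view itself plus one ahead
	if end > numLines {
		end = numLines
	}
	return start, end
}

func main() {
	s, e := highlightWindow(500, 100, 1000000)
	fmt.Println(s, e) // 400 700: about 300 lines, as suggested above
}
```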
Micro's syntax highlighting already only highlights the lines in the view. Every line keeps track of an end state so that micro can know whether the current view is inside a multiline comment even when the comment started 1,000,000 lines earlier in the buffer, but the actual highlight colors are only stored for the lines currently in the view.
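Roughly, the scheme looks like this; a simplified sketch with invented types (not micro's real ones), recognizing only `/* ... */` comments:

```go
// Each line remembers whether the highlighter ended it inside a multiline
// comment; highlighting any view can then start from the stored state of
// the line just above it instead of rescanning the whole buffer.
package main

import (
	"fmt"
	"strings"
)

type state int

const (
	normal state = iota
	inComment
)

// endState scans one line and returns the highlighter state at its end.
func endState(line string, start state) state {
	s := start
	for i := 0; i < len(line); i++ {
		switch {
		case s == normal && strings.HasPrefix(line[i:], "/*"):
			s, i = inComment, i+1
		case s == inComment && strings.HasPrefix(line[i:], "*/"):
			s, i = normal, i+1
		}
	}
	return s
}

func main() {
	lines := []string{"int x; /* start", "still inside", "end */ int y;"}
	states := make([]state, len(lines))
	s := normal
	for i, l := range lines {
		s = endState(l, s)
		states[i] = s // stored per line; the colors themselves are not kept
	}
	fmt.Println(states) // [1 1 0]: line 2 ends in a comment, line 3 does not
}
```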
Clearly micro has performance issues when it tries to open a 32 MB file that contains only one line, especially when asked to softwrap that line, but I don't think this is very common behavior; micro is primarily meant for editing normal files. Softwrapping also has problems currently, but only for lines so long that they take more than the height of the view to display, and the only issue there is that it is not possible to view the rest of the line (which is a problem, but the editor isn't crashing or anything).
I don't completely understand the caching idea. The buffer must be in memory at least during drawing to display the view.
@zyedidia I have issues softwrapping much shorter lines, including ones that only wrap 4 times, that is, ones that occupy four visual rows. I use micro for a lot of HTML editing, and especially for writing, where I need to navigate more like in a conventional GUI text editor. That's why I use micro over e3 (a minimalist Emacs) and Vim (Vim being more powerful for heavy-duty codesmithing of the kind I barely need to do). Its mouse support, its showing me which and how many buffers I have open, and its better syntax highlighting are why I prefer it to Nano, which would otherwise be better suited to the type of text file I most often edit.

So the performance hit and instability are a big deal even for normal files: crashes and out-of-bounds errors happen noticeably more often, and I think syntax highlighting not playing well with softwrapping is a big part of what causes the stuttering and occasional hangs.

As for keeping track of the end state to such a degree, I'd say tracking a million-line comment is enough of a non-standard case, and performance-intensive enough, that we shouldn't do it; instead keep it down to some feasible upper limit that helps prevent hangs in cases like this one.
Okay, well I think I see what you mean, but I think I misrepresented what I'm talking about. For reasons that are somewhat idiosyncratic and somewhat due to optimal workflow, I frequently open 80 tabs' worth of buffers (not actually that uncommon a case: run `micro *` in a directory full of config files or HTML documents), and performance noticeably suffers.

What I'm suggesting is that if, after loading into memory, a tab hasn't been switched to within some arbitrary time (let's say five minutes), it is put in a FIFO queue to be written off to a cache file that is essentially reopened when you switch back to that tab. Each call to save or autosave would write to the original file if it can still be found, and ask the user what to do otherwise. Sure, disk usage will increase, but RAM usage will decrease, and hopefully it will alleviate the performance issues I've experienced when editing multiple files at once.

Naturally, some larger files could even be split into multiple cached files for better performance. Say we're editing the mythic 4 MB file: have every x lines (let's say 100,000, because why not?) written to a different cache file, a la `cache._filepath_.0`, `cache._filepath_.1`, etc., so that only so much of the file (the line range you're currently editing) actually needs to be in memory at once, while the cache files can still be pieced together to write out the original file with your edits.

I wouldn't suggest this if RAM usage were the only performance hit we're seeing, but I actually find micro runs more sluggishly and handles things like softwrapping, highlighting, and navigation within a single file worse when there are over ~20 files open. Granted, each file in my case is only a few dozen to a few hundred lines, so the issue might have something to do with how tabs are handled, but I'd think this would help with bugs like this one. I can't wrap my head around micro's data structures enough to say for sure until we try it.
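To make the chunking concrete, here is a loose sketch under those assumptions; the cache-file naming and the 100,000-line chunk size come from the description above, and everything else (`cacheChunks`, the directory handling) is invented:

```go
// Hypothetical sketch, not something micro does: stream a file into
// numbered cache chunks so only the chunk being edited must stay in RAM.
package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
)

const chunkLines = 100000 // as suggested above, "because why not?"

// cacheChunks writes src into numbered cache files under cacheDir and
// returns the chunk paths in order, so the buffer can be reassembled.
func cacheChunks(src, cacheDir string) ([]string, error) {
	f, err := os.Open(src)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	scanner.Buffer(make([]byte, 1024*1024), 64*1024*1024) // allow long lines

	var paths []string
	for idx := 0; ; idx++ {
		path := filepath.Join(cacheDir,
			fmt.Sprintf("cache.%s.%d", filepath.Base(src), idx))
		out, err := os.Create(path)
		if err != nil {
			return nil, err
		}
		w := bufio.NewWriter(out)
		n := 0
		for n < chunkLines && scanner.Scan() {
			fmt.Fprintln(w, scanner.Text())
			n++
		}
		w.Flush()
		out.Close()
		if n == 0 {
			os.Remove(path) // nothing left to write
			break
		}
		paths = append(paths, path)
	}
	return paths, scanner.Err()
}

func main() {
	paths, err := cacheChunks("big.txt", os.TempDir())
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println("wrote", len(paths), "chunk files")
}
```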
I'm experiencing the same issue. I just started micro, had it open an empty bash script, and pasted a 4 KiB command that didn't contain a single line break. (Yes, such things exist; in my case I wanted to examine the command IntelliJ IDEA executes to start some Java application, which, to my dismay, contained hundreds of class paths and jar files.) Afterwards, micro became impossibly sluggish.
I also tried the same thing with a plain text file; in that case, micro worked fine and as fast as always.
I also wanted to open a big .txt file with some words, so I guess syntax highlighting was not the issue, but since it is about 200 MB, micro just consumed memory (about 8 GB until I killed it). Neovim opened the file just fine (250 MB) and scrolled flawlessly; Sublime as well (1.5 GB but smooth as... very smooth); Nano used 1.1 GB with the fans kicking in, but was smooth as well.
I have just pushed a big improvement to the memory used by the syntax highlighter. In my experience, opening a 200 MB XML file where every line has a similar length takes about 4-5 seconds, and the memory used is now 470 MB (down from ~880 MB before the most recent commit). This is with syntax highlighting. The performance without syntax highlighting is unchanged, and is now very similar to the performance with syntax highlighting.
I can test with a different file if you are willing to provide or describe the one you are using. Something else could be causing these differences in performance: line lengths, Unicode, etc.
So I tried the nightly with this file: Vim is super fast, Sublime takes some time to load but is super fast after that, and micro is somewhere in the middle; it takes some time to load and is not super fast to scroll.
Or use this file, e.g. with C syntax highlighting: Vim is instant and uses little memory (about 150 MB vs. 450 MB), while the nightly micro takes much more time and memory.
@semirke I think this is the place where the information could be shared.