GreenPad
Unresponsive when loading huge file
I investigated this bug for a very long time (yes, I'm still using GreenPad as a daily driver) but I can't find which line causes this issue.
Happy to know that you are using this text editor. When you say huge, how much are you talking about? I can load 20MB .c files in a few seconds on my old laptop, which is not bad.

For now GreenPad loads the whole file into memory before displaying it. This is not the best approach; ideally it would use pagination and only load and display the needed 4KB blocks. Of course that is not trivial to implement.

When you say unresponsive, do you mean it freezes forever, or that it temporarily becomes unresponsive? In the latter case it should be possible to execute the message loop while loading the file to avoid freezing the UI. It should also be possible to display an adequate message in the statusbar.
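A minimal sketch of that message-loop idea, assuming a plain Win32 window procedure; `PumpPendingMessages` and the loading-loop names are hypothetical helpers, not existing GreenPad functions:

```cpp
#include <windows.h>

// Hypothetical helper: drain queued messages between file-read chunks so
// the window keeps repainting and responding during a long load.
static void PumpPendingMessages()
{
    MSG msg;
    while( PeekMessage( &msg, NULL, 0, 0, PM_REMOVE ) )
    {
        TranslateMessage( &msg );
        DispatchMessage( &msg );
    }
}

// Inside the (hypothetical) loading loop:
//   while( ReadNextBlock( ... ) )
//   {
//       InsertBlockIntoDocument( ... );
//       PumpPendingMessages();  // keep the UI alive
//   }
```

One caveat: pumping messages mid-load lets the user trigger commands against a half-loaded document, so input would have to be blocked or filtered while loading.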
> I can load 20MB .c files in a few seconds on my old laptop, which is not bad.
Maybe in the hundreds of MB of text.
> When you say unresponsive, do you mean it freezes forever, or that it temporarily becomes unresponsive?
It looks like forever; memory usage goes up to several hundred MB and it keeps busy.
I will run some tests with files of a few hundred MB.
True. Actually, even with a 22MB file GreenPad eats up 230+MB of RAM, so it needs roughly 10 times the file size in memory. I then tried with a 114MB file and it still works, but I have to wait 20 seconds before GreenPad's window shows up, then another 5 minutes before the text shows up (because of coloration); without coloration it just takes the 20 seconds. In 32-bit mode we are limited to 2GB of RAM, so I guess it is not possible to open a file larger than 200MB or so.
If I load a 230 MB .txt file, then select all and paste, I get a segfault in memmove because the destination address went beyond the maximum possible address 0xFFFFFFFF. I can use the /LARGEADDRESSAWARE linker switch to benefit from more RAM when running on a 64-bit system, but that is all, unless I make a proper 64-bit build, which I should (I should get a newer Visual Studio). Also the /3GB switch could help on my PAE-capable Windows Server 2003 ~~machine, but this switch is not available on VS6~~. Ignore that; I mixed things up: /3GB is a Windows boot option.
The best thing to do would be to optimize memory usage; I must investigate this.
Well, for me it seems to work fine: if the file is small enough to fit into memory (less than 280MB or so), it usually loads in under a minute and works; otherwise it is too big (~280MB+) and HeapAlloc fails, inducing a segfault. I tested with a simple ASCII file with the .txt extension (no coloration). Once the file is loaded, if there is no coloration and no wrapping, I find it quite usable and snappy; of course saving takes a few seconds.

I made small changes regarding memory allocation, to show clear error messages when out of memory and exit if applicable. NOTE: I have 3GB of RAM, so there is no swapping occurring for me. I guess it would be extremely slow with only 1-2GB of RAM.
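The "clear error messages" change might look roughly like the following; `CheckedAlloc` and the message text are illustrative, not the actual patch:

```cpp
#include <windows.h>

// Illustrative sketch: wrap HeapAlloc so that running out of address
// space produces a clear message and a clean exit instead of a later
// segfault in memmove.
void* CheckedAlloc( size_t size )
{
    void* p = HeapAlloc( GetProcessHeap(), 0, size );
    if( p == NULL )
    {
        MessageBox( NULL,
            TEXT("Out of memory: the file is too large to load."),
            TEXT("GreenPad"), MB_OK | MB_ICONERROR );
        ExitProcess( 1 );
    }
    return p;
}
```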
There is a problem when a file contains long lines, because the data gets moved a thousand times in the line buffer as it is filled in 1KB blocks via the InsertAt() function.
The simple solution is to use a larger buffer in the DocImpl::OpenFile() function.
Using a 128KB buffer mostly fixes it; it is still a bit slow to load lines longer than 1MB.
If you have a lot of files with lines of a few MB, then use a buffer in the MB range too (the default in 64-bit mode); otherwise, with the default 1KB buffer, such files take forever to load.
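To see why the buffer size matters so much, here is a back-of-the-envelope model (my own illustration, not GreenPad code): if every chunk insert re-copies the whole line built so far, loading an L-byte line in c-byte chunks moves about L²/2c bytes in total.

```cpp
#include <cstdio>

int main()
{
    const double L = 1024.0 * 1024.0;              // one 1MB line
    const double bufs[2] = { 1024.0, 131072.0 };   // 1KB vs 128KB buffer
    for( int i = 0; i < 2; ++i )
    {
        // Each chunk insert shifts/copies the whole line so far,
        // so the total traffic is roughly L^2 / (2 * buffer size).
        double moved = L * L / ( 2.0 * bufs[i] );
        std::printf( "%6.0fKB buffer: ~%.0f MB moved\n",
                     bufs[i] / 1024.0, moved / ( 1024.0 * 1024.0 ) );
    }
    return 0;   // prints ~512 MB for the 1KB buffer, ~4 MB for 128KB
}
```

That roughly 128x reduction in copying is consistent with the observation that a 128KB buffer "mostly fixes it".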
Partial fix: #38
The reading routine should be changed so that it is not O(n^2) in the length of a line. Ideally a line should be split, beyond a certain point, into smaller buffers. The length of each small buffer should also be chosen so that calculating the line length does not take a huge amount of time. For now I guess handling of lines up to 1MB is not so bad on my [email protected].
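A sketch of what that split could look like (just an idea under my own assumptions, not anything in the GreenPad tree): store a long line as fixed-size segments, each with a cached length, so an insert only shifts one segment and the total line length stays cheap to compute.

```cpp
#include <vector>
#include <cstddef>

// Hypothetical segmented line. The segment size is a guess; it trades
// per-segment copy cost against the number of segments to track.
struct LineSegment
{
    enum { CAPACITY = 4096 };
    size_t  len;              // cached length of this segment
    wchar_t buf[CAPACITY];    // UTF-16 text, like GreenPad's `unicode`
};

struct LongLine
{
    std::vector<LineSegment*> segs;
    size_t totalLen;          // maintained on every insert/delete

    LongLine() : totalLen(0) {}

    // O(1) instead of rescanning megabytes of text.
    size_t len() const { return totalLen; }

    // An insert locates one segment, shifts at most CAPACITY chars,
    // and occasionally splits a full segment in two.
};
```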
Maybe we still need a way to tell users "we're busy (re)opening the file", for example by setting the cursor to an hourglass for actions that can take a lot of time (for example re-opening, replace-all, etc.).
I already display "Loading file..." in the statusbar, but the hourglass cursor is a must indeed.
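A minimal sketch of the hourglass (standard Win32, phrased my own way rather than taken from the tree):

```cpp
#include <windows.h>

// RAII wrapper: show IDC_WAIT for the duration of a slow operation and
// restore the previous cursor on every return path.
class WaitCursor
{
    HCURSOR old_;
public:
    WaitCursor()  { old_ = SetCursor( LoadCursor( NULL, IDC_WAIT ) ); }
    ~WaitCursor() { SetCursor( old_ ); }
};

// Usage at the top of a slow action (loading, replace-all, ...):
//   void ReOpenFile() { WaitCursor wc; /* ...slow work... */ }
```

For the hourglass to stick, the window's WM_SETCURSOR handling would also need to check a busy flag, since Windows resets the cursor on mouse movement.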
@roytam1
I made a huge (and very simple) optimization of loading times: #164
The idea is to add a reparse flag to the Document::InsertingOperation() function so that reparsing can be done separately, at the end of file loading, for all lines at once.
With this, even 64MB lines become manageable.
When handling long lines, be sure to set Column by: Letters instead of Position in the Settings. This makes a huge difference when a line is larger than 100KB. The culprit is GreenPadWnd::on_move().
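The difference between the two modes, as I understand it (illustrative pseudocode, not GreenPad's actual on_move() code): "Letters" only needs the caret index, while "Position" re-measures the display width of every character before the caret on each cursor move.

```cpp
#include <cstddef>

// Column by Letters: just the caret index. O(1) per cursor move.
size_t ColumnByLetters( size_t caretIndex )
{
    return caretIndex + 1;
}

// Column by Position: accumulate display widths (tabs, full-width
// characters, ...) from the start of the line. O(line length) per
// cursor move, which hurts on a 100KB+ line.
size_t ColumnByPosition( const wchar_t* line, size_t caretIndex )
{
    size_t col = 1;
    for( size_t i = 0; i < caretIndex; ++i )
        col += ( line[i] == L'\t' ? 4 : 1 );  // toy width model
    return col;
}
```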
I tried to port it to my tree:
```diff
diff --git a/GreenPad/editwing/ip_doc.h b/GreenPad/editwing/ip_doc.h
index 861f19f..399f192 100644
--- a/GreenPad/editwing/ip_doc.h
+++ b/GreenPad/editwing/ip_doc.h
@@ -418,7 +418,7 @@ private:
 // Insertion/deletion operations
bool InsertingOperation(
- DPos& stt, const unicode* str, ulong len, DPos& undoend );
+ DPos& stt, const unicode* str, ulong len, DPos& undoend, bool reparse=true );
bool DeletingOperation(
DPos& stt, DPos& end, unicode*& undobuf, ulong& undosiz );
diff --git a/GreenPad/editwing/ip_text.cpp b/GreenPad/editwing/ip_text.cpp
index 337e890..c9e512c 100644
--- a/GreenPad/editwing/ip_text.cpp
+++ b/GreenPad/editwing/ip_text.cpp
@@ -490,7 +490,7 @@ bool DocImpl::DeletingOperation
}
bool DocImpl::InsertingOperation
- ( DPos& s, const unicode* str, ulong len, DPos& e )
+ ( DPos& s, const unicode* str, ulong len, DPos& e, bool reparse )
{
AutoLock lk( this );
@@ -535,7 +535,7 @@ bool DocImpl::InsertingOperation
}
 // Reparse
- return ReParse( s.tl, e.tl );
+ return reparse && ReParse( s.tl, e.tl );
}
@@ -731,20 +731,16 @@ void DocImpl::OpenFile( aptr<TextFileR> tf )
buf_sz = SBUF_SZ;
}
- for( ulong i=0; tf->state(); )
+ size_t L;
+ ulong i=0;
+ for( i=0; L = tf->ReadBuf( buf, buf_sz ); )
{
- if( size_t L = tf->ReadBuf( buf, buf_sz ) )
- {
- DPos p(i,0xffffffff);
- InsertingOperation( p, buf, (ulong)L, e );
- i = tln() - 1;
- }
- if( tf->state() == 1 )
- {
- DPos p(i++,0xffffffff);
- InsertingOperation( p, L"\n", 1, e );
- }
+ DPos p( i, len(e.tl) ); // end of document
+ InsertingOperation( p, buf, (ulong)L, e, /*reparse=*/false );
+ i = tln() - 1;
}
+ // Parse All lines, because we skipped it
+ ReParse( 0, tln()-1 );
if( buf != sbuf )
delete [] buf;
```
but I found that it makes almost no difference when I open some hard disk image files, even with Column by = Letters set in the Settings.
The difference should kick in when you have a huge line; try a 100MB line, for instance. It should take just a few seconds to load with the patch. Binary files typically have plenty of line breaks by chance, in which case it should make no difference.
When I open .json files I often have this problem; they are often single-line files in the MB range...
Just in case for testing: GreenPad1.20test.zip
I also think that the 64-bit build will be little affected, because the intermediate buffer is 2MB, which is crazy big and longer than most lines one typically encounters.
> I also think that the 64-bit build will be little affected, because the intermediate buffer is 2MB
Yeah, though I tried the 32-bit build here.
I reverted a reallocation change here that helps with performance when growing long lines: https://github.com/roytam1/rtoss/blob/e0b647f027c12acbfaf5a6d4e37bef980cea4824/GreenPad/editwing/ip_doc.h#L64C1-L67
alen_ = Max( alen_+(alen_>>1), len_+siz ); // 1.5xAlen
The original code used `Max( alen_<<1, len_+siz ); // 2xAlen`, which is memory-heavy, so I use 1.5x as a compromise.
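To put numbers on the trade-off (my own toy comparison, not code from the tree): when growing a buffer to hold a 100MB line, 2x doubling needs fewer reallocations and copies less data in total, while the point of 1.5x is the worst case: a just-grown buffer wastes at most 50% of the text size instead of up to 100%.

```cpp
#include <cstdio>

int main()
{
    const unsigned long long target = 100ULL * 1024 * 1024;  // 100MB line
    const double factors[2] = { 2.0, 1.5 };
    for( int i = 0; i < 2; ++i )
    {
        unsigned long long alen = 1024, copied = 0;
        int reallocs = 0;
        while( alen < target )
        {
            copied += alen;   // each growth re-copies the old contents
            alen = (unsigned long long)( alen * factors[i] );
            ++reallocs;
        }
        std::printf( "%.1fx: %2d reallocs, peak %3lluMB, copied %3lluMB\n",
                     factors[i], reallocs, alen >> 20, copied >> 20 );
    }
    return 0;  // 2.0x: 17 reallocs; 1.5x: ~29 reallocs, but a tighter
               // worst-case bound on wasted space (<= 50% vs <= 100%)
}
```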
Maybe that is the reason that skipping ReParse does not make a big difference for you.