
MemoryError and no partitions found

Open Frank071 opened this issue 6 years ago • 16 comments

RecuperaBit consistently crashes with a memory error:

INFO:root:Found NTFS boot sector at sector 1317656575
INFO:root:First scan completed
INFO:root:Parsing MFT entries
Traceback (most recent call last):
  File "main.py", line 385, in <module>
    main()
  File "main.py", line 367, in main
    parts.update(scanner.get_partitions())
  File "/root/RecuperaBit-master/recuperabit/fs/ntfs.py", line 698, in get_partitions
    parsed = parse_file_record(dump)
  File "/root/RecuperaBit-master/recuperabit/fs/ntfs.py", line 149, in parse_file_record
    attributes = _attributes_reader(entry, header['off_first'])
  File "/root/RecuperaBit-master/recuperabit/fs/ntfs.py", line 109, in _attributes_reader
    attr, name = parse_mft_attr(entry[offset:])
  File "/root/RecuperaBit-master/recuperabit/fs/ntfs.py", line 79, in parse_mft_attr
    nonresident = unpack(attr, attr_nonresident_fmt)
  File "/root/RecuperaBit-master/recuperabit/utils.py", line 98, in unpack
    result[label] = formatter(data[low:high+1])
MemoryError

Watching the process via top, I see that memory use rises to 4 GB and then the process dies. It looks as if the sheer number of files (or records) it finds is the cause.

I found that 'partitioned_files' grows beyond 100,000, so around line 720 in ntfs.py I added some code to stop the scan once it reached 50,000. This "solves" the MemoryError, but now I end up with "0 partitions found" :-(

I already used photorec on this image file and that produced a lot of files, but all with cryptic names, so I hoped RecuperaBit would rush to the rescue... Any help is appreciated (and if not, perhaps this report can at least help the tool exit gracefully when memory use grows erratically).
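For reference, the kind of guard I mean is something like this (a sketch with hypothetical names and an arbitrary cap, not the actual code I patched into ntfs.py):

```python
# Hypothetical memory guard, not actual RecuperaBit code: refuse to
# collect more file records once a global cap is reached, so the scan
# fails early and predictably instead of exhausting RAM.
MAX_RECORDS = 50000  # arbitrary cap

partitioned_files = {}
_record_count = 0

def add_record(part_id, record):
    """Track a parsed record, aborting once the global cap is reached."""
    global _record_count
    if _record_count >= MAX_RECORDS:
        raise MemoryError('record cap reached, aborting scan early')
    partitioned_files.setdefault(part_id, []).append(record)
    _record_count += 1
```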

Frank071 avatar Sep 28 '18 09:09 Frank071

I see that "0 partitions" also comes up in #17, so it might be unrelated to my short-circuiting of the scan process. Issue #17 ends with a request for a dump of a sector but is still open. Would it help if I dumped some?

...
INFO:root:Found NTFS file record at sector 6296932
INFO:root:Found NTFS file record at sector 6296934
INFO:root:Found NTFS file record at sector 6296936
INFO:root:Found NTFS file record at sector 6296938
INFO:root:Found NTFS file record at sector 6296940
INFO:root:Found NTFS file record at sector 6296942
...

While at it: the save-file contains 3.7 million lines....

Frank071 avatar Sep 28 '18 10:09 Frank071

Can you provide detailed specifications about your environment, the Python version (and implementation), system architecture, OS, etcetera?

Also, how large is the disk image you are analyzing?

Do you have enough swap?

Lazza avatar Sep 28 '18 14:09 Lazza

The disk image is roughly 630 GB. This is a Linux amd64 environment, better known as sysresccd 4.14.32-std522, with Python 3.6.3 (I am not sure what you mean by 'implementation'). The system has 24 GB available and that is not the limit (no OOM killer activity); swap is not used (the counter stays at 0).

Frank071 avatar Sep 28 '18 16:09 Frank071

with python 3.6.3

It shouldn't even run on Python 3, given that it uses Python 2 syntax.

I am not sure what you mean by 'implementation'

I meant CPython vs. PyPy.

Lazza avatar Sep 29 '18 12:09 Lazza

It shouldn't even run on Python 3, given that it uses Python 2 syntax.

Sorry... my bad... I run it with Python 2 (2.7.14).

I am not sure what you mean by 'implementation'

I meant CPython vs. PyPy.

CPython

Frank071 avatar Sep 30 '18 21:09 Frank071

The fact that it stops at precisely 4 GB looks a bit strange. Have you checked that the Python executable is really 64-bit? Could you try with PyPy?

Unfortunately some large disks or disks that were very fragmented currently require a lot of memory.

Lazza avatar Oct 02 '18 12:10 Lazza

Although 'uname -m' reports this system as 64-bit, all binaries, including python2, are 32-bit. That explains the 4 GB limit. As I am on SystemRescueCd the possibilities are limited; 'pypy', for instance, is not available. So I fear it stops here, although it would help if we could think of a mechanism that allows partial recovery.
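For anyone checking the same thing: the interpreter itself can report its pointer size using only the standard library (nothing RecuperaBit-specific), which is more reliable than trusting 'uname -m':

```python
import struct
import sys

# A 64-bit CPython build has 8-byte pointers; a 32-bit build has 4-byte
# pointers, which is what imposes the ~4 GB address space limit.
pointer_bytes = struct.calcsize('P')
is_64bit = sys.maxsize > 2**32

print(pointer_bytes * 8, 'bit interpreter; 64-bit:', is_64bit)
```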

Frank071 avatar Oct 03 '18 18:10 Frank071

As I am on systemrescuecd the possibilities are limited

You could opt for a different live distro (ensuring it is 64-bit) and check whether it still crashes. PyPy can usually be installed from the repositories of several distributions.

Lazza avatar Oct 04 '18 18:10 Lazza

I had to physically get to the system, but it is now running a proper 64-bit environment and I have PyPy available. That solves the crashing, and it gets a great deal further (I see MATCH lines), but it eventually gets killed for consuming too much memory (> 16 GB). It hasn't recovered anything by the time it is killed :(

Frank071 avatar Oct 05 '18 21:10 Frank071

I wish I could say I have an easy solution for that, but currently... I don't. In the future RecuperaBit might (should?) use a SQLite file to store the thousands of records, artifacts, etc., but that would require a rewrite that I cannot promise at this time due to lack of time.

What I can suggest as a workaround is one of the following:

  • Create some very large swap files and enable them (you can add as many as you want if you have disk space)
  • Edit RecuperaBit to prune partitions under a certain size (see this comment here)
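A rough sketch of what the pruning edit could look like (hypothetical structure and names; the real code stores partitions differently, so adapt it):

```python
# Hypothetical pruning helper, not actual RecuperaBit code: assumes
# partitions is a dict mapping an id to a dict with a 'files' list.
MIN_FILES = 100  # discard partitions with fewer recovered records

def prune_small_partitions(partitions, min_files=MIN_FILES):
    """Keep only partitions holding at least min_files recovered records."""
    return {
        pid: part for pid, part in partitions.items()
        if len(part['files']) >= min_files
    }
```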

Lazza avatar Oct 05 '18 22:10 Lazza

OK, adding swap did not help much (I tried that before I read your comment), but the pruning bit did wonders. I now got through to the recovery stage. Nice! Perhaps the pruning could be a command-line option, or even some intelligent feature based on the number of partitions found?

Frank071 avatar Oct 07 '18 15:10 Frank071

The problem is that with pruning you are discarding valuable information and you might be recovering fewer files (there is no way to know whether the discarded partitions are indeed useless).

The user interface would definitely benefit from some improvement, but hopefully the backend can also be optimized a bit.

Lazza avatar Oct 09 '18 11:10 Lazza

~~Like, I don't want to be that guy... but is it really normal that a modest 230 GB image takes 5 GB of RAM to process?~~ OK, apparently it is.

mirh avatar Feb 03 '20 12:02 mirh

@mirh I understand your concern, currently most of the processing for the reconstruction is done in RAM. Ideally, RecuperaBit should leverage a SQLite DB for much better efficiency.

Lazza avatar Feb 06 '20 22:02 Lazza

It would be cool to see that drop by a factor of 3 to 5... Could the process also be multithreaded/parallelized somewhat? Competing tools seem to be "ready" as soon as they finish reading the image, whereas RecuperaBit takes a very long time on top of that.

mirh avatar Feb 07 '20 02:02 mirh

The process that figures out the partition boundaries could probably be partially parallelized. That is indeed quite an interesting suggestion.
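As a sketch of what that could look like with the standard library (purely illustrative; `scan_chunk` is a hypothetical stand-in, and the real scan function would need to be self-contained and picklable):

```python
from multiprocessing import Pool

def scan_chunk(chunk):
    """Hypothetical stand-in for the per-sector scan: return the offsets
    in this chunk whose contents look interesting."""
    start, sectors = chunk
    return [start + i for i, s in enumerate(sectors) if s == b'NTFS']

def parallel_scan(sectors, workers=4, chunk_size=1024):
    """Split the sector list into chunks and scan them in parallel."""
    chunks = [(i, sectors[i:i + chunk_size])
              for i in range(0, len(sectors), chunk_size)]
    with Pool(workers) as pool:
        results = pool.map(scan_chunk, chunks)  # order is preserved
    return [hit for part in results for hit in part]
```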

I am really sad that in this period my amount of free time that I can dedicate to improving the tool is near zero. 😢

Lazza avatar Jun 12 '20 15:06 Lazza