
Unable to dbInsert a .txt file

Open pakom opened this issue 9 years ago • 0 comments

Dear Roger,

I am trying to dbInsert a large .txt file as a data frame using the read_fwf function from the readr package. The file comes from OECD's PISA 2012 and its size is 1.1GB. It contains the responses to the student questionnaire. I work on a laptop with 4GB of RAM under Arch Linux (64-bit) and have about 250GB of free space on the hard drive. The size of the swap partition is 2GB. Here is the code that I use:

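library(filehash)  # dbCreate, dbInit, dbInsert
library(readr)     # read_fwf, fwf_positions

setwd("/media/work")
dbCreate("tmpDB")
DB <- dbInit("tmpDB")

# Read the fixed-width file and store it under the key "x"
dbInsert(DB, "x", data.frame(read_fwf(
  file = "/media/PISA_2012/INT_STU12_DEC03.txt",
  fwf_positions(start = ranges.start, end = ranges.end,
                col_names = var.names),
  progress = FALSE)))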

ranges.start, ranges.end and var.names are taken from the .sps file provided with the .txt data file.
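
To make the shape of these three vectors concrete, here is a minimal sketch with a few sanity checks on the layout. The positions and variable names are hypothetical placeholders, not the actual PISA 2012 layout from the .sps file, and field.widths and record.width are helper names introduced only for this illustration.

```r
# Hypothetical layout for illustration only; the real values come from the .sps file.
ranges.start <- c(1L, 4L, 7L)
ranges.end   <- c(3L, 6L, 13L)
var.names    <- c("VAR1", "VAR2", "VAR3")

# Basic sanity checks on the layout before it is passed to fwf_positions():
# one start, one end and one name per column, no inverted ranges, unique names.
stopifnot(
  length(ranges.start) == length(ranges.end),
  length(ranges.start) == length(var.names),
  all(ranges.start <= ranges.end),
  anyDuplicated(var.names) == 0
)

# Width of each field and the last character position used by the layout.
field.widths <- ranges.end - ranges.start + 1L
record.width <- max(ranges.end)
```

Checks like these only rule out a malformed layout; they say nothing about why the insert itself produces an empty database.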

The tmpDB file is created and DB is initialized in the R environment. dbInsert runs without any error or warning messages, but once it finishes the tmpDB file is still 0B, dbList(DB) returns character(0), and the key x does not seem to exist.

I tried smaller files from the same and previous cycles, and files of about 500MB work. I also tried taking just 200 lines from the file I have trouble with, and that works too. I thought the problem might be the size limit of my /tmp folder, which is the system's temporary folder and is limited to 1.8GB. So I installed the unixtools package and used the following to change R's temporary folder and check that it had changed:

> set.tempdir("/media/temp")
> tempdir()
[1] "/media/temp"
> tempfile()
[1] "/media/temp/file8fc7d43a8d6"

I ran the dbInsert code above again. However, the result is the same: tmpDB is still 0B and the key x does not exist.

What could be the reason for this behavior?

Regards

pakom, Dec 22 '15 16:12