
Read .gz file from Data Lake

Open MartheUT opened this issue 6 years ago • 1 comments

There is a need to read .gz files from the data lake. Adding gunzip to the azureDataLakeRead function will not work, because gunzip decompresses a file on disk, not an HTTP response.

MartheUT, May 08 '18 09:05

Probably not the most elegant solution, but it works:

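azureDataLakeReadCSVGZ <- function(azureActiveContext, azureDataLakeAccount, relativePath,
                                   offset, length, bufferSize, seperator = ",",
                                   verbose = FALSE)
{
  # Fetch the file as an HTTP response. seperator is only used for parsing below;
  # this assumes azureDataLakeReadCore takes the same arguments as azureDataLakeRead.
  resHttp <- azureDataLakeReadCore(azureActiveContext, azureDataLakeAccount,
                                   relativePath, offset, length, bufferSize, verbose)
  stopWithAzureError(resHttp)
  # Keep the body as raw bytes: it is still gzip-compressed at this point
  resRaw <- httr::content(resHttp, as = "raw")

  # Write the bytes to a temporary file in binary mode so they can be unzipped
  TempName <- tempfile(pattern = "", fileext = ".csv.gz")
  on.exit(unlink(TempName), add = TRUE)  # remove the temporary file afterwards
  con <- file(TempName, "wb")
  writeBin(resRaw, con)
  close(con)

  # read.table decompresses gzip files transparently when given a file name
  Data <- read.table(TempName, sep = seperator)
  return(Data)
}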
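
If writing a temporary file is undesirable, the raw response body could probably be decompressed in memory instead. The following is only a sketch using base R connections: `readGzRaw` is a hypothetical helper name, not part of AzureSMR, and it assumes `resRaw` is a raw vector holding gzip-compressed delimited text (as returned by `content(resHttp, as = "raw")`). The demo builds its own gzip bytes locally so it can run without an Azure account.

```r
# Hypothetical helper (not part of AzureSMR): parse gzip-compressed
# delimited text that is held in a raw vector, without a temporary file.
readGzRaw <- function(resRaw, sep = ",", ...) {
  # gzcon() wraps a connection and decompresses gzip data as it is read
  con <- gzcon(rawConnection(resRaw))
  on.exit(close(con))
  # Read the decompressed lines, then let read.table parse them
  lines <- readLines(con)
  read.table(text = lines, sep = sep, ...)
}

# Self-contained demo: create gzip bytes locally instead of calling Azure
tmp <- tempfile(fileext = ".csv.gz")
gz <- gzfile(tmp, "wt")
writeLines(c("1,a", "2,b"), gz)
close(gz)
demoRaw <- readBin(tmp, what = "raw", n = file.size(tmp))
unlink(tmp)

demoData <- readGzRaw(demoRaw, sep = ",")
print(demoData)
```

Inside the function above, this would replace the tempfile/writeBin/read.table steps with a single call such as `readGzRaw(resRaw, sep = seperator)`. The temporary-file version remains the simpler choice for very large files, since `readLines` holds the whole decompressed text in memory.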

MartheUT, May 08 '18 18:05