
- parallelize the uploads

Open mnikhil-git opened this issue 11 years ago • 0 comments

  • uploading one day's logs sequentially takes about a day; upload in parallel to shorten the runtime
  • add more command line options for:
      - the number of parallel uploads
      - the number of days for which logs need to be uploaded
      - the list of queues whose logs should be processed and uploaded exclusively
  • use hadoop fs instead of hadoop dfs, as the latter is now deprecated
  • use a more robust way of PID file locking (added the File::Pid Perl module to the library)
  • fix retrieval of the correct queue name from the job configuration by using an XML parser
  • add the ability to log to a file
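
A minimal sketch of how the first three items could fit together, written in Python for illustration only (the tool itself is Perl, and this is not its actual implementation). The flag names `--parallel`, `--days`, and `--queues`, and the helpers `parse_args`, `build_command`, `run_command`, and `upload_all`, are all hypothetical; only `hadoop fs -put` is a real command.

```python
import argparse
import subprocess
from concurrent.futures import ThreadPoolExecutor


def parse_args(argv=None):
    # Flag names are hypothetical; the issue only describes what they should control.
    parser = argparse.ArgumentParser(description="Upload Hadoop job logs in parallel")
    parser.add_argument("--parallel", type=int, default=4,
                        help="number of concurrent uploads")
    parser.add_argument("--days", type=int, default=1,
                        help="number of days of logs to upload")
    parser.add_argument("--queues", default="",
                        help="comma-separated queues to process exclusively")
    args = parser.parse_args(argv)
    # Turn "etl,adhoc" into ["etl", "adhoc"]; an empty string means no filter.
    args.queues = [q for q in args.queues.split(",") if q]
    return args


def build_command(local_path, remote_dir):
    # `hadoop fs -put <localsrc> <dst>` replaces the deprecated `hadoop dfs` form.
    return ["hadoop", "fs", "-put", local_path, remote_dir]


def run_command(cmd):
    # Run one upload and return its exit status (0 means success).
    return subprocess.run(cmd).returncode


def upload_all(local_paths, remote_dir, parallel, runner=run_command):
    # Threads are enough here: each worker only waits on an external process.
    commands = [build_command(p, remote_dir) for p in local_paths]
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        statuses = list(pool.map(runner, commands))  # map preserves input order
    # Report the local paths whose upload exited with a non-zero status.
    return [p for p, status in zip(local_paths, statuses) if status != 0]
```

Passing `runner` as a parameter lets the upload logic be exercised without a Hadoop installation. The `--days` and `--queues` values are only parsed here; selecting which log files they map to is left out, since the issue does not describe the log layout.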

mnikhil-git • Feb 11 '14 19:02