white-elephant
- parallelize the uploads
  - uploading one day of logs sequentially takes about a day; running the uploads in parallel shortens the runtime
- add more command line options for:
  - number of parallel uploads
  - number of days for which logs need to be uploaded
  - list of queues whose logs should be processed and uploaded exclusively
- use hadoop fs instead of hadoop dfs, as hadoop dfs is now deprecated
- use a more robust way of pid file locking (added the File::Pid Perl module to the library)
- fix reading the correct queue name from the job configuration by using an XML parser
- add the ability to log to a file
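
For illustration only, the parallel upload idea can be sketched in shell roughly as follows. This is not the code from this change: the function name upload_logs, the UPLOAD_CMD variable, and the flat source/destination directory layout are all assumptions made for the example. It relies on the -P and -0 options of xargs, which GNU and BSD xargs provide but POSIX does not require.

```shell
#!/bin/sh
# Hypothetical sketch: upload every file under a local directory with a
# bounded number of concurrent uploads.
#
# UPLOAD_CMD (assumed name) defaults to `hadoop fs -put`; set it to `echo`
# for a dry run that only prints what would be uploaded.
upload_logs() {
    src_dir=$1          # local directory holding the logs (assumed layout)
    dest_dir=$2         # destination directory on HDFS (assumed)
    parallel=${3:-4}    # number of parallel uploads, default 4

    # -print0 / -0 keep file names with spaces intact.
    # -P runs up to $parallel uploads at once; -I{} runs one upload per file.
    # UPLOAD_CMD is left unquoted on purpose so it splits into command + args.
    find "$src_dir" -type f -print0 |
        xargs -0 -P "$parallel" -I{} ${UPLOAD_CMD:-hadoop fs -put} {} "$dest_dir"
}
```

Called as `upload_logs /path/to/logs /hdfs/dest 8`, this would run up to eight uploads at a time. xargs returns a non-zero status if any upload fails, so the caller can check the exit code of the function.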