Refactor VW cache distribution

Open vsuthichai opened this issue 8 years ago • 1 comment

VW cache distribution causes a slowdown at the beginning of the Spark job. Refactor this logic to zip the VW dataset and distribute it to the executors before VW cache generation begins.
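
A minimal sketch of the zip-then-distribute idea, not the project's actual implementation. It assumes the dataset is a single plain-text VW file and uses only the Python standard library; the helper names `compress_dataset` and `unpack_dataset` and the file name `train.vw` are hypothetical.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def compress_dataset(dataset_path: Path, archive_path: Path) -> Path:
    """Gzip the plain-text VW dataset so less data crosses the network."""
    with open(dataset_path, "rb") as src, gzip.open(archive_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return archive_path


def unpack_dataset(archive_path: Path, target_dir: Path) -> Path:
    """Decompress the archive on an executor, ready for local cache generation."""
    target_dir.mkdir(parents=True, exist_ok=True)
    # Drop the trailing ".gz" to recover the original file name.
    dataset_path = target_dir / archive_path.stem
    with gzip.open(archive_path, "rb") as src, open(dataset_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return dataset_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp)
        raw = work / "train.vw"  # hypothetical dataset in VW text format
        raw.write_text("1 | a b c\n-1 | d e f\n" * 1000)
        archive = compress_dataset(raw, work / "train.vw.gz")
        restored = unpack_dataset(archive, work / "executor")
        # The round trip must reproduce the dataset byte for byte.
        assert restored.read_bytes() == raw.read_bytes()
        print(raw.stat().st_size, "->", archive.stat().st_size, "bytes")
```

In a Spark job, the archive could presumably be shipped once with `SparkContext.addFile` and located on each executor with `SparkFiles.get` before the unpack step runs; how that fits the existing cache logic here is an assumption.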

vsuthichai • Apr 06 '17 18:04

In local mode, the cache is generated only once. When running on a cluster, it has to be generated on every node.

vsuthichai • Apr 06 '17 18:04