texera
Supporting Broadcast data transfer strategy for Python UDF model
The 2-input Python UDF operator takes two inputs: "model" and "data". The "model" input is a single tuple generated from a source file by a single worker.
Without a broadcast data transfer strategy, the Python UDF cannot be parallelized: the other strategies (hash partitioning, round-robin, etc.) deliver the single model tuple to only one worker, so the remaining workers never receive it.
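
To illustrate the problem, here is a minimal, self-contained sketch that simulates how one model tuple is routed under each strategy. It is not Texera code: the function names (`round_robin`, `hash_partition`, `broadcast`, `distribute`), the tuple layout, and the worker count of 4 are all assumptions made for this example.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
# A partitioner maps (sequence number, tuple, worker count) to target worker indices.
Partitioner = Callable[[int, Row, int], List[int]]


def round_robin(seq: int, row: Row, num_workers: int) -> List[int]:
    # Each tuple goes to exactly one worker, chosen in rotation.
    return [seq % num_workers]


def hash_partition(seq: int, row: Row, num_workers: int) -> List[int]:
    # Each tuple goes to exactly one worker, chosen by its key (hypothetical "id" field).
    return [hash(row["id"]) % num_workers]


def broadcast(seq: int, row: Row, num_workers: int) -> List[int]:
    # Each tuple is copied to every worker.
    return list(range(num_workers))


def distribute(rows: List[Row], partitioner: Partitioner, num_workers: int) -> List[List[Row]]:
    # Build one inbox per worker and deliver each tuple to its targets.
    inboxes: List[List[Row]] = [[] for _ in range(num_workers)]
    for seq, row in enumerate(rows):
        for worker in partitioner(seq, row, num_workers):
            inboxes[worker].append(row)
    return inboxes


def workers_with_model(inboxes: List[List[Row]]) -> List[int]:
    # Indices of workers that received at least one tuple.
    return [i for i, inbox in enumerate(inboxes) if inbox]


# The "model" input: a single tuple produced by a single source worker.
model_input = [{"id": 0, "weights": [0.1, 0.2]}]

print(workers_with_model(distribute(model_input, round_robin, 4)))     # [0]
print(workers_with_model(distribute(model_input, hash_partition, 4)))  # [0]
print(workers_with_model(distribute(model_input, broadcast, 4)))       # [0, 1, 2, 3]
```

Under round-robin and hash partitioning only one of the four simulated workers holds the model, so the other three could not process their share of the "data" input.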
To implement the broadcast data transfer strategy, we can let the Python UDF operator specify the partitioning as a requirement on each input port.
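
One possible shape for such a per-port requirement is sketched below. This is only an illustration of the idea, not Texera's actual API: `Partitioning`, `PythonUDFOperator`, `port_requirements`, and `choose_partitioning` are hypothetical names, and falling back to round-robin when a port states no requirement is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class Partitioning(Enum):
    ROUND_ROBIN = "round_robin"
    HASH = "hash"
    BROADCAST = "broadcast"


@dataclass
class PythonUDFOperator:
    # Maps an input port name to the partitioning it requires.
    # Ports that are not listed state no requirement.
    port_requirements: Dict[str, Partitioning] = field(default_factory=dict)


def choose_partitioning(
    op: PythonUDFOperator,
    port: str,
    default: Partitioning = Partitioning.ROUND_ROBIN,
) -> Partitioning:
    # Honor the port's declared requirement; otherwise use the default strategy.
    return op.port_requirements.get(port, default)


# The 2-input UDF requires its "model" port to be broadcast,
# and leaves the "data" port to the default strategy.
udf = PythonUDFOperator(port_requirements={"model": Partitioning.BROADCAST})

print(choose_partitioning(udf, "model"))  # Partitioning.BROADCAST
print(choose_partitioning(udf, "data"))   # Partitioning.ROUND_ROBIN
```

Declaring the requirement on the port rather than on the upstream link keeps the decision with the operator that knows how each input is consumed.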