Class Summary does not provide a getter to return inputSchema
In Pig's EvalFunc (https://github.com/apache/pig/blob/trunk/src/org/apache/pig/EvalFunc.java), a private member, inputSchemaInternal, holds the input schema. A setter and a getter for it are also provided:
private Schema inputSchemaInternal=null;
...
/**
 * This method is for internal use. It is called by Pig core in both front-end
 * and back-end to setup the right input schema for EvalFunc
 */
public void setInputSchema(Schema input){
    this.inputSchemaInternal=input;
}

/**
 * This method is intended to be called by the user in {@link EvalFunc} to get the input
 * schema of the EvalFunc
 */
public Schema getInputSchema(){
    return this.inputSchemaInternal;
}
In parquet-mr/parquet-pig/src/main/java/parquet/pig/summary/Summary.java, class Summary extends EvalFunc. It stores the schema in a separate member, inputSchema (instead of the inputSchemaInternal used by Pig's EvalFunc), and overrides setInputSchema(), but it does not override getInputSchema() to return inputSchema:
public class Summary extends EvalFunc<String> implements Algebraic {
    ...
    private Schema inputSchema;
    ...
    @Override
    public void setInputSchema(Schema input) {
        try {
            // relation.bag.tuple
            this.inputSchema=input.getField(0).schema.getField(0).schema;
            saveSchemaToUDFContext();
        } catch (FrontendException e) {
            throw new RuntimeException("Usage: B = FOREACH (GROUP A ALL) GENERATE Summary(A); Can not get schema from " + input, e);
        } catch (RuntimeException e) {
            throw new RuntimeException("Usage: B = FOREACH (GROUP A ALL) GENERATE Summary(A); Can not get schema from "+input, e);
        }
    }
When setInputSchema() is called on a Summary instance, only Summary's own inputSchema is set. A subsequent call to getInputSchema() still runs EvalFunc's implementation and returns inputSchemaInternal, which Summary's setter never assigns and which can therefore still be null.
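The shadowing problem can be reproduced without Pig at all. The sketch below is a hypothetical, self-contained model: BaseFunc stands in for EvalFunc, SummaryLike for Summary, and the schema is modeled as a plain String rather than Pig's Schema class. FixedSummaryLike shows the shape of the fix proposed in this issue, overriding the getter as well.

```java
// Hypothetical stand-in for Pig's EvalFunc; the schema is modeled as a String.
class BaseFunc {
    // Mirrors EvalFunc.inputSchemaInternal: private, so subclasses cannot assign it directly.
    private String inputSchemaInternal = null;

    public void setInputSchema(String input) {
        this.inputSchemaInternal = input;
    }

    public String getInputSchema() {
        return this.inputSchemaInternal;
    }
}

// Hypothetical stand-in for Summary: overrides the setter but stores into its own field.
class SummaryLike extends BaseFunc {
    protected String inputSchema;

    @Override
    public void setInputSchema(String input) {
        // super.setInputSchema() is never called, so the base field stays null.
        this.inputSchema = input;
    }
}

// Sketch of the proposed fix: also override the getter to return the subclass field.
class FixedSummaryLike extends SummaryLike {
    @Override
    public String getInputSchema() {
        return this.inputSchema;
    }
}

public class SchemaShadowingDemo {
    public static void main(String[] args) {
        BaseFunc broken = new SummaryLike();
        broken.setInputSchema("a:int,b:chararray");
        System.out.println(broken.getInputSchema()); // prints "null"

        BaseFunc fixed = new FixedSummaryLike();
        fixed.setInputSchema("a:int,b:chararray");
        System.out.println(fixed.getInputSchema()); // prints "a:int,b:chararray"
    }
}
```

Note that the real Summary.setInputSchema() stores the unwrapped tuple schema (input.getField(0).schema.getField(0).schema), so an overriding getter would return that inner schema rather than the relation schema originally passed in.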
Reporter: Xiang Li
Related issues:
Note: This issue was originally created as PARQUET-365. Please see the migration documentation for further details.
Xiang Li: Providing a getter that returns inputSchema can partially fix PARQUET-334.
Xiang Li: https://github.com/apache/parquet-mr/pull/265
Xiang Li: The patch is https://github.com/apache/parquet-mr/pull/265, please review.
Julien Le Dem / @julienledem: I commented in the PR
Xiang Li: Hi Julien, thanks for the comments! Could you please also review patch-1 for PARQUET-334, uploaded by Daniel? If that patch is accepted, we will not need to make any changes to Summary.
Ryan Blue / @rdblue: @julienledem, since you've looked at this already, can you comment on whether we should include a fix in 1.9.0?
Thomas Friedrich / @tfriedr: PARQUET-334 should be considered instead of this JIRA, because it makes this change obsolete. A pull request for PARQUET-334 is currently up for review.