parquet-java

Class Summary does not provide a getter to return inputSchema

Open asfimport opened this issue 10 years ago • 7 comments

In Pig's EvalFunc (https://github.com/apache/pig/blob/trunk/src/org/apache/pig/EvalFunc.java), a private member "inputSchemaInternal" holds the input schema. A setter and a getter are also provided:

    private Schema inputSchemaInternal=null;

    /**
     * This method is for internal use. It is called by Pig core in both front-end
     * and back-end to setup the right input schema for EvalFunc
     */
    public void setInputSchema(Schema input){
        this.inputSchemaInternal=input;
    }

    /**
     * This method is intended to be called by the user in {@link EvalFunc} to get the input
     * schema of the EvalFunc
     */
    public Schema getInputSchema(){
        return this.inputSchemaInternal;
    }

In parquet-mr/parquet-pig/src/main/java/parquet/pig/summary/Summary.java, class Summary extends EvalFunc. It uses a new member called inputSchema (as opposed to inputSchemaInternal in Pig's EvalFunc) to hold the schema, and it overrides setInputSchema(). However, the class does not override getInputSchema() to return inputSchema.

public class Summary extends EvalFunc<String> implements Algebraic {

  private Schema inputSchema;

  @Override
  public void setInputSchema(Schema input) {
    try {
      // relation.bag.tuple
      this.inputSchema=input.getField(0).schema.getField(0).schema;
      saveSchemaToUDFContext();
    } catch (FrontendException e) {
      throw new RuntimeException("Usage: B = FOREACH (GROUP A ALL) GENERATE Summary(A); Can not get schema from " + input, e);
    } catch (RuntimeException e) {
      throw new RuntimeException("Usage: B = FOREACH (GROUP A ALL) GENERATE Summary(A); Can not get schema from "+input, e);
    }
  }

If setInputSchema() of class Summary is called, only Summary's inputSchema is set; the overriding setter never assigns EvalFunc's inputSchemaInternal. A later call to getInputSchema() therefore returns inputSchemaInternal, which may still be null.
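The behavior described above can be reproduced without Pig on the classpath. The following is a minimal sketch, not the actual Pig or Parquet code: `Schema`, `BaseFunc`, `SummaryWithoutGetter`, and `SummaryWithGetter` are hypothetical stand-ins, and the setter is simplified to store the schema it receives rather than the nested relation.bag.tuple schema. `SummaryWithGetter` shows one possible fix along the lines of this issue's title; it is not necessarily what the linked pull request does.

```java
public class InputSchemaShadowingDemo {

    /** Hypothetical, simplified stand-in for Pig's Schema class. */
    static class Schema {
        final String description;

        Schema(String description) {
            this.description = description;
        }
    }

    /** Stand-in for the relevant part of Pig's EvalFunc. */
    static class BaseFunc {
        private Schema inputSchemaInternal = null;

        public void setInputSchema(Schema input) {
            this.inputSchemaInternal = input;
        }

        public Schema getInputSchema() {
            return this.inputSchemaInternal;
        }
    }

    /** Mirrors the reported situation: setter overridden, getter inherited. */
    static class SummaryWithoutGetter extends BaseFunc {
        private Schema inputSchema;

        @Override
        public void setInputSchema(Schema input) {
            // Only the subclass field is set; the base class field stays null.
            this.inputSchema = input;
        }
    }

    /** One possible fix (an assumption): override the getter as well. */
    static class SummaryWithGetter extends BaseFunc {
        private Schema inputSchema;

        @Override
        public void setInputSchema(Schema input) {
            this.inputSchema = input;
        }

        @Override
        public Schema getInputSchema() {
            return this.inputSchema;
        }
    }

    public static void main(String[] args) {
        Schema schema = new Schema("a:int");

        BaseFunc broken = new SummaryWithoutGetter();
        broken.setInputSchema(schema);
        System.out.println("without getter override, null? " + (broken.getInputSchema() == null));

        BaseFunc fixed = new SummaryWithGetter();
        fixed.setInputSchema(schema);
        System.out.println("with getter override, same schema? " + (fixed.getInputSchema() == schema));
    }
}
```

Because the base class field is private, the subclass cannot assign it directly; an alternative to overriding the getter would be to also call `super.setInputSchema(input)` in the overriding setter, at the cost of the getter returning the outer schema rather than the nested one.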

Reporter: Xiang Li

Related issues:

Note: This issue was originally created as PARQUET-365. Please see the migration documentation for further details.

asfimport avatar Aug 26 '15 02:08 asfimport

Xiang Li: Providing a getter to return inputSchema can partially fix PARQUET-334.

asfimport avatar Aug 26 '15 02:08 asfimport

Xiang Li: https://github.com/apache/parquet-mr/pull/265

asfimport avatar Aug 26 '15 06:08 asfimport

Xiang Li: The patch is https://github.com/apache/parquet-mr/pull/265, please review.

asfimport avatar Aug 26 '15 06:08 asfimport

Julien Le Dem / @julienledem: I commented in the PR

asfimport avatar Sep 22 '15 17:09 asfimport

Xiang Li: Hi Julien, thanks for the comments! But would you please review patch-1 of PARQUET-334, uploaded by Daniel? If that patch is accepted, we will not need to make any change in Summary.

asfimport avatar Sep 23 '15 09:09 asfimport

Ryan Blue / @rdblue: @julienledem, since you've looked at this already, can you comment on whether we should include a fix in 1.9.0?

asfimport avatar Nov 20 '15 23:11 asfimport

Thomas Friedrich / @tfriedr: Instead of this JIRA, PARQUET-334 should be considered because it makes this change obsolete. Currently there is a pull-request for PARQUET-334 up for review.

asfimport avatar Nov 30 '15 21:11 asfimport