
withHeader() not obeying filters

Open HenrikHL opened this issue 9 years ago • 9 comments

I am using filters for including/excluding columns which is working fine. My only problem is that the headers of the columns are not obeying the filters.

I have created a SimpleBeanPropertyFilter where I have implemented the include function:

SimpleBeanPropertyFilter sbpf = new SimpleBeanPropertyFilter() {
  @Override
  protected boolean include(PropertyWriter propertyWriter) {
    return "name".equals(propertyWriter.getName());
  }
};

The above filter should only include columns that have the header "name". The filter is added to the writer before serialization:

objectWriter = objectWriter.with(filters);
String result = objectWriter.writeValueAsString(pojo);
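For context, here is a minimal, self-contained sketch of the setup described above (the POJO, its fields, and the filter id "pojoFilter" are placeholders, since the original class is not shown; requires jackson-databind and jackson-dataformat-csv on the classpath):

```java
import com.fasterxml.jackson.annotation.JsonFilter;
import com.fasterxml.jackson.databind.ObjectWriter;
import com.fasterxml.jackson.databind.ser.PropertyWriter;
import com.fasterxml.jackson.databind.ser.impl.SimpleBeanPropertyFilter;
import com.fasterxml.jackson.databind.ser.impl.SimpleFilterProvider;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

public class CsvFilterRepro {

    // Placeholder POJO; the filter id must match the @JsonFilter value
    @JsonFilter("pojoFilter")
    public static class Person {
        public String name = "John";
        public String address = "Main St";
        public int age = 42;
    }

    static String writeCsv() throws Exception {
        SimpleBeanPropertyFilter sbpf = new SimpleBeanPropertyFilter() {
            @Override
            protected boolean include(PropertyWriter propertyWriter) {
                return "name".equals(propertyWriter.getName());
            }
        };
        CsvMapper mapper = new CsvMapper();
        CsvSchema schema = mapper.schemaFor(Person.class).withHeader();
        ObjectWriter writer = mapper.writer(schema)
                .with(new SimpleFilterProvider().addFilter("pojoFilter", sbpf));
        // Reported behavior: values are filtered, but every header (and the
        // placeholder commas for filtered cells) is still written.
        return writer.writeValueAsString(new Person());
    }

    public static void main(String[] args) throws Exception {
        System.out.print(writeCsv());
    }
}
```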

For the data this works well - I only get values from the name column. But all columns are still mentioned :-( The result string above would look something like this:

name,address,age,...
"John",,,...
"Simon",,,...

The expected result should be:

name
"John"
"Simon"

I hope the above makes sense? (This worked fine in version 2.4.4)

HenrikHL avatar Jan 08 '15 11:01 HenrikHL

Actually, due to CSV structure, I think the correct thing to do is indeed not to filter headers; and as to column values, one must leave a placeholder (empty String). I don't know why 2.4.x did not work that way, but I do not think it is correct to filter names.

The reason I don't think it makes sense to drop columns altogether is that there is no guarantee that different rows might not drop different columns; and for positional documents this is a problem.

I would be open to having a setting to control this behavior, however, so that users could force "full removal" of all column values. This way users who can make sure things work as expected could do that, but by default the safe mode would be used.

cowtowncoder avatar Jan 08 '15 18:01 cowtowncoder

I am not quite sure why "there is no guarantee that different rows might not drop different columns"? Internally all columns should be handled - it is just a matter of what the writer "writes"...? Instead of making a "," after each column, an empty string ("") should be inserted. The same goes for the header.

If you implement this using a setting that is fine with me. I guess the setting would be: setFullColumnRemoval(boolean visible) or something similar...

HenrikHL avatar Jan 08 '15 21:01 HenrikHL

I mean simply that the mapping from logical names (used by Jackson internally) to physical column positions must be stable. At the point where columns are written, one would have to keep track of the dynamic position of headers, if filtering is allowed for headers; and later on, when writing rows, carefully try to match and see what gets added where.

So perhaps another way to say this is that if headers or cell values are to be omitted (instead of being left empty), much more functionality must be added to handle the mapping between logical property (column) names and actual positions. Further, header names are metadata, somewhat different from the actual column data being written.

Come to think of it, it may not even be possible to apply filtering at all: filtering occurs at the databind level, while writing of header names is handled by the CSV module itself, exactly because they are not data to write.

So let's go back to the higher level problem you are having: I probably should have tried to understand the problem first before suggesting solutions (or excluding ones). :-)

It sounds like you want to filter out whole columns of data. Doing this, with the processing model Jackson uses, would require filtering checks to be applied at the beginning of the document. It may require somewhat different handling, not just because of CSV processing, but also due to the way the BeanPropertyFilter API is defined -- in this case, inclusion/exclusion is based solely on column name (and perhaps an optional type from CsvSchema), but not on value. If so, perhaps there should be a way to add column-filtering to CsvWriter. It could use one of the existing filter objects, if applicable, but attached with new call(s).
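One workaround along these lines that is possible today, without module changes, is to describe only the wanted columns in the CsvSchema itself and let the generator skip everything else. This relies on CsvGenerator.Feature.IGNORE_UNKNOWN; the POJO and column names below are placeholders, and this is a sketch rather than an endorsed solution:

```java
import com.fasterxml.jackson.dataformat.csv.CsvGenerator;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

public class ColumnSubsetDemo {

    // Placeholder POJO standing in for the original bean
    public static class Person {
        public String name = "John";
        public String address = "Main St";
        public int age = 42;
    }

    static String writeCsv() throws Exception {
        CsvMapper mapper = new CsvMapper();
        // Without this, properties with no matching schema column
        // cause an exception when writing
        mapper.enable(CsvGenerator.Feature.IGNORE_UNKNOWN);
        // Only the columns listed here appear, in header and data alike
        CsvSchema schema = CsvSchema.builder()
                .addColumn("name")
                .setUseHeader(true)
                .build();
        return mapper.writer(schema).writeValueAsString(new Person());
    }

    public static void main(String[] args) throws Exception {
        System.out.print(writeCsv());
    }
}
```

Since the schema drives both the header line and the per-row output, restricting it removes the column and its header together, which is the behavior requested above.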

cowtowncoder avatar Jan 08 '15 22:01 cowtowncoder

I have looked at how writing is implemented and I do see that it is complex to filter...

It sounds plausible to create a CsvWriter class to perform the filtering.

HenrikHL avatar Jan 09 '15 10:01 HenrikHL

I would like to add my vote for resolving this issue, as ignoring a property/column using annotations also does not work as expected, for the same reason I am sure. I feel this is a significant shortcoming in the CSV package.

When you @JsonIgnore a property for serialization to JSON, the property is left out of the JSON entirely - so yes, one would expect the CSV column to be left out too when serializing to CSV. Instead, when you @JsonIgnore a property for serialization to CSV, the column is present but empty. Likewise, the column header names are always present for ignored properties. It would be great if ignored property columns were completely filtered out of the serialized output. Thanks!

wheresmybrain avatar Dec 21 '16 05:12 wheresmybrain

@wheresmybrain I am not sure why you state that:

 as ignoring a property/column using annotations also does not work as expected for the same reason I am sure.

since handling of JsonFilter is very different from handling of static annotations. As far as I know, basic ignoral works just fine; the problem comes from the more dynamic nature of JsonFilter handling. If you do have a failing case for such usage, I would be interested in a unit test that reproduces the problem.
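A quick way to see the difference: when the schema is generated from the annotated class with schemaFor(), static ignoral is applied during introspection, so the ignored property never becomes a column at all. A minimal sketch (placeholder POJO; behavior as observed on recent 2.x versions):

```java
import com.fasterxml.jackson.annotation.JsonIgnore;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

public class StaticIgnoreDemo {

    public static class Person {
        public String name = "John";
        @JsonIgnore
        public int age = 42;
    }

    static String writeCsv() throws Exception {
        CsvMapper mapper = new CsvMapper();
        // schemaFor() uses databind introspection, so the @JsonIgnore'd
        // property is never turned into a column (no header, no empty cell)
        CsvSchema schema = mapper.schemaFor(Person.class).withHeader();
        return mapper.writer(schema).writeValueAsString(new Person());
    }

    public static void main(String[] args) throws Exception {
        System.out.print(writeCsv());
    }
}
```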

cowtowncoder avatar Jan 10 '17 02:01 cowtowncoder

@HenrikHL Restating my earlier explanation, I think the only way to support actual removal of columns (including removing from header) would require either:

  1. Buffering every row, to determine the set of columns for which values are written, and only then writing headers, OR
  2. Using the heuristic that whatever columns the first row has is the set to use.

Come to think of it, (2) might be feasible, if (and only if) the existence of a filter can be determined before writing column headers; if so, writing of said headers needs to be postponed until output of the first data row is complete. It would actually require buffering of that row altogether, I think.
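Heuristic (2) can also be approximated at the user level today, without module changes: apply the filter once to the first row (via plain JSON), see which properties survive, and build the CsvSchema from exactly those. A sketch, reusing the placeholder filter id "pojoFilter" and POJO shape from earlier comments:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

import com.fasterxml.jackson.annotation.JsonFilter;
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.ser.FilterProvider;
import com.fasterxml.jackson.databind.ser.impl.SimpleBeanPropertyFilter;
import com.fasterxml.jackson.databind.ser.impl.SimpleFilterProvider;
import com.fasterxml.jackson.dataformat.csv.CsvGenerator;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

public class FirstRowSchemaDemo {

    @JsonFilter("pojoFilter")
    public static class Person {
        public String name;
        public String address;

        public Person(String name, String address) {
            this.name = name;
            this.address = address;
        }
    }

    static String writeCsv(List<Person> rows) throws Exception {
        FilterProvider filters = new SimpleFilterProvider()
                .addFilter("pojoFilter",
                        SimpleBeanPropertyFilter.filterOutAllExcept("name"));

        // Apply the filter once to the first row, via JSON, to discover
        // which columns survive filtering
        ObjectMapper json = new ObjectMapper();
        String firstRow = json.writer(filters).writeValueAsString(rows.get(0));
        Map<String, Object> surviving =
                json.readValue(firstRow, new TypeReference<Map<String, Object>>() {});

        // Build a schema containing exactly the surviving columns
        CsvSchema.Builder b = CsvSchema.builder().setUseHeader(true);
        for (String col : surviving.keySet()) {
            b.addColumn(col);
        }

        CsvMapper csv = new CsvMapper();
        csv.enable(CsvGenerator.Feature.IGNORE_UNKNOWN);
        return csv.writer(b.build()).with(filters).writeValueAsString(rows);
    }

    public static void main(String[] args) throws Exception {
        System.out.print(writeCsv(Arrays.asList(
                new Person("John", "Main St"),
                new Person("Simon", "Oak St"))));
    }
}
```

This carries exactly the caveat noted above: it assumes the first row is representative of every later row.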

cowtowncoder avatar Jan 10 '17 02:01 cowtowncoder

I am having the same issue. In our project, we use the same bean and use views to control which properties are exported for different APIs. For JSON it works fine, but for CSV it still outputs the header as well as the empty cells. The point of views is to hide the unnecessary properties for different use cases; exposing all property names in the CSV file is really not a good solution. Is there any workaround I could use to prevent exporting those properties shadowed by a view? The bean itself contains annotations on the properties that describe how to serialize them and whether to display them (like view settings), and I really don't want to repeat those descriptions/settings somewhere else. That's the reason I hope to use Jackson's CSV formatter; otherwise I will have to switch to another CSV solution.
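One possible workaround for the view case, sketched here under the assumption that CsvGenerator.Feature.IGNORE_UNKNOWN is available in the version used (class, view, and column names below are placeholders): list only the view-visible columns in the schema by hand and combine that with withView(), so that neither the headers nor the empty cells of hidden properties appear.

```java
import com.fasterxml.jackson.annotation.JsonView;
import com.fasterxml.jackson.dataformat.csv.CsvGenerator;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

public class ViewColumnsDemo {

    public static class Views {
        public static class Public {}
    }

    public static class Person {
        @JsonView(Views.Public.class)
        public String name = "John";
        public String internalId = "x-123"; // not part of the Public view
    }

    static String writeCsv() throws Exception {
        CsvMapper mapper = new CsvMapper();
        // Properties excluded from the schema are skipped instead of failing
        mapper.enable(CsvGenerator.Feature.IGNORE_UNKNOWN);
        // List only the columns that belong to the view being used;
        // this duplicates the view information once, in the schema
        CsvSchema schema = CsvSchema.builder()
                .addColumn("name")
                .setUseHeader(true)
                .build();
        return mapper.writer(schema)
                .withView(Views.Public.class)
                .writeValueAsString(new Person());
    }

    public static void main(String[] args) throws Exception {
        System.out.print(writeCsv());
    }
}
```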

jealous avatar Aug 22 '17 10:08 jealous

@jealous This issue is related to @JsonFilter, not views. So it is a separate question, and warrants a separate issue.

cowtowncoder avatar Aug 23 '17 23:08 cowtowncoder