IPED

Fix the content field type in the Elasticsearch index

hauck-jvsh opened this issue 2 years ago (status: open, 5 comments)

The index field content should have a fixed type, as it is a well-known field. When left to automatic detection, Elasticsearch also creates a keyword sub-field, which is not necessary for the content field. Another advantage is that it can be set to index with term vectors, which allows the use of the fast vector highlighter.

hauck-jvsh, Jul 11 '22 18:07
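An explicit mapping along the lines proposed above could look like the following. This is a minimal sketch, not IPED's actual mapping code. It assumes the field is literally named content, and the helper names content_mapping and fvh_highlight_query are hypothetical. It builds two request bodies as Python dicts. The first creates an index (PUT /<index>) where content is a plain text field, without the keyword sub-field that dynamic mapping would add. The second is a search body that requests the fvh (fast vector) highlighter on that field.

```python
import json


def content_mapping(term_vector: str = "with_positions_offsets") -> dict:
    """Index-creation body with an explicit mapping for the "content" field.

    Declaring the field as plain "text" avoids the extra "keyword" sub-field
    that Elasticsearch dynamic mapping adds to string fields by default.
    """
    field = {"type": "text"}
    if term_vector != "no":
        # The fvh highlighter requires term_vector: with_positions_offsets.
        field["term_vector"] = term_vector
    return {"mappings": {"properties": {"content": field}}}


def fvh_highlight_query(text: str) -> dict:
    """Search body that highlights matches in "content" with the fast vector highlighter."""
    return {
        "query": {"match": {"content": text}},
        "highlight": {"fields": {"content": {"type": "fvh"}}},
    }


if __name__ == "__main__":
    print(json.dumps(content_mapping(), indent=2))
    print(json.dumps(fvh_highlight_query("invoice"), indent=2))
```

Passing term_vector="no" omits term vectors entirely. With that mapping, the default (unified) highlighter still works, but a request for "fvh" on the field does not.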

> The index field content should have a fixed type, as it is a well-known field

Agreed.

> Another advantage is that it can be set to index with term vectors, which allows the use of the fast vector highlighter.

This could make the index much bigger. I think we should run some tests to measure the impact on index size before making this change.

lfcnassif, Jul 11 '22 19:07
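One way to run the size test discussed here is to index the same case twice, once with term vectors and once without, and then compare the primary store sizes reported by the index stats API (GET /<index>/_stats/store). This sketch assumes a node reachable over plain HTTP without authentication. The function names are hypothetical. Before reading the sizes, it flushes each index and force-merges it to a single segment, so that pending or unmerged segments do not skew the comparison.

```python
import json
import urllib.request


def flush_and_merge(base_url: str, index: str) -> None:
    """Flush, then merge the index down to one segment before measuring it."""
    for path in ("_flush", "_forcemerge?max_num_segments=1"):
        req = urllib.request.Request(f"{base_url}/{index}/{path}", method="POST")
        with urllib.request.urlopen(req):
            pass  # only the side effect matters; the response body is ignored


def primary_store_bytes(base_url: str, index: str) -> int:
    """Primary-shard store size in bytes, from GET /<index>/_stats/store."""
    with urllib.request.urlopen(f"{base_url}/{index}/_stats/store") as resp:
        stats = json.load(resp)
    return stats["_all"]["primaries"]["store"]["size_in_bytes"]


def relative_increase(baseline_bytes: int, candidate_bytes: int) -> float:
    """Fractional growth of candidate over baseline (0.5 means 50% larger)."""
    if baseline_bytes <= 0:
        raise ValueError("baseline size must be positive")
    return (candidate_bytes - baseline_bytes) / baseline_bytes


def compare(base_url: str, baseline_index: str, candidate_index: str) -> float:
    """Merge both indices, then report how much bigger the candidate index is."""
    for index in (baseline_index, candidate_index):
        flush_and_merge(base_url, index)
    return relative_increase(
        primary_store_bytes(base_url, baseline_index),
        primary_store_bytes(base_url, candidate_index),
    )
```

For example, you could call compare("http://localhost:9200", "case_no_tv", "case_tv") after both indices have been populated from the same evidence. The URL and index names are placeholders. Measuring a with_positions_offsets index against a with_positions one would also isolate the cost of the offsets alone.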

> This could make the index much bigger. I think we should run some tests to measure the impact on index size before making this change.

Sure, I will run some tests to see the impact.

hauck-jvsh, Jul 11 '22 19:07

I think the offsets in with_positions_offsets also have an index size impact of their own, on top of the term vectors. Are they needed?

lfcnassif, Jul 11 '22 19:07

Unfortunately, the documentation says they are needed to use the fast vector highlighter. So I vote for using either with_positions_offsets or nothing, or maybe this could be a parameter in the Elasticsearch config file.

hauck-jvsh, Jul 11 '22 19:07
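If this becomes a configuration option, the allowed values could be limited to the two choices discussed here. This sketch assumes a hypothetical contentTermVector key in a simple key = value config file. The thread does not specify the real IPED config file or key name, and the helper names are also hypothetical. The default of no term vectors is an arbitrary choice that keeps the index smaller.

```python
# Hypothetical option values: no term vectors, or the full positions+offsets
# that the fast vector highlighter requires.
ALLOWED_TERM_VECTORS = ("no", "with_positions_offsets")


def read_term_vector_option(config_text: str, key: str = "contentTermVector") -> str:
    """Read a hypothetical `key = value` option; the last occurrence wins."""
    value = "no"  # hypothetical default: no term vectors, smaller index
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and malformed lines
        name, _, val = line.partition("=")
        if name.strip() == key:
            value = val.strip()
    if value not in ALLOWED_TERM_VECTORS:
        raise ValueError(f"{key} must be one of {ALLOWED_TERM_VECTORS}, got {value!r}")
    return value


def content_field_mapping(term_vector: str) -> dict:
    """Mapping for the content field; adds term vectors only when requested."""
    field = {"type": "text"}
    if term_vector != "no":
        field["term_vector"] = term_vector
    return field


if __name__ == "__main__":
    cfg = """
    # Elasticsearch indexing options
    contentTermVector = with_positions_offsets
    """
    print(content_field_mapping(read_term_vector_option(cfg)))
```

Rejecting any other value, such as with_offsets, avoids a configuration that costs index space but still cannot serve the fvh highlighter.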

I see. In Lucene, those could be set independently...

> maybe this could be a parameter in the Elasticsearch config file.

This is a good option!

> Sure, I will run some tests to see the impact.

This would still be very useful to support the decision.

lfcnassif, Jul 11 '22 19:07