
Default metadata field names in PagePdfDocumentReader can't be parsed in a filter expression

Open · markpollack opened this issue 1 year ago • 0 comments

The metadata field name file_name is not accepted by the filter expression parser. For example, the following request:

		SearchRequest searchRequest = SearchRequest.defaults()
				.withTopK(4)
				.withFilterExpression(PagePdfDocumentReader.METADATA_FILE_NAME + " == 'medicaid-wa-faqs.pdf'");

where `public static final String METADATA_FILE_NAME = "file_name";`

throws the following exception:

Caused by: org.antlr.v4.runtime.NoViableAltException: null
	at org.antlr.v4.runtime.atn.ParserATNSimulator.noViableAlt(ParserATNSimulator.java:2014) ~[antlr4-runtime-4.13.1.jar:4.13.1]
	at org.antlr.v4.runtime.atn.ParserATNSimulator.execATN(ParserATNSimulator.java:445) ~[antlr4-runtime-4.13.1.jar:4.13.1]
	at org.antlr.v4.runtime.atn.ParserATNSimulator.adaptivePredict(ParserATNSimulator.java:371) ~[antlr4-runtime-4.13.1.jar:4.13.1]
	at org.springframework.ai.vectorstore.filter.antlr4.FiltersParser.booleanExpression(FiltersParser.java:556) ~[spring-ai-core-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
	at org.springframework.ai.vectorstore.filter.antlr4.FiltersParser.where(FiltersParser.java:199) ~[spring-ai-core-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
	at org.springframework.ai.vectorstore.filter.FilterExpressionTextParser.parse(FilterExpressionTextParser.java:147) ~[spring-ai-core-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
	... 46 common frames omitted

The underscore appears to be the issue. I suggest switching to camelCase for the metadata field names that document readers add.
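The suggested rename can be sketched as a small key-conversion step. This is a minimal, hypothetical illustration: `toCamelCase`, `renameKeys` and `ASSUMED_IDENTIFIER` are not Spring AI APIs, and the identifier pattern is an assumption standing in for the filter grammar's identifier rule (letters and digits only). It is not copied from Spring AI's grammar file.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

class MetadataKeyRename {

    // ASSUMPTION: stand-in for an identifier rule that accepts only letters
    // and digits. This is NOT taken from Spring AI's filter grammar.
    static final Pattern ASSUMED_IDENTIFIER = Pattern.compile("[a-zA-Z][a-zA-Z0-9]*");

    // Hypothetical helper: converts snake_case to camelCase,
    // e.g. "file_name" -> "fileName".
    static String toCamelCase(String snake) {
        StringBuilder out = new StringBuilder(snake.length());
        boolean upperNext = false;
        for (char c : snake.toCharArray()) {
            if (c == '_') {
                // Drop the underscore; capitalize the next character
                // unless the underscore is leading.
                upperNext = out.length() > 0;
            } else if (upperNext) {
                out.append(Character.toUpperCase(c));
                upperNext = false;
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Hypothetical helper: returns a copy of the metadata map with every key
    // converted to camelCase, preserving insertion order.
    static Map<String, Object> renameKeys(Map<String, Object> metadata) {
        Map<String, Object> renamed = new LinkedHashMap<>();
        metadata.forEach((key, value) -> renamed.put(toCamelCase(key), value));
        return renamed;
    }

    public static void main(String[] args) {
        Map<String, Object> metadata = new LinkedHashMap<>();
        metadata.put("file_name", "medicaid-wa-faqs.pdf");
        metadata.put("page_number", 1); // illustrative second key

        for (String key : renameKeys(metadata).keySet()) {
            boolean ok = ASSUMED_IDENTIFIER.matcher(key).matches();
            System.out.println(key + " accepted by assumed rule: " + ok);
        }
        boolean original = ASSUMED_IDENTIFIER.matcher("file_name").matches();
        System.out.println("file_name accepted by assumed rule: " + original);
    }
}
```

Under that assumption, `fileName` passes where `file_name` does not, so a query could use `"fileName == 'medicaid-wa-faqs.pdf'"`. Renaming the constants in the readers themselves, rather than post-processing the map as above, would keep stored documents and filter expressions consistent.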

markpollack, May 08 '24 17:05