Upgrade from 0.2.3 to 0.3.2 causing CSV Header issues
Hello,
I wasn't able to find any documentation about this, so I'm guessing it's an issue:
we are using a CsvResolver in our RmlMapper, which is configured as such:
```java
private Model executeCarmlRmlEngine(File fMappingRule) {
    Set<TriplesMap> mapping =
        RmlMappingLoader
            .build()
            .load(RDFFormat.TURTLE, Paths.get(fMappingRule.getPath()));

    RmlMapper mapper =
        RmlMapper
            .newBuilder()
            .setLogicalSourceResolver(Rdf.Ql.JsonPath, new JsonPathResolver())
            .setLogicalSourceResolver(Rdf.Ql.XPath, new XPathResolver())
            .setLogicalSourceResolver(Rdf.Ql.Csv, new CsvResolver())
            .fileResolver(Paths.get("/"))
            .addFunctions(new BasicFunctions())
            .addFunctions(new UtilFunctions())
            .build();

    return mapper.map(mapping);
}
```
We have a test case that tries to read a CSV file, specifically:

```csv
id, city, bus, latitude, longitude
6523, Brussels, 25, 50.901389, 4.484444
1234, Prague, 119, 50.1, 4.1
```
However, upon execution, this error is generated:

```
Caused by: java.lang.IllegalArgumentException: Header name 'id' not found. Available columns are: [id,, city,, bus,, latitude,, longitude]
	at com.univocity.parsers.common.record.RecordMetaDataImpl.getMetaData(RecordMetaDataImpl.java:50)
	at com.univocity.parsers.common.record.RecordMetaDataImpl.metadataOf(RecordMetaDataImpl.java:114)
	at com.univocity.parsers.common.record.RecordMetaDataImpl.getObjectValue(RecordMetaDataImpl.java:374)
	at com.univocity.parsers.common.record.RecordImpl.getString(RecordImpl.java:113)
	at com.taxonic.carml.logical_source_resolver.CsvResolver.lambda$null$0(CsvResolver.java:37)
	at com.taxonic.carml.engine.GetTemplateValue.lambda$bindTemplateExpression$1(GetTemplateValue.java:53)
	at com.taxonic.carml.engine.template.CarmlTemplate$Builder.getExpressionValue(CarmlTemplate.java:161)
	at com.taxonic.carml.engine.template.CarmlTemplate$Builder.getExpressionSegmentValue(CarmlTemplate.java:168)
	at com.taxonic.carml.engine.template.CarmlTemplate$Builder.create(CarmlTemplate.java:228)
	at com.taxonic.carml.engine.GetTemplateValue.apply(GetTemplateValue.java:42)
	at com.taxonic.carml.engine.GetTemplateValue.apply(GetTemplateValue.java:14)
	at com.taxonic.carml.engine.TermGeneratorCreator.lambda$null$17(TermGeneratorCreator.java:365)
	at com.taxonic.carml.engine.SubjectMapper.map(SubjectMapper.java:37)
	at com.taxonic.carml.engine.TriplesMapper.map(TriplesMapper.java:52)
	at com.taxonic.carml.engine.TriplesMapper.lambda$map$0(TriplesMapper.java:45)
	at java.base/java.lang.Iterable.forEach(Iterable.java:75)
	at com.taxonic.carml.engine.TriplesMapper.map(TriplesMapper.java:45)
	at com.taxonic.carml.engine.RmlMapper.map(RmlMapper.java:434)
	at com.taxonic.carml.engine.RmlMapper.lambda$map$5(RmlMapper.java:362)
	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
	at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
	at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
	at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
	at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
	at com.taxonic.carml.engine.RmlMapper.map(RmlMapper.java:362)
	at com.taxonic.carml.engine.RmlMapper.map(RmlMapper.java:335)
```
Now, if we modify the first row of the file so there are no spaces before/after the commas, like this:

```csv
id,city,bus,latitude,longitude
6523, Brussels, 25, 50.901389, 4.484444
1234, Prague, 119, 50.1, 4.1
```
execution completes without any exception...
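For reference, trimming the header row can also be automated instead of editing files by hand. This is only a minimal sketch, not part of CARML; the class and method names here are hypothetical, and it assumes an unquoted header line (it would mangle quoted column names containing commas):

```java
import java.util.Arrays;
import java.util.stream.Collectors;

public class CsvHeaderNormalizer {

    // Trims leading/trailing whitespace from each field of the header line,
    // so "id, city, bus" becomes "id,city,bus". Data rows are left untouched.
    public static String normalizeHeader(String headerLine) {
        return Arrays.stream(headerLine.split(",", -1))
                .map(String::trim)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // prints: id,city,bus,latitude,longitude
        System.out.println(normalizeHeader("id, city, bus, latitude, longitude"));
    }
}
```

Rewriting only the first line of the CSV before handing it to the mapper sidesteps the header-name mismatch without touching the data rows.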
Any ideas? Thank you in advance :)
Hi @vallerap,
Sadly, this seems to be a bug in v2.9.0 of univocity-parsers.
I checked locally whether an upgrade to the latest v2.9.1 fixes your issue, and it does. However, I haven't upgraded it in CARML because the upgrade causes another issue, which I raised over a year ago: https://github.com/uniVocity/univocity-parsers/. Apparently univocity is no longer actively maintained, so I'm afraid I will have to switch to another CSV library. I'll open an issue for that.
In the meantime, you could force the use of version 2.9.1 through your dependency management by adding it to your project:
```xml
<dependency>
  <groupId>com.univocity</groupId>
  <artifactId>univocity-parsers</artifactId>
  <version>2.9.1</version>
</dependency>
```
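Since univocity-parsers is pulled in transitively via CARML, the standard Maven way to pin a transitive version is a `<dependencyManagement>` section rather than a direct dependency (a sketch; adjust to your build setup):

```xml
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.univocity</groupId>
      <artifactId>univocity-parsers</artifactId>
      <version>2.9.1</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

You can confirm which version is actually resolved with `mvn dependency:tree`.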
Hopefully that doesn't cause any new issues like the one mentioned.
By the way, I would strongly advise upgrading to the newest version of CARML (0.4.7). It has many improvements over 0.3.2. Let me know if you have any questions about that.
Hi @pmaria, thanks for the quick response. The reason we are considering updating only to 0.3.2 at the moment is that we would otherwise need to update the whole component to use RDF4J 4.x (we're still on 3.x), which is a bit of an effort.
What would be the top 2-3 improvements between 0.3.2 and 0.4.7 from your perspective?
Hey @tomas-knap, long time :)
Since 0.4.0 we've switched to a reactive-streams approach (see https://github.com/carml/carml/releases for details). The improvements are mostly:
- speed
- resource usage
Thanks @pmaria, yes, that's right. Maybe we'll see each other again at Semantics this year! Anyway, all the best, and thanks again for the quick response!