Upgrade from 0.2.3 to 0.3.2 causing CSV Header issues

Open vallerap opened this issue 3 years ago • 4 comments

Hello,

I wasn't able to find any documentation about this so I guess it's an issue:

We are using a CsvResolver in our RmlMapper, which is configured as follows:

    private Model executeCarmlRmlEngine(File fMappingRule) {
        Set<TriplesMap> mapping =
                RmlMappingLoader
                        .build()
                        .load(RDFFormat.TURTLE, Paths.get(fMappingRule.getPath()));

        RmlMapper mapper =
                RmlMapper
                        .newBuilder()
                        .setLogicalSourceResolver(Rdf.Ql.JsonPath, new JsonPathResolver())
                        .setLogicalSourceResolver(Rdf.Ql.XPath, new XPathResolver())
                        .setLogicalSourceResolver(Rdf.Ql.Csv, new CsvResolver())
                        .fileResolver(Paths.get("/"))
                        .addFunctions(new BasicFunctions())
                        .addFunctions(new UtilFunctions())
                        .build();

        return mapper.map(mapping);
    }

We have a test case that tries to read a CSV file, specifically:

    id, city, bus, latitude, longitude
    6523, Brussels, 25, 50.901389, 4.484444
    1234, Prague, 119, 50.1, 4.1

However, upon execution, this error is generated:

    Caused by: java.lang.IllegalArgumentException: Header name 'id' not found. Available columns are: [id,, city,, bus,, latitude,, longitude]
        at com.univocity.parsers.common.record.RecordMetaDataImpl.getMetaData(RecordMetaDataImpl.java:50)
        at com.univocity.parsers.common.record.RecordMetaDataImpl.metadataOf(RecordMetaDataImpl.java:114)
        at com.univocity.parsers.common.record.RecordMetaDataImpl.getObjectValue(RecordMetaDataImpl.java:374)
        at com.univocity.parsers.common.record.RecordImpl.getString(RecordImpl.java:113)
        at com.taxonic.carml.logical_source_resolver.CsvResolver.lambda$null$0(CsvResolver.java:37)
        at com.taxonic.carml.engine.GetTemplateValue.lambda$bindTemplateExpression$1(GetTemplateValue.java:53)
        at com.taxonic.carml.engine.template.CarmlTemplate$Builder.getExpressionValue(CarmlTemplate.java:161)
        at com.taxonic.carml.engine.template.CarmlTemplate$Builder.getExpressionSegmentValue(CarmlTemplate.java:168)
        at com.taxonic.carml.engine.template.CarmlTemplate$Builder.create(CarmlTemplate.java:228)
        at com.taxonic.carml.engine.GetTemplateValue.apply(GetTemplateValue.java:42)
        at com.taxonic.carml.engine.GetTemplateValue.apply(GetTemplateValue.java:14)
        at com.taxonic.carml.engine.TermGeneratorCreator.lambda$null$17(TermGeneratorCreator.java:365)
        at com.taxonic.carml.engine.SubjectMapper.map(SubjectMapper.java:37)
        at com.taxonic.carml.engine.TriplesMapper.map(TriplesMapper.java:52)
        at com.taxonic.carml.engine.TriplesMapper.lambda$map$0(TriplesMapper.java:45)
        at java.base/java.lang.Iterable.forEach(Iterable.java:75)
        at com.taxonic.carml.engine.TriplesMapper.map(TriplesMapper.java:45)
        at com.taxonic.carml.engine.RmlMapper.map(RmlMapper.java:434)
        at com.taxonic.carml.engine.RmlMapper.lambda$map$5(RmlMapper.java:362)
        at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
        at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
        at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
        at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
        at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
        at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
        at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
        at com.taxonic.carml.engine.RmlMapper.map(RmlMapper.java:362)
        at com.taxonic.carml.engine.RmlMapper.map(RmlMapper.java:335)

Now, if we modify our file so that the first row has no spaces before or after the commas, like this:

    id,city,bus,latitude,longitude
    6523, Brussels, 25, 50.901389, 4.484444
    1234, Prague, 119, 50.1, 4.1

Execution is not blocked by any exception...

Any ideas? Thank you in advance :)

vallerap • Mar 27 '23 13:03

Hi @vallerap,

Sadly, this seems to be a bug in v2.9.0 of univocity parsers.

I tried locally to see if an upgrade to the latest v2.9.1 fixes your issue, and it does. However, I've never upgraded it in CARML because the upgrade causes another issue, which I raised over a year ago: https://github.com/uniVocity/univocity-parsers/. Apparently univocity is no longer actively maintained, so I'm afraid I will have to switch to another CSV library. I'll open an issue for that.

In the meantime, you could force the use of version 2.9.1 in your dependency management by adding it to your project:

<dependency>
    <groupId>com.univocity</groupId>
    <artifactId>univocity-parsers</artifactId>
    <version>2.9.1</version>
</dependency>

Hopefully that doesn't cause any new issues like the one mentioned.
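If overriding the transitive dependency is not an option, another stopgap follows from the observation earlier in this thread that the mapping succeeds once the header row has no spaces around its commas: the header line could be normalized before the file is handed to the mapper. The following is only a minimal sketch in plain Java; the class and method names (`CsvHeaderNormalizer`, `normalizeHeader`, `normalizeHeaderInPlace`) are hypothetical and not part of CARML or univocity, and the naive comma split assumes header names are unquoted and contain no commas themselves.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of CARML or univocity-parsers.
public class CsvHeaderNormalizer {

    /**
     * Trims whitespace around each column name of a single header line.
     * Assumption: column names are unquoted and contain no commas.
     */
    static String normalizeHeader(String headerLine) {
        // Limit -1 keeps trailing empty columns instead of dropping them.
        String[] names = headerLine.split(",", -1);
        List<String> trimmed = new ArrayList<>();
        for (String name : names) {
            trimmed.add(name.trim());
        }
        return String.join(",", trimmed);
    }

    /** Rewrites only the first line of the file; data rows are left untouched. */
    static void normalizeHeaderInPlace(Path csv) throws IOException {
        List<String> lines = new ArrayList<>(Files.readAllLines(csv, StandardCharsets.UTF_8));
        if (lines.isEmpty()) {
            return; // nothing to normalize
        }
        lines.set(0, normalizeHeader(lines.get(0)));
        Files.write(csv, lines, StandardCharsets.UTF_8);
    }
}
```

Calling `normalizeHeaderInPlace` on the CSV file before `mapper.map(mapping)` would turn the failing header from the report into the variant that was shown to work, without touching the data rows.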

By the way, I would strongly advise upgrading to the newest version of CARML (0.4.7). It has many improvements over version 0.3.2. Let me know if you have any questions about that.

pmaria • Mar 27 '23 13:03

Hi @pmaria, thanks for the quick response. The reason we are only considering an update to 0.3.2 at the moment is that otherwise we would need to update the whole component to use rdf4j 4.X (we are still on 3.X), which is a bit of an effort.

What would be the top 2-3 improvements between 0.3.2 and 0.4.7 from your perspective?

tomas-knap • Mar 27 '23 14:03

Hey @tomas-knap, long time :)

Since 0.4.0 we've switched to a reactive streams approach (see https://github.com/carml/carml/releases for details). Improvements are mostly:

  • speed
  • resource usage

pmaria • Mar 27 '23 14:03

Thanks @pmaria, yes, that is right. Maybe we see each other again at Semantics this year! Anyway, all the best and thanks for the quick response again!

tomas-knap • Mar 28 '23 08:03