extraction-framework
The software used to extract structured data from Wikipedia
Bumps [com.fasterxml.jackson.core:jackson-core](https://github.com/FasterXML/jackson-core) from 2.5.0 to 2.15.0. Changelog: sourced from com.fasterxml.jackson.core:jackson-core's changelog. #release configuration #Sun Apr 23 14:19:10 PDT 2023 scm.commentPrefix=[maven-release-plugin] exec.pomFileName=pom.xml pushChanges=false releaseStrategyId=default project.dev.com.fasterxml.jackson.core:jackson-core=2.15.1-SNAPSHOT project.scm.com.fasterxml.jackson.core:jackson-core.connection=scm:git:git@github.com:FasterXML/jackson-core.git scm.tag=jackson-core-2.15.0 remoteTagging=true project.scm.com.fasterxml.jackson.core:jackson-core.developerConnection=scm:git:git@github.com:FasterXML/jackson-core.git exec.additionalArguments=-Prelease scm.branchCommitComment=@{prefix}...
## Description `GenderExtractor.scala` currently hardcodes URI strings for ontology properties and classes instead of leveraging the framework's ontology lookup mechanism. This creates maintenance issues and inconsistencies across extractors. ## Current...
Bumps `spark.version` from 2.2.1 to 2.4.8. Updates `org.apache.spark:spark-core_2.11` from 2.2.1 to 2.4.8 Updates `org.apache.spark:spark-sql_2.11` from 2.2.1 to 2.4.8 Dependabot will resolve any conflicts with this PR as long as you...
### `name` = `font-size:%;` Running the following query against DBpedia: ```sparql PREFIX dbo: <http://dbpedia.org/ontology/> PREFIX dbr: <http://dbpedia.org/resource/> PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT ?country ?label ?longName ?name WHERE { ?country a dbo:Country. ?country dbo:capital ?capital....
# Issue validity Examples contain '\n' in URIs: - [Berlin](https://dief.tools.dbpedia.org/server/extraction/de/extract?title=Berlin&revid=&format=turtle-triples&extractors=custom) - [Siemens](https://dief.tools.dbpedia.org/server/extraction/de/extract?title=Siemens&revid=&format=turtle-triples&extractors=custom) # Error Description The extracted triples contain URIs where '\n' (escaped newline characters) appear within the URI string,...
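A minimal sketch of how affected triples might be spotted in extraction output, assuming N-Triples-style lines in which URIs are enclosed in angle brackets. The function name and the sample triple (using `example.org` URIs) are hypothetical and are not part of the framework.

```python
import re

# Anything enclosed in angle brackets is treated as a URI (N-Triples style).
URI_PATTERN = re.compile(r"<([^<>]*)>")

def uris_with_escaped_newline(line: str) -> list[str]:
    """Return the URIs on a line that contain a backslash-n sequence or a raw newline."""
    return [uri for uri in URI_PATTERN.findall(line) if "\\n" in uri or "\n" in uri]

# Hypothetical triple whose object URI carries an escaped newline.
sample = r"<http://example.org/s> <http://example.org/p> <http://example.org/foo\nbar> ."
bad_uris = uris_with_escaped_newline(sample)
```

Running a check like this over the Turtle output of the linked examples would list the URIs that need attention, without saying anything about where in the extraction pipeline the newline is introduced.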
Updated the `fromLines()` and `fromLine()` methods of `WikiInfo.scala` for proper parsing of `wikipedias.csv`. ## Summary by CodeRabbit * **Chores** * Added CSV processing library to project dependencies * **Bug Fixes**...
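To illustrate why a dedicated CSV parser can matter here, the sketch below contrasts naive comma splitting with a CSV-aware reader. It uses Python's standard `csv` module rather than the Scala library added in the PR, and the column layout of the sample is an assumption, not the actual format of `wikipedias.csv`.

```python
import csv
import io

def parse_rows(text: str) -> list[list[str]]:
    """Parse CSV text into rows, honouring quoted fields and skipping blank lines."""
    return [row for row in csv.reader(io.StringIO(text)) if row]

# Hypothetical sample: the second field of the data row contains a comma inside quotes.
sample = 'language,name\nen,"English, the language"\n'

rows = parse_rows(sample)                      # quoted comma stays inside one field
naive = sample.splitlines()[1].split(",")      # naive split breaks the field in two
```

The CSV-aware reader yields two fields for the data row, while the naive split yields three, which is the kind of mismatch a line-based parser can run into.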
## Description ## Motivation and Context ## How Has This Been Tested? ## Screenshots (if appropriate): ## Types of changes - [ ] Bug fix (non-breaking change which fixes an...
Bumps [junit](https://github.com/junit-team/junit4) from 4.12 to 4.13.1. Release notes Sourced from junit's releases. JUnit 4.13.1 Please refer to the release notes for details. JUnit 4.13 Please refer to the release notes...
Most of the `LinkParserTest` tests were failing, and this PR fixes most of them. There are still issues with external links whose paths contain `?`; this is due to the...
Implemented the first version of the Wikimedia Commons Infobox Extractor. I also configured the properties file that lists the extractors used for data extraction from Wikimedia Commons files.