sparkler issues

Writing Data to Elasticsearch Storage Engine

1

#### Task Description
This task is in progress and aims to provide Elasticsearch as a backend storage engine option for Sparkler. This builds upon the...

Kefaun2601

Fix sparkler CI build

The [CI build is failing](https://github.com/USCDataScience/sparkler/runs/2160833582?check_suite_focus=true)

```
#10 35.54 Collecting scipy==1.4.1
#10 35.56 Downloading scipy-1.4.1.tar.gz (24.6 MB)
#10 38.26 Installing build dependencies: started
#10 104.0 Installing build dependencies: still running...
#10...
```

lewismc

Investigate pipeline frameworks

1

Would something like Apache Beam be a more modern way of doing the same Spark stuff, but in an agnostic fashion? This would allow us to be less dependent on...

buggtb

Elasticsearch for Sparkler - Maven Profiles

As part of the Elasticsearch for Sparkler set of issues, @lewismc requested that the team create separate Maven profiles for the Solr and Elasticsearch dependencies so we don't pull in unnecessary...

KilometersFan

Support for flexible focus language crawling framework

3

The first task is defining and expressing the **focus crawling** specification. The second subtask will be implementing that specification in Sparkler. Currently, we have support for URL-based focus/filters. This...

thammegowda

Extractor of fields using xpath or css selectors and map them to Solr fields?

3

First, thanks for the project. Sounds great. I am wondering if there is any chance to extract particular text items and images from web pages and map these extracted fields...

mzeidhassan

enhancement

Discussion

volunteer wanted

URL validator used in injector is too strict

## Background
Injector uses a URLValidator utility to validate URLs before injection.

## Problem
The URL validator used in the injector is too strict and often rejects valid URLs. Example: we...

thammegowda

Revamp Sparkler Config - Type checking + Validation + *-site.yaml overriding

6

There are two FIXMEs in the configuration. First, support loading `sparkler-defaults.yaml` and `sparkler-site.yaml`. The common practice is that `*-default.yaml` provides default and recommended values from developers. The `*-site.yaml` should be used by users...

thammegowda

Finish Juju charm

3

Some high-level remaining tasks:

- [x] Add Solr relation
- [x] Pick up Spark details from relation
- [x] Pick up Solr details from relation
- [x] Finish write...

buggtb

[MEMEX] Add extractor plugin interface

+ Review if this can be generalised as `Parser`
+ Generalise schema to fit all possible extractions that may come up in the future

thammegowda
