Grant Ingersoll

Results 27 issues of Grant Ingersoll

I'm trying Common Crawl w/ Hadoop 0.20.205 and I'm getting the following: Exception in thread "main" java.lang.VerifyError: (class: org/commoncrawl/hadoop/io/JetS3tARCSource, method: configureImpl signature: (Lorg/apache/hadoop/mapred/JobConf;)V) Incompatible argument to function at java.lang.Class.forName0(Native Method)...

It would be really helpful to use Apache Commons CLI for command line processing and then to try to standardize the names of input/output arguments, etc.

From https://relevancy.slack.com/archives/C47DYV6AW/p1642636695016000 The XGBoostJsonParser fails to parse XGBoost models that have binary nodes (what XGB calls indicator features) in the tree, because those nodes lack the threshold/split condition attribute. I...
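
To illustrate the failure mode, here is a minimal Python sketch of a tree walker that tolerates split nodes without a threshold. It is not the plugin's Java parser. The field names (`nodeid`, `split`, `split_condition`, `children`, `leaf`) are assumed to follow XGBoost's JSON dump format, and the sample tree and the `collect_splits` helper are hypothetical.

```python
import json

# Hypothetical dump: the root splits on an indicator feature, so it has no
# "split_condition"; its right child is an ordinary numeric split.
SAMPLE_TREE = json.loads("""
{
  "nodeid": 0, "depth": 0, "split": "is_promoted", "yes": 1, "no": 2,
  "children": [
    {"nodeid": 1, "leaf": 0.25},
    {"nodeid": 2, "depth": 1, "split": "price", "split_condition": 0.5,
     "yes": 3, "no": 4, "missing": 3,
     "children": [
       {"nodeid": 3, "leaf": -0.1},
       {"nodeid": 4, "leaf": 0.4}
     ]}
  ]
}
""")


def collect_splits(node, out=None):
    """Return every split node in pre-order, with threshold None if absent."""
    if out is None:
        out = []
    if "leaf" in node:
        # Leaves carry only an output value, nothing to record here.
        return out
    out.append({
        "nodeid": node["nodeid"],
        "feature": node["split"],
        # .get() instead of [] so indicator nodes do not raise KeyError.
        "threshold": node.get("split_condition"),
    })
    for child in node.get("children", []):
        collect_splits(child, out)
    return out


splits = collect_splits(SAMPLE_TREE)
```

A parser that requires `split_condition` on every split node would reject the root above. Treating the attribute as optional lets the caller decide how to evaluate indicator nodes.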

I frequently use the Apache Software Foundation JIRA, and its REST endpoint is at https://issues.apache.org/jira/rest/api/. AFAICT, there is no way to construct JiraApi to match this since the basepath is...

Create web crawlers in the setup to crawl company sites to bring in more info

Solr's faceting supports paging. See https://cwiki.apache.org/confluence/display/solr/Faceting#Faceting-Thefacet.offsetParameter
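
As a rough sketch of facet paging, the Python below builds query parameters from `facet.limit` and `facet.offset`, both standard Solr faceting parameters. The base URL, the field name, and the helper names are assumptions for illustration, not part of this project.

```python
from urllib.parse import urlencode


def facet_page_params(field, page, page_size):
    """Build Solr parameters for one zero-based page of facet values."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    return {
        "q": "*:*",
        "rows": 0,                         # only facet counts are needed
        "facet": "true",
        "facet.field": field,
        "facet.limit": page_size,          # facet values per page
        "facet.offset": page * page_size,  # facet values to skip
    }


def facet_page_url(base_url, field, page, page_size):
    """Append the encoded paging parameters to a select handler URL."""
    return base_url + "?" + urlencode(facet_page_params(field, page, page_size))


# Hypothetical collection URL and field name.
url = facet_page_url("http://localhost:8983/solr/demo/select", "category", 2, 10)
```

Here the third page of ten values per page is requested by skipping the first twenty facet values.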

We should be able to control the order in which facets are displayed by first inspecting the pipeline for the desired order and then falling back to an order specified in...

Since the Fusion JIRA connector can only crawl an entire JIRA site, we need a JIRA pipeline that drops documents from projects that we are not interested in.
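
The filtering idea can be sketched in Python as follows. This is not Fusion's pipeline API. It assumes each crawled document is a dict carrying its project key in a `project` field, and both the field name and the `filter_by_project` helper are hypothetical.

```python
def filter_by_project(docs, allowed_projects, project_field="project"):
    """Yield only the documents whose project key is in allowed_projects."""
    allowed = {p.upper() for p in allowed_projects}
    for doc in docs:
        key = doc.get(project_field)
        # Documents with a missing or non-string project key are dropped.
        if isinstance(key, str) and key.upper() in allowed:
            yield doc


# Hypothetical crawl output: three issues, one without a project field.
crawled = [
    {"id": "SOLR-1", "project": "SOLR"},
    {"id": "HADOOP-7", "project": "HADOOP"},
    {"id": "orphan"},
]
kept = list(filter_by_project(crawled, ["solr", "LUCENE"]))
```

An allow list drops documents from unknown projects by default, which matches the goal of keeping only the projects of interest from a full-site crawl.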

enhancement