Przemysław Jacewicz

23 issues by Przemysław Jacewicz

This PR is a follow-up after merging PR #940 into a new base branch.

This PR is a follow-up after merging PR #1011 into a new base branch.

Upgrading the Spark version to 2.4.0 required adding specific parameters to Oozie `spark` actions so that jobs could run with the upgraded Spark. Adding the parameters was forced by the fact that...

Running the patent metadata retriever and cached webcrawler jobs with an empty cache resulted in many faults with `org.apache.http.conn.ConnectionPoolTimeoutException`. This exception is thrown when the waiting time for a connection from connection...

We should run the cached webcrawler job with an empty cache and analyze the output after merging #1233 and #1234. The analysis should focus on finding what type of server responses...

`CachedWebCrawlerJob` creates an output file with `Fault` records corresponding to failed content retrieval (transient and persistent). It seems that this file is not used in IIS: `SoftwareExporterJob` only uses document to...

Ticket #1221 will stop empty records from being written to the webcrawler and patent metadata retriever caches. We should also remove the empty records already present in the caches to avoid their propagation to...

We should reduce the technical debt of IIS by fixing code smells found by Sonar.

Currently, some IIS Spark jobs have a parameter for setting the number of output files. This functionality was recently revised when we introduced an explicit repartition step when creating datastores...


This is a follow-up to #1278.