Lee Prevost

70 comments by Lee Prevost

It seems like the second use case could be handled with InfluxDB continuous queries (v1.8) or tasks (v2.0+). Also, InfluxDB 2 apparently supports a store/forward/sync service with...

Did this go anywhere? I’m struggling with an issue where my spider crawls links from a link extractor that denies PDFs. But some links are redirected to a PDF on...
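A minimal sketch of one way to catch this case, assuming the redirect has already been followed and the callback can see the final response: check the `Content-Type` header (and the final URL) before parsing. The helper name `looks_like_pdf` is hypothetical and not part of Scrapy; it is plain Python, so it can be tried without a crawler.

```python
from typing import Optional, Union
from urllib.parse import urlparse


def looks_like_pdf(content_type: Optional[Union[bytes, str]], final_url: str) -> bool:
    """Return True if the (post-redirect) response appears to be a PDF."""
    if isinstance(content_type, bytes):
        # Header values often arrive as bytes; decode before comparing.
        content_type = content_type.decode("latin-1")
    if content_type:
        # Drop parameters such as "; charset=..." before comparing.
        media_type = content_type.split(";", 1)[0].strip().lower()
        if media_type == "application/pdf":
            return True
    # Fall back to the path extension of the final URL (query string ignored).
    return urlparse(final_url).path.lower().endswith(".pdf")
```

In a Scrapy callback this could be called as `looks_like_pdf(response.headers.get("Content-Type"), response.url)`, returning early when it is true.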

Thanks for the response. I think I am good on your bullets 1 and 3 within my scripts (yes, using PySpark). But on item 2, I'm struggling with the following:...

Further, from the AWS [docs](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html): `--extra-jars`: The Amazon S3 paths to additional Java .jar files that AWS Glue adds to the Java classpath before executing your script. Multiple values must be...

So, reducing my question even further, I think all I need is to get some of these (which ones?) onto S3 and add the extra-jars path. Again, don't have...

Thank you. I was making it much harder than necessary. Am testing now and will report back.

Ok, I added these two parameters to my job's definition:
```
'--extra-jars': "s3://aws-glue-assets-[my account num]-us-east-1/jars/",  # path to the splittablegzip-1.3.jar file
'--user-jars-first': "true",
```
I then added this to my...
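For reference, a hedged sketch of how those two parameters might be assembled in Python before being handed to the job definition. The function name `build_glue_default_arguments` and the bucket `example-bucket` are hypothetical. Pointing `--extra-jars` at the jar file itself rather than at its folder is an assumption, based on the AWS docs describing the value as paths to .jar files.

```python
from typing import Dict


def build_glue_default_arguments(jar_s3_path: str, user_jars_first: bool = True) -> Dict[str, str]:
    """Assemble the two job parameters from the comment above into one dict.

    Glue job arguments are strings, so the boolean is rendered as "true"/"false".
    """
    if not jar_s3_path.startswith("s3://"):
        raise ValueError("expected an s3:// path, got: " + jar_s3_path)
    return {
        "--extra-jars": jar_s3_path,
        "--user-jars-first": "true" if user_jars_first else "false",
    }


# Placeholder bucket name (assumption); substitute the real path to the jar.
args = build_glue_default_arguments("s3://example-bucket/jars/splittablegzip-1.3.jar")
```

With boto3, a dict like `args` would typically be passed as the `DefaultArguments` of the job definition.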

OK, I'm reporting back that I commented out the changes above and the script runs fine, but with everything loaded on one executor, no parallelization, and slow! So, something about...

Thinking about this some more: I'm wondering if the last post on this thread on the Spark JIRA is the answer: https://issues.apache.org/jira/browse/SPARK-29102 or, AWS Glue has a capability to install...

This looks promising: [SO question](https://stackoverflow.com/a/66987359/7397195). Again, I see I need extra jars with a pointer to the jar file on S3. No problem there. But in the config statement, I can...