Damon P. Cortesi
Hi @chrs-myrs - this can be an issue if you have a large number of source files that you're trying to convert. As a workaround, can you try setting `spark.driver.maxResultSize`...
That would be ideal, but [per the documentation](https://docs.aws.amazon.com/AmazonS3/latest/userguide/LogFormat.html), certain fields (like `bytes_sent`) can be `-` if zero. So there needs to be some additional logic in the conversion script to...
Not really, because the `-` character can mean something different in each field, so translating it to 0 can be a little misleading - no data is different than...
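One way to reconcile the two points above is to decide the meaning of `-` per field instead of globally. The sketch below is hypothetical (the field names and the `parse_int_field` helper are illustrative, not taken from the project). It maps `-` to 0 only for `bytes_sent`, where the S3 docs say `-` means zero, and maps it to `None` ("no data") everywhere else.

```python
from typing import Optional

# Hypothetical policy: fields where the S3 access log docs say "-" means zero.
# bytes_sent is documented as "-" if zero; other fields are NOT assumed to be.
DASH_MEANS_ZERO = {"bytes_sent"}


def parse_int_field(name: str, raw: str) -> Optional[int]:
    """Convert a raw numeric log token to int, honoring per-field '-' semantics."""
    if raw == "-":
        # Zero only where documented; otherwise keep "no data" distinct from 0.
        return 0 if name in DASH_MEANS_ZERO else None
    return int(raw)
```

For example, `parse_int_field("bytes_sent", "-")` gives `0`, while `parse_int_field("object_size", "-")` gives `None`, so downstream queries can still tell "zero" apart from "missing".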
Yeah, S3 access logs (given their age) are particularly challenging. Re: conditionals, you can see something I did back in 2019 when there was an extra field briefly in the...
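As a general illustration of tolerating a changing field count, and not a reconstruction of the 2019 change mentioned above, a parser can tokenize each line first and only then map tokens to names. In this minimal sketch, `FIELDS` is a deliberately abbreviated, hypothetical list (the real format has many more fields), and `parse_line` is an illustrative helper. Short lines get `None` for their missing trailing fields, and extra trailing fields are kept rather than breaking the parse.

```python
import re
from typing import Dict, List, Optional

# Hypothetical, abbreviated field list (the real S3 access log format is longer).
FIELDS = ["bucket_owner", "bucket", "request_time", "remote_ip", "requester",
          "request_id", "operation", "key", "request_uri", "http_status"]

# A token is a [bracketed timestamp], a "quoted string", or a run of non-spaces.
TOKEN_RE = re.compile(r'\[[^\]]*\]|"[^"]*"|\S+')


def _clean(tok: str) -> str:
    """Strip the surrounding brackets or quotes from a token, if present."""
    if len(tok) >= 2 and ((tok[0] == "[" and tok[-1] == "]") or
                          (tok[0] == '"' and tok[-1] == '"')):
        return tok[1:-1]
    return tok


def parse_line(line: str) -> Dict[str, object]:
    tokens: List[str] = [_clean(t) for t in TOKEN_RE.findall(line)]
    record: Dict[str, object] = {}
    for i, name in enumerate(FIELDS):
        # Missing trailing fields (e.g. older log lines) become None.
        value: Optional[str] = tokens[i] if i < len(tokens) else None
        record[name] = value
    # Extra trailing fields (e.g. a briefly added field) are kept, not dropped.
    record["extra"] = tokens[len(FIELDS):]
    return record
```

Because the conditional logic lives in the name mapping rather than in one large regex, supporting a new or removed field means editing the list, not the pattern.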
Hey @taschmidt, you've got a good point here - the easiest thing is probably just adding a line as you mentioned, but in reality the `glue_jobs.json` file should _probably_ just be...
I'm looking back into this again, as noted in #23. It's probably the part of this project that I was least happy with (but also kind of proud of 😆)...
Thanks @CalvinLeather! Much appreciated. I hope to be able to devote some time to this in the coming weeks.
Adding a note here about [blueprints](https://docs.aws.amazon.com/glue/latest/dg/blueprints-overview.html) - they could be useful for building more comprehensive Glue deployments for this project, specifically workflows which could schedule the jobs.
Looked into Blueprints a little bit yesterday. It looks like they could successfully be used to bootstrap Classifiers, Crawlers, and table definitions. In addition, a schedule/trigger can also be set up...
@ryanrf-ac Hey there, thanks for chiming in - based on the error message from the original report, it looks like no data got converted in the initial job. > INFO:athena_glue_service_logs.converter:No...