Damon P. Cortesi

117 comments by Damon P. Cortesi

Hi @chrs-myrs - this can be an issue if you have a large number of source files that you're trying to convert. As a workaround, can you try setting `spark.driver.maxResultSize`...
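For illustration, here is a minimal sketch of how that setting could be expressed as `spark-submit`-style `--conf` arguments. The helper name `spark_conf_args` and the value `2g` are hypothetical (the comment above does not say which value to use; Spark's default is `1g`), and how the arguments reach the job depends on your deployment.

```python
from typing import Dict, List


def spark_conf_args(settings: Dict[str, str]) -> List[str]:
    """Turn Spark settings into a flat list of `--conf key=value` arguments."""
    args: List[str] = []
    for key, value in settings.items():
        args.extend(["--conf", f"{key}={value}"])
    return args


# "2g" is an assumed example value, not a recommendation from the thread.
conf_args = spark_conf_args({"spark.driver.maxResultSize": "2g"})
print(conf_args)
```

Raising this limit lets the driver collect larger results, at the cost of more driver memory, so it is a workaround rather than a fix for the number of source files.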

That would be ideal, but [per the documentation](https://docs.aws.amazon.com/AmazonS3/latest/userguide/LogFormat.html), certain fields (like `bytes_sent`) can be `-` if zero. So there needs to be some additional logic in the conversion script to...

Not really, because in each field the `-` character can have a different meaning. So translating it to 0 can be a little misleading - no data is different than...
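A rough sketch of the kind of per-field logic being discussed: treat `-` as zero only where the documentation says it means zero, and as missing data everywhere else. The function name `parse_int_field` and the `object_size` field used in the example are assumptions for illustration; only `bytes_sent` comes from the comments above, and this is not the project's actual conversion code.

```python
from typing import Optional

# Fields where "-" is documented to mean zero. Only bytes_sent is taken
# from the discussion above; extend this set only after checking the docs.
DASH_MEANS_ZERO = {"bytes_sent"}


def parse_int_field(name: str, raw: str) -> Optional[int]:
    """Parse a numeric log field, keeping "no data" distinct from zero."""
    if raw == "-":
        # Zero for fields documented that way, None (missing) otherwise.
        return 0 if name in DASH_MEANS_ZERO else None
    return int(raw)


print(parse_int_field("bytes_sent", "-"))    # zero bytes were sent
print(parse_int_field("object_size", "-"))   # hypothetical field: no data
print(parse_int_field("bytes_sent", "1024"))
```

Keeping `None` for the other fields avoids reporting a measured zero where the log simply had nothing to record.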

Yeah, S3 access logs (given their age) are particularly challenging. Re: conditionals, you can see something I did back in 2019 when there was an extra field briefly in the...

Hey @taschmidt, you've got a good point here - the easiest thing is probably just adding a line as you mentioned, but in reality the `glue_jobs.json` file should _probably_ just be...

I'm looking back into this again, as noted in #23. Probably the part of this project that I was least happy with (but also kind of proud of 😆 )...

Thanks @CalvinLeather ! Much appreciated. I hope to be able to devote some time to this in the coming weeks.

Adding a note here about [blueprints](https://docs.aws.amazon.com/glue/latest/dg/blueprints-overview.html) - they could be useful for building more comprehensive Glue deployments for this project, specifically workflows which could schedule the jobs.

Looked into Blueprints a little bit yesterday. Looks like they could successfully be used to bootstrap Classifiers, Crawlers, and table definitions. In addition, a schedule/trigger can also be set up...

@ryanrf-ac Hey there, thanks for chiming in - based on the error message from the original report, it looks like no data got converted in the initial job.

> INFO:athena_glue_service_logs.converter:No...