aws-serverless-data-lake-framework
small change to legislators to support NDJSON
Dear SDLF Team,
I recently came across SDLF and I love the concept and the architecture. I have three items about the "Testing the Framework" section of the Serverless Data Lake Workshop.
- The provided code in the legislators pipeline expects each input file to be a single valid JSON document, that is, an array of dictionaries. The original example from the AWS Glue documentation (https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-samples-legislators.html) deals with NDJSON instead, that is, one JSON object per line (the newline is the separator; there are no commas between objects). In the real world, NDJSON is the most common case. For example, AWS Connect log files are NDJSON.
- One of the input files in ./sdlf-utils/pipeline-examples/legislators/data, namely regions.json, is corrupted. I understand this is by design, as explained in closed issue #28. However, I think this should be explained more explicitly in a README file associated with the example.
- Can I open a pull request to modify the legislators example so that it handles NDJSON files and detects which type of JSON the input is? It looks like this project is somewhat inactive. I would like to be active in it and also make some additions in the near future, such as more examples and a UI.
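To make the proposal more concrete, here is a rough sketch of the kind of detection I have in mind. It is only an illustration: `load_records` is a hypothetical helper name, not something that exists in SDLF today, and it assumes the whole file has already been read into memory as a string.

```python
import json


def load_records(text):
    """Return a list of records from either a JSON array or NDJSON text.

    Hypothetical helper for illustration only; not part of SDLF.
    """
    stripped = text.strip()
    if not stripped:
        return []
    try:
        # Case 1: the whole file is one valid JSON document.
        parsed = json.loads(stripped)
    except json.JSONDecodeError:
        # Case 2: assume NDJSON, one JSON object per non-empty line.
        # A genuinely corrupted file still raises JSONDecodeError here.
        return [json.loads(line) for line in stripped.splitlines() if line.strip()]
    # A single top-level object is wrapped so callers always get a list.
    return parsed if isinstance(parsed, list) else [parsed]


array_style = '[{"id": 1}, {"id": 2}]'
ndjson_style = '{"id": 1}\n{"id": 2}\n'
print(load_records(array_style) == load_records(ndjson_style))
```

Trying the full-document parse first and falling back to line-by-line parsing keeps the existing behaviour for array-style files, while a file that is neither valid JSON nor valid NDJSON still fails loudly.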
Thank you, -Marian