pudl Make SEC 10K metadata into a structured database

Open katie-lamb opened this issue 1 year ago • 0 comments

One lesson from previous unstructured data extraction projects is that we need to collect as much metadata as possible and be able to go from a structured database of metadata to a specific document. We should also be able to add our own columns to the metadata during the extraction process.

We have each filer’s name and CIK (the unique ID that the SEC uses), and all of the metadata is currently saved in the archives as a CSV.
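As a rough illustration of going from a metadata table to a specific document and adding our own columns along the way, here is a minimal sketch using SQLite from the Python standard library. The column names (`cik`, `company_name`, `form_type`, `date_filed`, `filename`), the sample rows, and the `extraction_status` column are all assumptions made for the example; the archived CSV may be laid out differently.

```python
import csv
import io
import sqlite3

# Hypothetical stand-in for the archived metadata CSV; real columns may differ.
SAMPLE_CSV = """cik,company_name,form_type,date_filed,filename
0000000001,Example Power Co,10-K,2023-02-15,edgar/data/1/example-0001.txt
0000000002,Sample Utility Inc,10-K,2023-03-01,edgar/data/2/sample-0002.txt
"""


def load_metadata(conn, csv_text):
    """Load metadata rows into a `filings` table keyed by filename."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS filings ("
        "filename TEXT PRIMARY KEY, cik TEXT, company_name TEXT, "
        "form_type TEXT, date_filed TEXT)"
    )
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    conn.executemany(
        "INSERT OR REPLACE INTO filings "
        "(filename, cik, company_name, form_type, date_filed) VALUES "
        "(:filename, :cik, :company_name, :form_type, :date_filed)",
        rows,
    )
    return len(rows)


def documents_for(conn, cik):
    """Return the filenames of every filing for one CIK, oldest first."""
    cur = conn.execute(
        "SELECT filename FROM filings WHERE cik = ? ORDER BY date_filed", (cik,)
    )
    return [row[0] for row in cur]


conn = sqlite3.connect(":memory:")
n_loaded = load_metadata(conn, SAMPLE_CSV)

# Add our own column during extraction (the column name is hypothetical).
conn.execute("ALTER TABLE filings ADD COLUMN extraction_status TEXT")
conn.execute("UPDATE filings SET extraction_status = 'pending'")

print(documents_for(conn, "0000000001"))
```

Storing the CIK as text keeps any leading zeros intact, and keying on the filename gives each metadata row a direct pointer back to its document.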

[ ] Create a central database and dump the 10Ks into a cloud bucket
[ ] The metadata seems consistent across all documents, so create a structured table of metadata that we can add columns to
[ ] Set up a way to associate a parent company with its filing and peruse random files to figure out what’s going on
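One possible shape for the parent-company association and the random spot checks in the tasks above, again sketched with SQLite. The table names (`parent_companies`, `filer_parents`), the sample data, and the helper functions are hypothetical and only meant to show the idea of a link table between CIKs and parents.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE filings (filename TEXT PRIMARY KEY, cik TEXT, date_filed TEXT);
    CREATE TABLE parent_companies (parent_id INTEGER PRIMARY KEY, parent_name TEXT);
    -- Link table: which filer (CIK) belongs to which parent company.
    CREATE TABLE filer_parents (
        cik TEXT PRIMARY KEY,
        parent_id INTEGER REFERENCES parent_companies(parent_id)
    );
    """
)

# Made-up rows purely for illustration.
conn.executemany(
    "INSERT INTO filings VALUES (?, ?, ?)",
    [
        ("edgar/data/1/a.txt", "0000000001", "2022-02-15"),
        ("edgar/data/1/b.txt", "0000000001", "2023-02-15"),
        ("edgar/data/2/c.txt", "0000000002", "2023-03-01"),
    ],
)
conn.execute("INSERT INTO parent_companies VALUES (1, 'Example Holdings')")
conn.execute("INSERT INTO filer_parents VALUES ('0000000001', 1)")


def filings_for_parent(conn, parent_name):
    """Return filenames of all filings whose filer maps to the given parent."""
    cur = conn.execute(
        "SELECT f.filename FROM filings AS f "
        "JOIN filer_parents AS fp ON fp.cik = f.cik "
        "JOIN parent_companies AS p ON p.parent_id = fp.parent_id "
        "WHERE p.parent_name = ? ORDER BY f.date_filed",
        (parent_name,),
    )
    return [row[0] for row in cur]


def sample_filings(conn, k, seed=None):
    """Pick up to k filenames at random for manual inspection."""
    cur = conn.execute("SELECT filename FROM filings ORDER BY filename")
    filenames = [row[0] for row in cur]
    return random.Random(seed).sample(filenames, min(k, len(filenames)))


print(filings_for_parent(conn, "Example Holdings"))
print(sample_filings(conn, 2, seed=0))
```

Keeping the parent mapping in its own table means it can be corrected or extended without touching the filing metadata itself.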

Feb 26 '24 18:02 katie-lamb
