pudl
Make SEC 10K metadata into a structured database
One lesson from previous unstructured data extraction projects is that we need to collect as much metadata as possible and be able to go from a structured database of metadata to a specific document. We should also be able to add our own columns to the metadata during the extraction process.
We have each filer's name and CIK (the unique ID that the SEC uses), and all of the metadata is currently saved in the archives as a CSV.
- [ ] Create a central database and dump the 10Ks into a cloud bucket
- [ ] The metadata seems consistent across all documents, so create a structured table of metadata that we can add columns to
- [ ] Set up a way to associate a parent company with its filings, and peruse random files to figure out what's going on
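
As a rough starting point for the tasks above, here is a minimal sketch of loading the metadata CSV into a SQLite table that can be extended with new columns and queried by CIK. Everything named here is an assumption for illustration: the table name `filing_metadata`, the CSV column names `filename`, `cik`, and `company_name`, and the sample rows are hypothetical and do not reflect the actual archive schema. SQLite is used only because it ships with Python; the real database could be something else.

```python
import csv
import io
import sqlite3

# Hypothetical sample of the archived metadata CSV; real column names may differ.
SAMPLE_CSV = """filename,cik,company_name
a-2020.txt,0000001,Example Power Co
a-2021.txt,0000001,Example Power Co
b-2021.txt,0000002,Sample Utilities Inc
"""

# Only these SQL types are allowed when adding a column.
ALLOWED_TYPES = {"TEXT", "INTEGER", "REAL"}


def load_metadata(conn, csv_file):
    """Create the metadata table if needed and load rows from a CSV file object."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS filing_metadata ("
        "filename TEXT PRIMARY KEY, cik TEXT NOT NULL, company_name TEXT)"
    )
    rows = [
        (r["filename"], r["cik"], r["company_name"])
        for r in csv.DictReader(csv_file)
    ]
    # Name the columns explicitly so the insert keeps working after
    # extra columns are added to the table.
    conn.executemany(
        "INSERT OR REPLACE INTO filing_metadata (filename, cik, company_name) "
        "VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)


def add_column(conn, name, sql_type="TEXT"):
    """Add one of our own columns to the metadata table."""
    # Identifiers cannot be bound as SQL parameters, so validate them first.
    if not name.isidentifier() or sql_type not in ALLOWED_TYPES:
        raise ValueError(f"invalid column definition: {name} {sql_type}")
    conn.execute(f"ALTER TABLE filing_metadata ADD COLUMN {name} {sql_type}")
    conn.commit()


def filings_for_cik(conn, cik):
    """Return the filenames of all filings recorded for one CIK."""
    cursor = conn.execute(
        "SELECT filename FROM filing_metadata WHERE cik = ? ORDER BY filename",
        (cik,),
    )
    return [row[0] for row in cursor]
```

With this in place, `load_metadata(conn, open(path, newline=""))` would populate the table from the archived CSV, `add_column(conn, "parent_company")` would add a column for the parent company association, and `filings_for_cik` gives a path from the metadata back to specific documents, assuming the `filename` values match the object names used in the cloud bucket.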