sec-api
sec-api copied to clipboard
Replicating the income statement as it appears on the SEC form without extra revenue segments being pulled in
I am trying to replicate the income statement as shown in the company's 10-K (or 10-Q) so I can perform segment analysis. For example, for Tesla (found here) we want
where we can see revenue is broken out by segment. However, it is difficult (if not impossible to) recreate this table without extra data being pull in. My code as follows will demonstrate this.
import pandas as pd
url_10k_tsla = 'https://www.sec.gov/Archives/edgar/data/1318605/000162828024002390/tsla-20231231.htm'
xbrl_tsla_json = xbrlApi.xbrl_to_json(htm_url=url_10k_tsla)
# Initialize an empty DataFrame
df = pd.DataFrame(columns=['period'])
# Iterate through each key in the JSON data
for key, values in xbrl_tsla_json['StatementsOfIncome'].items():
# See if there are segments first
has_segments = False
for value in values:
if 'segment' in value:
has_segments = True
for value in values:
# Extract period and value
period = value['period']['startDate'] + ' - ' + value['period']['endDate']
# if there are segments
if has_segments:
# Get the value out of the segment or call it total.
segment_name = value.get('segment', {}).get('value', 'Total').split(':')[-1]
value_key = key + ' - ' + segment_name
# There are no segments
else:
segment_name = value.get('value')
value_key = key
# Check if period already exists in DataFrame
if period in df['period'].values:
# Update existing row
df.loc[df['period'] == period, value_key] = value['value']
else:
# Add new row
row = {'period': period}
row[value_key] = value['value']
df = pd.concat([df, pd.DataFrame([row])], ignore_index=True)
df.set_index('period', inplace=True)
# convert all DataFrame columns to the int64 dtype
df = df[df.columns[~df.columns.isin(['EarningsPerShareBasic', 'EarningsPerShareDiluted'])]].astype(np.float64)
# Round all columns to millions except the EarningsPerShareBasic and EarningsPerShareDiluted
df = df[df.columns[~df.columns.isin(['EarningsPerShareBasic', 'EarningsPerShareDiluted'])]].divide(1_000_000)
df.T
will produce the following pandas table:
In particular, the table lists two extra revenue sources, RevenueFromContractWithCustomerExcludingAssessedTax - EnergyGenerationAndStorageSalesMember and RevenueFromContractWithCustomerExcludingAssessedTax - SalesAndServicesMember as well as two extra gross profits (although I am not as concerned with these). These two extra revenue sources come from the 'Revenue Recognition - Revenue by source' table which is separate from the income statement. Is there a way to keep these extra revenue sources from being pulled in (without hard coding their names since I want to apply this technique to many companies) so that my code replicates exactly the income statement as it appears in the filing?