auto-rag-eval
StackExchange and SecFilings both pulling StackExchange data?
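The Data/StackExchange/preprocessor.py and Data/SecFilings/preprocessor.py scripts are almost identical. They differ only in some comments, the class name, and the formatting of the constructor call. The diff contains just these two hunks, so the rest of the class, including load_save_dataset, is unchanged. As a result, the SecFilings preprocessor appears to load the same StackExchange data. Diff below: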
--- Data/StackExchange/preprocessor.py 2024-07-04 12:42:46
+++ Data/SecFilings/preprocessor.py 2024-07-04 12:42:46
@@ -12,10 +12,32 @@
ROOTPATH = dirname(dirname(abspath(__file__)))
-# Please use first the HuggingFace script at https://huggingface.co/datasets/HuggingFaceH4/stack-exchange-preferences to get the data
+# SEC TEMPLATE
-class StackExchangeData:
+# [KEEP] 0.Business: Overview of the company's main operations, including its products or services.
+# [KEEP] 1.Risk Factors: Discussion of risks and challenges the company faces.
+# [REMOVE] 2.Unresolved Staff Comments: Comments by SEC staff on the company's previous filings that haven't been resolved.
+# [REMOVE] 3.Properties: Information about the company's physical properties (like real estate).
+# [REMOVE] 4.Legal Proceedings: Information on any significant legal actions involving the company.
+# [REMOVE] 5.Market for Registrant’s Common Equity, Related Stockholder Matters and Issuer Purchases of Equity Securities: Details about the company’s stock, including dividends, the number of shareholders, and any buyback programs.
+# [REMOVE] 6.Selected Financial Data: Summary of specific financial data for a five-year period.
+
+# [KEEP] 8.Management’s Discussion and Analysis of Financial Condition and Results of Operations (MD&A): A detailed analysis from management’s perspective on the company’s financials and operations.
+# [REMOVE] 9.Quantitative and Qualitative Disclosures About Market Risk: Information on market risk, such as foreign exchange risk, interest rate risk, etc.
+# [REMOVE] 1.Financial Statements and Supplementary Data: Complete financial statements including balance sheets, income statements, and cash flow statements.
+# [REMOVE] 11.Changes in and Disagreements with Accountants on Accounting and Financial Disclosure: If there have been changes or disagreements with accountants, this section provides details.
+# [REMOVE] 12.Directors, Executive Officers and Corporate Governance: Information about the company’s directors and high-level executives.
+# [REMOVE] 13.Executive Compensation: Detailed information about the compensation of top executives.
+# [REMOVE] 14.Security Ownership of Certain Beneficial Owners and Management and Related Stockholder Matters: Details about the shares held by major shareholders and company executives.
+# [REMOVE] 15.Certain Relationships and Related Transactions, and Director Independence: Information about any transactions between the company and its directors or executives.
+# [REMOVE] 16.Principal Accountant Fees and Services: Fees and services provided by the company's accountants.
+# [REMOVE] 17.Exhibits, Financial Statement Schedules: Lists all the exhibits and financial statements schedules.
+# [REMOVE] 18.Form 10-K Summary: Summary of the key information from the 10-K (optional).
+# [REMOVE] 19. [OPTIONAl] CEO and CFO Certifications: As required by the Sarbanes-Oxley Act, certifications by the CEO and CFO regarding the accuracy of the financial statements.
+
+class ExchangeData:
+
def __init__(self,
n_samples: int,
max_char_length: int):
@@ -115,8 +137,7 @@
if __name__ == "__main__":
- stack_exchange_data = StackExchangeData(
- n_samples=400,
- max_char_length=1500)
+ stack_exchange_data = ExchangeData(n_samples=400,
+ max_char_length=1500)
stack_exchange_data.load_save_dataset()
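The claim that the files differ only in comments and the class name can be double-checked mechanically. Below is a minimal sketch, which is not part of auto-rag-eval. It tokenizes each file with the standard-library tokenize module, drops comments and layout tokens, and diffs the remaining token sequences with difflib.SequenceMatcher. The helper names code_tokens and token_changes are hypothetical. The OLD/NEW strings are shortened stand-ins modelled on the diff hunks, not the real files.

```python
import difflib
import io
import tokenize

# Token types that carry no program logic: comments and pure layout.
# Note: docstrings are STRING tokens, so they are still compared.
LAYOUT_TOKENS = {
    tokenize.COMMENT,
    tokenize.NL,
    tokenize.NEWLINE,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.ENDMARKER,
}


def code_tokens(source: str) -> list[str]:
    """Return the token strings of `source`, ignoring comments, blank lines and indentation."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [tok.string for tok in tokens if tok.type not in LAYOUT_TOKENS]


def token_changes(old: str, new: str) -> list[tuple[str, str]]:
    """List (old, new) token runs that differ once comments and layout are ignored."""
    old_toks, new_toks = code_tokens(old), code_tokens(new)
    matcher = difflib.SequenceMatcher(a=old_toks, b=new_toks, autojunk=False)
    return [
        (" ".join(old_toks[i1:i2]), " ".join(new_toks[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Hypothetical, shortened stand-ins for the two preprocessor files.
OLD = '''# (HuggingFace download instructions)
class StackExchangeData:
    def __init__(self, n_samples: int, max_char_length: int):
        self.n_samples = n_samples

if __name__ == "__main__":
    stack_exchange_data = StackExchangeData(
        n_samples=400,
        max_char_length=1500)
'''

NEW = '''# SEC TEMPLATE
# [KEEP] 0.Business
class ExchangeData:

    def __init__(self, n_samples: int, max_char_length: int):
        self.n_samples = n_samples

if __name__ == "__main__":
    stack_exchange_data = ExchangeData(n_samples=400,
                                       max_char_length=1500)
'''

print(token_changes(OLD, NEW))
# [('StackExchangeData', 'ExchangeData'), ('StackExchangeData', 'ExchangeData')]
```

To run the check against the real files, pass Path("Data/StackExchange/preprocessor.py").read_text(encoding="utf-8") and Path("Data/SecFilings/preprocessor.py").read_text(encoding="utf-8") to token_changes. If the reading above is right, only the class-name pairs should be reported. The comparison works at token level rather than line level, so the re-wrapped constructor call in the __main__ block does not show up as a difference.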