fix: detect 'Date' header dynamically and handle 'Change (%)' parsing
Background & Motivation
Until August 2022, the PSX historical-data table labelled its trading-date column TIME. In a site update since then, PSX renamed that column to Date and added two new fields, Change and Change (%). Because psx-data-reader hard-coded .set_index("TIME") and treated all columns uniformly, any attempt to fetch live data now fails with:
KeyError: "None of ['TIME'] are in the columns"
Fixing only that error would expose a second one: the % symbol in “Change (%)” would cause a type-casting error later, when the columns are converted to float.
What This PR Does
Dynamic Date-Column Detection
Replaces the hard-coded .set_index("TIME") with logic that picks whichever header is present (“time” or “date”).
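As a rough illustration of the detection idea, here is a minimal sketch in plain Python. The helper name find_date_column, the candidate tuple, and the sample header lists are hypothetical and not taken from the patch; the sketch also assumes the match is case-insensitive.

```python
from typing import Optional, Sequence

# Assumption: these are the only two labels the site has used for the date column.
DATE_HEADER_CANDIDATES = ("date", "time")


def find_date_column(headers: Sequence[str]) -> Optional[str]:
    """Return the first header that names the date column, or None if absent."""
    for header in headers:
        # Compare case-insensitively and ignore surrounding whitespace.
        if header.strip().lower() in DATE_HEADER_CANDIDATES:
            return header
    return None


# Illustrative header lists (not scraped from the real site).
print(find_date_column(["TIME", "OPEN", "CLOSE"]))
print(find_date_column(["Date", "Open", "Close", "Change (%)"]))
```

The returned header can then be passed to set_index in place of the hard-coded "TIME"; a None result signals that the table layout has changed again.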
Whitespace and Row-Length Safeguards
Strips leading/trailing whitespace from headers and cells.
Skips any table rows whose cell count does not match the number of headers.
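The two safeguards can be sketched as follows. This is an assumption-laden illustration, not the patch itself: clean_rows is a hypothetical helper, the sample data is made up, and the row check is assumed to compare each row's cell count with the header count.

```python
from typing import List, Sequence, Tuple


def clean_rows(
    headers: Sequence[str], rows: Sequence[Sequence[str]]
) -> Tuple[List[str], List[List[str]]]:
    """Strip whitespace from headers and cells; drop rows of the wrong length."""
    clean_headers = [h.strip() for h in headers]
    kept: List[List[str]] = []
    for row in rows:
        cells = [c.strip() for c in row]
        # Assumed rule: a usable row has exactly one cell per header.
        if len(cells) != len(clean_headers):
            continue
        kept.append(cells)
    return clean_headers, kept


# Made-up sample: the second row is malformed and should be skipped.
sample_headers = [" Date ", "Open ", " Close"]
sample_rows = [
    [" Jan 2, 2024", "1.10 ", " 1.15 "],
    ["stray cell"],
]
print(clean_rows(sample_headers, sample_rows))
```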
Proper Handling of “Change (%)”
Strips out the % sign (and any commas) before converting that column to float.
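A minimal sketch of that cleanup step, written in plain Python for clarity. The function name to_float is hypothetical, and the sample values are invented; the actual patch may apply the same idea column-wise on the DataFrame.

```python
def to_float(cell: str) -> float:
    """Convert a scraped numeric cell to float, ignoring '%' and thousands commas."""
    cleaned = cell.replace("%", "").replace(",", "").strip()
    return float(cleaned)


# Invented sample cells resembling a Change (%) value and a Volume value.
print(to_float("-0.40%"))
print(to_float("1,234,500"))
```

Removing the characters before the cast means a single conversion path can serve every column, including ones that never contain % or commas.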
No Behavioral Change Elsewhere
All threading, monthly chunking, HTML parsing, DataFrame concatenation, and numeric-casting logic remains untouched aside from the improvements above.
How I Tested
Ran the existing example script locally (python main.py) against tickers “SILK” and “PACE” from Jan 1, 2024 through today and confirmed:
No KeyError on date indexing
Correct DataFrame columns: Open, High, Low, Close, Change, Change (%), Volume
All columns typed as float64
Sliced, plotted, and exported the returned DataFrame to CSV to verify numeric correctness.
Impact & Compatibility
Backward-compatible: still works if the PSX site reverts to using "TIME".
Forward-compatible: new columns like Change and Change (%) are now parsed without error.
Minimal footprint: patch affects only two methods (toframe() and preprocess()), so the risk of side-effects is very low.