fix: detect 'Date' header dynamically and handle 'Change (%)' parsing

Open zohaibAlam840 opened this issue 9 months ago • 0 comments

Background & Motivation

Until August 2022, the PSX historical-data table used a column header labelled TIME for the trading date. In a recent site update, PSX renamed that column to Date and also added two new fields, Change and Change (%). Because psx-data-reader hard-coded .set_index("TIME") and treated all columns uniformly, any attempt to fetch live data now results in:

KeyError: "None of ['TIME'] are in the columns"

Even with that fixed, the % symbol in the new “Change (%)” column would cause a type-casting error later on.

What This PR Does

Dynamic Date-Column Detection

Replaces the hard-coded .set_index("TIME") with logic that picks whichever header is present (“time” or “date”).
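As a rough illustration of the detection step, here is a minimal sketch. The helper name `find_date_column` is hypothetical and not taken from the patch; it only assumes that the header of interest normalizes to "time" or "date".

```python
def find_date_column(columns):
    """Return the first column whose normalized name is 'time' or 'date'.

    Hypothetical helper for illustration; the actual patch may differ.
    """
    for name in columns:
        # Compare case-insensitively and ignore surrounding whitespace.
        if str(name).strip().lower() in {"time", "date"}:
            return name
    raise KeyError(f"no date-like column among {list(columns)!r}")
```

With a pandas DataFrame `df`, the result could then be passed to `df.set_index(...)` in place of the hard-coded "TIME".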

Whitespace and Row-Length Safeguards

Strips leading/trailing whitespace from headers and cells.

Skips any table rows whose cell count doesn’t match the header count, preventing mis-aligned data.
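The two safeguards above can be sketched as follows. This assumes the header and cell text has already been extracted from the HTML into plain lists of strings; `clean_rows` is a hypothetical name used only for this example.

```python
def clean_rows(headers, rows):
    """Strip whitespace and drop rows whose length differs from the headers.

    Hypothetical helper; inputs are lists of already-extracted strings.
    """
    headers = [h.strip() for h in headers]
    cleaned = []
    for row in rows:
        cells = [c.strip() for c in row]
        # A row with a different number of cells cannot be aligned to the
        # headers, so it is skipped rather than padded or truncated.
        if len(cells) != len(headers):
            continue
        cleaned.append(cells)
    return headers, cleaned
```

Skipping rather than padding keeps a malformed row from shifting values into the wrong column.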

Proper Handling of “Change (%)”

Strips out the % sign (and any commas) before converting that column to float.
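A minimal sketch of that conversion for a single cell value, assuming the cell arrives as a string such as "1.25%". The function name `parse_change_percent` is hypothetical.

```python
def parse_change_percent(value):
    """Convert a 'Change (%)' cell such as '1.25%' to a float.

    Hypothetical helper: removes the percent sign and any thousands
    separators before casting.
    """
    return float(value.replace("%", "").replace(",", "").strip())
```

On a pandas column, the same idea can be applied with `Series.str.replace` followed by `astype(float)`.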

No Behavioral Change Elsewhere

All threading, monthly chunking, HTML parsing, DataFrame concatenation, and numeric-casting logic remains untouched aside from the improvements above.

How I Tested

Ran the existing example script locally (python main.py) against tickers “SILK” and “PACE” from Jan 1, 2024 through today, and confirmed:

No KeyError on date indexing

Correct DataFrame columns: Open, High, Low, Close, Change, Change (%), Volume

All columns typed as float64

Sliced, plotted, and exported the returned DataFrame to CSV to verify numeric correctness.

Impact & Compatibility

Backward-compatible: still works if the PSX site reverts to using "TIME".

Forward-compatible: new columns like Change and Change (%) are now parsed without error.

Minimal footprint: patch affects only two methods (toframe() and preprocess()), so the risk of side-effects is very low.

Jun 27 '25 06:06 zohaibAlam840