pandas
PERF: Avoid fragmentation of DataFrame in read_sas
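Fragmentation here means a DataFrame built by adding columns one at a time. Each insert can add a separate internal block, and recent pandas versions may emit a `PerformanceWarning` ("DataFrame is highly fragmented"). A common way to avoid this is to collect the columns first and construct the frame in one call. The sketch below shows that general pattern. It assumes only public pandas and NumPy APIs, the function names are hypothetical, and it is not the PR's actual diff to the SAS reader.

```python
import warnings

import numpy as np
import pandas as pd


def build_frame_column_by_column(n_cols: int, n_rows: int) -> pd.DataFrame:
    # Fragmenting pattern (hypothetical helper): every assignment inserts a
    # new column into an existing frame, which can add a new internal block.
    df = pd.DataFrame(index=range(n_rows))
    for i in range(n_cols):
        df[f"col{i}"] = np.arange(n_rows, dtype="float64")
    return df


def build_frame_at_once(n_cols: int, n_rows: int) -> pd.DataFrame:
    # Non-fragmenting pattern (hypothetical helper): gather all columns in a
    # dict, then build the DataFrame with a single constructor call.
    columns = {f"col{i}": np.arange(n_rows, dtype="float64") for i in range(n_cols)}
    return pd.DataFrame(columns, index=range(n_rows))
```

Both helpers produce the same values. Only the second one avoids repeated inserts, so it should not trigger the fragmentation warning even with hundreds of columns.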
- [x] closes #48595
- [ ] Tests added and passed if fixing a bug or adding a new feature
- [x] All code checks passed.
- [x] Added type annotations to new arguments/methods/functions.
- [x] Added an entry in the latest `doc/source/whatsnew/vX.X.X.rst` file if fixing a bug or adding a new feature.
Not sure how to test this; the OP's file is 13 MB, which is way too large for our repository.
I know we want to limit network calls in tests, but how about reading OP's URL, which points to a CDC .xpt file? Currently, read_sas can parse online files:
pd.read_sas("https://wwwn.cdc.gov/Nchs/Nhanes/2017-2018/DR1TOT_J.XPT")
There is also a test decorator that checks the link before the test runs, in case the URL changes. Given the date and that it is a government site, the link looks likely to stay reliable over the years:
@pytest.mark.network
@pytest.mark.slow
@tm.network(
url="https://wwwn.cdc.gov/Nchs/Nhanes/2017-2018/DR1TOT_J.XPT",
check_before_test=True,
)
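With `check_before_test=True`, the test is skipped when the URL cannot be reached. The following is a standalone sketch of that check using only the standard library. The helper name `url_is_reachable` is hypothetical and not a pandas API. Treating any HTTP error status as "unreachable" is also an assumption, chosen so that a moved or deleted file skips the test rather than failing it.

```python
import urllib.error
import urllib.request


def url_is_reachable(url: str, timeout: float = 5.0) -> bool:
    # Hypothetical helper (not part of pandas): send a HEAD request and
    # report whether the server answered with a non-error status.
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except (OSError, ValueError):
        # OSError covers URLError/HTTPError, timeouts and refused connections;
        # ValueError covers malformed URLs such as a missing scheme.
        return False
```

In a test module, a helper like this could gate the slow test with `pytest.mark.skipif(not url_is_reachable(URL), reason="URL unreachable")`. Note that the condition is evaluated at collection time.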
This takes far too long (25 seconds). Since we don't have slow builds anymore, I don't think it's worth it.
Thanks @phofl