PERF: Avoid fragmentation of DataFrame in read_sas

phofl opened this issue • 2 comments

  • [x] closes #48595
  • [ ] Tests added and passed if fixing a bug or adding a new feature
  • [x] All code checks passed.
  • [x] Added type annotations to new arguments/methods/functions.
  • [x] Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

Not sure how to test this: the OP's file is 13 MB, which is way too large for our repository.
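The fragmentation named in the title typically comes from adding columns one at a time (`df[name] = values`): each new column becomes its own internal block, and pandas emits a `PerformanceWarning` once a frame holds more than 100 blocks. A minimal sketch of the general fix, collecting the columns in a dict and constructing the frame once. This illustrates the technique only and is not the diff from this PR; the sizes and the function names `build_by_insertion` and `build_from_dict` are hypothetical.

```python
import warnings

import numpy as np
import pandas as pd

N_ROWS, N_COLS = 1_000, 200  # hypothetical sizes, enough to cross the warning threshold


def build_by_insertion(columns: dict, index: pd.Index) -> pd.DataFrame:
    """Fragmenting pattern: every assigned column becomes a separate internal block."""
    frame = pd.DataFrame(index=index)
    for name, values in columns.items():
        frame[name] = values
    return frame


def build_from_dict(columns: dict, index: pd.Index) -> pd.DataFrame:
    """Collect all columns first, then build the frame in a single constructor call."""
    return pd.DataFrame(columns, index=index)


rng = np.random.default_rng(0)
index = pd.RangeIndex(N_ROWS)
columns = {f"col{i}": rng.standard_normal(N_ROWS) for i in range(N_COLS)}

# Record warnings so the fragmentation warnings can be counted instead of printed
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    fragmented = build_by_insertion(columns, index)
    consolidated = build_from_dict(columns, index)

perf_warnings = [w for w in caught if issubclass(w.category, pd.errors.PerformanceWarning)]
print(f"PerformanceWarnings raised: {len(perf_warnings)}")
print(f"Same contents: {fragmented.equals(consolidated)}")
```

Both frames hold the same data; only the internal layout differs, which is why a fix like this is hard to observe in a test without a file wide enough to trigger the slowdown.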

phofl • Sep 17 '22 12:09

I know we want to limit network calls, but for testing, how about reading the OP's URL, which points to a CDC .xpt file? read_sas can already parse online files:

pd.read_sas("https://wwwn.cdc.gov/Nchs/Nhanes/2017-2018/DR1TOT_J.XPT")

There is also a pytest decorator that checks the link before running the test, in case it changes. Given the date and the fact that it is a government site, the link looks likely to stay reliable over the years:

@pytest.mark.network
@pytest.mark.slow
@tm.network(
    url="https://wwwn.cdc.gov/Nchs/Nhanes/2017-2018/DR1TOT_J.XPT",
    check_before_test=True,
)
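The `check_before_test=True` idea can also be approximated with only the standard library. A minimal sketch, assuming the CDC URL above still serves the file: `url_is_reachable` and `test_read_sas_remote_xpt` are hypothetical names, the HEAD request is a swapped-in technique rather than what `tm.network` does internally, and the test raises `unittest.SkipTest` (which pytest reports as a skip) only so the sketch does not require importing pytest; inside the pandas suite `pytest.skip` would be the usual spelling.

```python
import http.client
import unittest
import urllib.error
import urllib.request

import pandas as pd

# URL from the comment above; assumed to still serve the NHANES file
XPT_URL = "https://wwwn.cdc.gov/Nchs/Nhanes/2017-2018/DR1TOT_J.XPT"


def url_is_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if a HEAD request to ``url`` gets a non-error response in time."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # URLError/HTTPError and socket timeouts are OSError subclasses;
        # ValueError covers strings that are not URLs at all
        return False


def test_read_sas_remote_xpt():
    # Check the link first, mirroring the intent of check_before_test=True
    if not url_is_reachable(XPT_URL):
        # pytest reports unittest.SkipTest as a skipped test
        raise unittest.SkipTest(f"{XPT_URL} is not reachable")
    df = pd.read_sas(XPT_URL, format="xport")
    assert not df.empty
```

The check only avoids failures when the link is down; when it is up, the test still downloads and parses the full file.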

ParfaitG • Sep 18 '22 18:09

This runs forever (25 seconds). Since we don't have slow builds anymore, I don't think this is worth it.

phofl • Sep 18 '22 18:09

Thanks @phofl

mroeschke • Sep 21 '22 18:09