
Pandas Error Importing Statcast Data

Open jmaschino56 opened this issue 2 years ago • 5 comments

pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 13, saw 2

The error stems from get_statcast_data_from_csv and happens when requesting data for 2022-07-17.
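
To narrow a failing date range down to individual days like this one, something along the following lines might help. This is only a sketch: `find_failing_days` and `fake_fetch` are hypothetical names, and `fetch` is a stand-in for a real call such as `lambda d: pybaseball.statcast(start_dt=d, end_dt=d)`, which is not exercised here.

```python
from datetime import date, timedelta
from typing import Callable, Dict


def find_failing_days(
    fetch: Callable[[str], object], start: date, end: date
) -> Dict[str, str]:
    """Call fetch once per day in [start, end] and record the days that raise."""
    failures: Dict[str, str] = {}
    day = start
    while day <= end:
        key = day.isoformat()  # "YYYY-MM-DD", the format used in this thread
        try:
            fetch(key)
        except Exception as exc:  # deliberately broad: we only want to log the failure
            failures[key] = f"{type(exc).__name__}: {exc}"
        day += timedelta(days=1)
    return failures


def fake_fetch(day: str) -> None:
    """Hypothetical stand-in for a real request; fails on a single date."""
    if day == "2022-07-17":
        raise ValueError("Error tokenizing data")


print(find_failing_days(fake_fetch, date(2022, 7, 16), date(2022, 7, 18)))
# {'2022-07-17': 'ValueError: Error tokenizing data'}
```

Requesting one day at a time is slower than a single ranged call, but it shows which dates fail and which succeed.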

jmaschino56 Apr 17 '23 21:04

This issue is not clear. Could you give a more detailed description of how to reproduce it?

Thanks

BrayanMnz Apr 24 '23 03:04

It looks like I am getting the same issue when using the statcast API like so:

Example:

import pybaseball as pyb  # assumed import; the original snippet omitted it

df = pyb.statcast(
    start_dt="2016-10-01",
    end_dt="2016-10-31",
)

Stacktrace:

File "/var/task/pybaseball/statcast.py", line 113, in statcast
  return _handle_request(start_dt_date, end_dt_date, 1, verbose=verbose,
File "/var/task/pybaseball/statcast.py", line 76, in _handle_request
  dataframe_list.append(future.result())
File "/var/lang/lib/python3.9/concurrent/futures/_base.py", line 439, in result
  return self.__get_result()
File "/var/lang/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
  raise self._exception
File "/var/lang/lib/python3.9/concurrent/futures/thread.py", line 58, in run
  result = self.fn(*self.args, **self.kwargs)
File "/var/task/pybaseball/cache/cache.py", line 58, in _cached
  result = func(*args, **kwargs)
File "/var/task/pybaseball/statcast.py", line 24, in _small_request
  data = statcast_ds.get_statcast_data_from_csv_url(
File "/var/task/pybaseball/cache/cache.py", line 58, in _cached
  result = func(*args, **kwargs)
File "/var/task/pybaseball/datasources/statcast.py", line 23, in get_statcast_data_from_csv_url
  return get_statcast_data_from_csv(
File "/var/task/pybaseball/datasources/statcast.py", line 35, in get_statcast_data_from_csv
  data = pd.read_csv(io.StringIO(csv_content))
File "/var/task/pandas/util/_decorators.py", line 211, in wrapper
  return func(*args, **kwargs)
File "/var/task/pandas/util/_decorators.py", line 331, in wrapper
  return func(*args, **kwargs)
File "/var/task/pandas/io/parsers/readers.py", line 950, in read_csv
  return _read(filepath_or_buffer, kwds)
File "/var/task/pandas/io/parsers/readers.py", line 611, in _read
  return parser.read(nrows)
File "/var/task/pandas/io/parsers/readers.py", line 1778, in read
  ) = self._engine.read(  # type: ignore[attr-defined]
File "/var/task/pandas/io/parsers/c_parser_wrapper.py", line 230, in read
  chunks = self._reader.read_low_memory(nrows)
File "pandas/_libs/parsers.pyx", line 808, in pandas._libs.parsers.TextReader.read_low_memory
  chunk = self._read_rows(self.buffer_lines, 0)
File "pandas/_libs/parsers.pyx", line 866, in pandas._libs.parsers.TextReader._read_rows
  self._tokenize_rows(irows - buffered_lines)
File "pandas/_libs/parsers.pyx", line 852, in pandas._libs.parsers.TextReader._tokenize_rows
  raise_parser_error('Error tokenizing data', self.parser)
File "pandas/_libs/parsers.pyx", line 1973, in pandas._libs.parsers.raise_parser_error
  raise ParserError(message)
pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 13, saw 2

EDIT: I think the issue is that the Statcast API returns an error response when the CSV is requested, and that response is then parsed as if it were CSV.
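
A rough illustration of why a non-CSV response could produce exactly this message: as far as I understand, pandas takes the expected field count from the first line and raises when a later line has more fields. The sketch below approximates that check with the standard-library csv module. It is not pandas' parser, and `first_overlong_line` and the sample error page are hypothetical, made up for illustration.

```python
import csv
import io
from typing import Optional, Tuple


def first_overlong_line(text: str) -> Optional[Tuple[int, int, int]]:
    """Return (line_number, expected_fields, seen_fields) for the first row
    with more fields than the first row, or None if no such row exists."""
    expected = None
    for line_number, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if expected is None:
            expected = len(row)  # field count taken from the first line
        elif len(row) > expected:
            return line_number, expected, len(row)
    return None


# Hypothetical payload: an HTML error page instead of CSV rows.
# Its first line has one "field"; line 13 contains a comma, so it has two.
error_page = "\n".join(
    ["<html>", "<head>", "<title>Error</title>", "</head>", "<body>"]
    + ["<p>filler</p>"] * 7
    + ["<p>Sorry, something went wrong</p>", "</body>", "</html>"]
)

print(first_overlong_line(error_page))  # (13, 1, 2)
print(first_overlong_line("pitch_type,game_date\nFF,2016-10-01\n"))  # None
```

If the response body looks like this rather than a header row with many comma-separated column names, the problem is upstream of pandas.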

ghost Apr 26 '23 03:04

I can't reproduce this issue. Could someone who is hitting it run "pip list" in their environment and post the output?

TravisGibbs May 13 '23 00:05

I'm encountering the same issue. Here is my "pip list" output:

Package Version

absl-py 1.4.0
alabaster 0.7.13
albumentations 1.2.1
altair 4.2.2
anyio 3.6.2
appdirs 1.4.4
argon2-cffi 21.3.0
argon2-cffi-bindings 21.2.0
array-record 0.2.0
arviz 0.15.1
astropy 5.2.2
astunparse 1.6.3
attrs 23.1.0
audioread 3.0.0
autograd 1.5
Babel 2.12.1
backcall 0.2.0
beautifulsoup4 4.11.2
bleach 6.0.0
blis 0.7.9
blosc2 2.0.0
bokeh 2.4.3
branca 0.6.0
build 0.10.0
CacheControl 0.12.11
cached-property 1.5.2
cachetools 5.3.0
catalogue 2.0.8
certifi 2022.12.7
cffi 1.15.1
chardet 4.0.0
charset-normalizer 2.0.12
chex 0.1.7
click 8.1.3
cloudpickle 2.2.1
cmake 3.25.2
cmdstanpy 1.1.0
colorcet 3.0.1
colorlover 0.3.0
community 1.0.0b1
confection 0.0.4
cons 0.4.5
contextlib2 0.6.0.post1
contourpy 1.0.7
convertdate 2.4.0
cryptography 40.0.2
cufflinks 0.17.3
cupy-cuda11x 11.0.0
cvxopt 1.3.0
cvxpy 1.3.1
cycler 0.11.0
cymem 2.0.7
Cython 0.29.34
dask 2022.12.1
datascience 0.17.6
db-dtypes 1.1.1
dbus-python 1.2.16
debugpy 1.6.6
decorator 4.4.2
defusedxml 0.7.1
Deprecated 1.2.14
distributed 2022.12.1
dlib 19.24.1
dm-tree 0.1.8
docutils 0.16
dopamine-rl 4.0.6
duckdb 0.7.1
earthengine-api 0.1.350
easydict 1.10
ecos 2.0.12
editdistance 0.6.2
en-core-web-sm 3.5.0
entrypoints 0.4
ephem 4.1.4
et-xmlfile 1.1.0
etils 1.2.0
etuples 0.3.8
exceptiongroup 1.1.1
fastai 2.7.12
fastcore 1.5.29
fastdownload 0.0.7
fastjsonschema 2.16.3
fastprogress 1.0.3
fastrlock 0.8.1
filelock 3.12.0
firebase-admin 5.3.0
Flask 2.2.4
flatbuffers 23.3.3
flax 0.6.9
folium 0.14.0
fonttools 4.39.3
frozendict 2.3.7
fsspec 2023.4.0
future 0.18.3
gast 0.4.0
GDAL 3.3.2
gdown 4.6.6
gensim 4.3.1
geographiclib 2.0
geopy 2.3.0
gin-config 0.5.0
glob2 0.7
google 2.0.3
google-api-core 2.11.0
google-api-python-client 2.84.0
google-auth 2.17.3
google-auth-httplib2 0.1.0
google-auth-oauthlib 1.0.0
google-cloud-bigquery 3.9.0
google-cloud-bigquery-storage 2.19.1
google-cloud-core 2.3.2
google-cloud-datastore 2.15.1
google-cloud-firestore 2.11.0
google-cloud-language 2.9.1
google-cloud-storage 2.8.0
google-cloud-translate 3.11.1
google-colab 1.0.0
google-crc32c 1.5.0
google-pasta 0.2.0
google-resumable-media 2.5.0
googleapis-common-protos 1.59.0
googledrivedownloader 0.4
graphviz 0.20.1
greenlet 2.0.2
grpcio 1.54.0
grpcio-status 1.48.2
gspread 3.4.2
gspread-dataframe 3.0.8
gym 0.25.2
gym-notices 0.0.8
h5netcdf 1.1.0
h5py 3.8.0
holidays 0.25
holoviews 1.15.4
html5lib 1.1
httpimport 1.3.0
httplib2 0.21.0
huggingface-hub 0.15.1
humanize 4.6.0
hyperopt 0.2.7
idna 3.4
imageio 2.25.1
imageio-ffmpeg 0.4.8
imagesize 1.4.1
imbalanced-learn 0.10.1
imgaug 0.4.0
importlib-resources 5.12.0
imutils 0.5.4
inflect 6.0.4
iniconfig 2.0.0
intel-openmp 2023.1.0
ipykernel 5.5.6
ipython 7.34.0
ipython-genutils 0.2.0
ipython-sql 0.4.1
ipywidgets 7.7.1
itsdangerous 2.1.2
jax 0.4.10
jaxlib 0.4.10+cuda11.cudnn86
jieba 0.42.1
Jinja2 3.1.2
joblib 1.2.0
jsonpickle 3.0.1
jsonschema 4.3.3
jupyter-client 6.1.12
jupyter-console 6.1.0
jupyter_core 5.3.0
jupyter-server 1.24.0
jupyterlab-pygments 0.2.2
jupyterlab-widgets 3.0.7
kaggle 1.5.13
keras 2.12.0
kiwisolver 1.4.4
korean-lunar-calendar 0.3.1
langcodes 3.3.0
lazy_loader 0.2
libclang 16.0.0
librosa 0.10.0.post2
lightgbm 3.3.5
lit 16.0.5
llvmlite 0.39.1
locket 1.0.0
logical-unification 0.4.5
LunarCalendar 0.0.9
lxml 4.9.2
Markdown 3.4.3
markdown-it-py 2.2.0
MarkupSafe 2.1.2
matplotlib 3.7.1
matplotlib-inline 0.1.6
matplotlib-venn 0.11.9
mdurl 0.1.2
miniKanren 1.0.3
missingno 0.5.2
mistune 0.8.4
mizani 0.8.1
mkl 2019.0
ml-dtypes 0.1.0
mlxtend 0.14.0
more-itertools 9.1.0
moviepy 1.0.3
mpmath 1.3.0
msgpack 1.0.5
multipledispatch 0.6.0
multitasking 0.0.11
murmurhash 1.0.9
music21 8.1.0
natsort 8.3.1
nbclient 0.7.4
nbconvert 6.5.4
nbformat 5.8.0
nest-asyncio 1.5.6
networkx 3.1
nibabel 3.0.2
nltk 3.8.1
notebook 6.4.8
numba 0.56.4
numexpr 2.8.4
numpy 1.22.4
oauth2client 4.1.3
oauthlib 3.2.2
opencv-contrib-python 4.7.0.72
opencv-python 4.7.0.72
opencv-python-headless 4.7.0.72
openpyxl 3.0.10
opt-einsum 3.3.0
optax 0.1.5
orbax-checkpoint 0.2.1
osqp 0.6.2.post8
packaging 23.1
palettable 3.3.3
pandas 1.5.3
pandas-datareader 0.10.0
pandas-gbq 0.17.9
pandocfilters 1.5.0
panel 0.14.4
param 1.13.0
parso 0.8.3
partd 1.4.0
pathlib 1.0.1
pathy 0.10.1
patsy 0.5.3
pexpect 4.8.0
pickleshare 0.7.5
Pillow 8.4.0
pip 23.1.2
pip-tools 6.13.0
platformdirs 3.3.0
plotly 5.13.1
plotnine 0.10.1
pluggy 1.0.0
polars 0.17.3
pooch 1.6.0
portpicker 1.3.9
prefetch-generator 1.0.3
preshed 3.0.8
prettytable 0.7.2
proglog 0.1.10
progressbar2 4.2.0
prometheus-client 0.16.0
promise 2.3
prompt-toolkit 3.0.38
prophet 1.1.3
proto-plus 1.22.2
protobuf 3.20.3
psutil 5.9.5
psycopg2 2.9.6
ptyprocess 0.7.0
py-cpuinfo 9.0.0
py4j 0.10.9.7
pyarrow 9.0.0
pyasn1 0.5.0
pyasn1-modules 0.3.0
pybaseball 2.2.5
pycocotools 2.0.6
pycparser 2.21
pyct 0.5.0
pydantic 1.10.7
pydata-google-auth 1.7.0
pydot 1.4.2
pydot-ng 2.0.0
pydotplus 2.0.2
PyDrive 1.3.1
pyerfa 2.0.0.3
pygame 2.3.0
PyGithub 1.58.2
Pygments 2.14.0
PyGObject 3.36.0
PyJWT 2.7.0
pymc 5.1.2
PyMeeus 0.5.12
pymystem3 0.2.0
PyNaCl 1.5.0
PyOpenGL 3.1.6
pyparsing 3.0.9
pyproject_hooks 1.0.0
pyrsistent 0.19.3
PySocks 1.7.1
pytensor 2.10.1
pytest 7.2.2
python-apt 0.0.0
python-dateutil 2.8.2
python-louvain 0.16
python-slugify 8.0.1
python-utils 3.5.2
pytz 2022.7.1
pytz-deprecation-shim 0.1.0.post0
pyviz-comms 2.2.1
PyWavelets 1.4.1
PyYAML 6.0
pyzmq 23.2.1
qdldl 0.1.7
qudida 0.0.4
regex 2022.10.31
requests 2.27.1
requests-oauthlib 1.3.1
requests-unixsocket 0.2.0
requirements-parser 0.5.0
rich 13.3.4
rpy2 3.5.5
rsa 4.9
safetensors 0.3.1
scikit-image 0.19.3
scikit-learn 1.2.2
scipy 1.10.1
scs 3.2.3
seaborn 0.12.2
Send2Trash 1.8.0
setuptools 67.7.2
shapely 2.0.1
six 1.16.0
sklearn-pandas 2.2.0
smart-open 6.3.0
sniffio 1.3.0
snowballstemmer 2.2.0
sortedcontainers 2.4.0
soundfile 0.12.1
soupsieve 2.4.1
soxr 0.3.5
spacy 3.5.2
spacy-legacy 3.0.12
spacy-loggers 1.0.4
Sphinx 3.5.4
sphinxcontrib-applehelp 1.0.4
sphinxcontrib-devhelp 1.0.2
sphinxcontrib-htmlhelp 2.0.1
sphinxcontrib-jsmath 1.0.1
sphinxcontrib-qthelp 1.0.3
sphinxcontrib-serializinghtml 1.1.5
SQLAlchemy 2.0.10
sqlparse 0.4.4
srsly 2.4.6
statsmodels 0.13.5
sympy 1.11.1
tables 3.8.0
tabulate 0.8.10
tblib 1.7.0
tenacity 8.2.2
tensorboard 2.12.2
tensorboard-data-server 0.7.0
tensorboard-plugin-wit 1.8.1
tensorflow 2.12.0
tensorflow-datasets 4.9.2
tensorflow-estimator 2.12.0
tensorflow-gcs-config 2.12.0
tensorflow-hub 0.13.0
tensorflow-io-gcs-filesystem 0.32.0
tensorflow-metadata 1.13.1
tensorflow-probability 0.20.1
tensorstore 0.1.36
termcolor 2.3.0
terminado 0.17.1
text-unidecode 1.3
textblob 0.17.1
tf-slim 1.1.0
thinc 8.1.9
threadpoolctl 3.1.0
tifffile 2023.4.12
tinycss2 1.2.1
tokenizers 0.13.3
toml 0.10.2
tomli 2.0.1
toolz 0.12.0
torch 2.0.1+cu118
torchaudio 2.0.2+cu118
torchdata 0.6.1
torchsummary 1.5.1
torchtext 0.15.2
torchvision 0.15.2+cu118
tornado 6.3.1
tqdm 4.65.0
traitlets 5.7.1
transformers 4.30.2
triton 2.0.0
tweepy 4.13.0
typer 0.7.0
types-setuptools 67.8.0.0
typing_extensions 4.5.0
tzdata 2023.3
tzlocal 4.3
uritemplate 4.1.1
urllib3 1.26.15
vega-datasets 0.9.0
wasabi 1.1.1
wcwidth 0.2.6
webcolors 1.13
webencodings 0.5.1
websocket-client 1.5.1
Werkzeug 2.3.0
wheel 0.40.0
widgetsnbextension 3.6.4
wordcloud 1.8.2.2
wrapt 1.14.1
xarray 2022.12.0
xarray-einstats 0.5.1
xgboost 1.7.5
xlrd 2.0.1
yellowbrick 1.5
yfinance 0.2.18
zict 3.0.0
zipp 3.15.0

Mark-DelGrande Jun 15 '23 02:06