get_all_tickers icon indicating copy to clipboard operation
get_all_tickers copied to clipboard

ParserError: Error tokenizing data. C error: Expected 1 fields in line 23, saw 46

Open MislavSag opened this issue 3 years ago • 22 comments

When I try:

from get_all_tickers import get_tickers as gt

tickers = gt.get_tickers()

I get an error:


tickers = gt.get_tickers(NASDAQ=False)
---------------------------------------------------------------------------
ParserError                               Traceback (most recent call last)
c:\Users\Mislav\Documents\GitHub\stocksee\stocksee\ib_market_data.py in 
----> 36 tickers = gt.get_tickers(NASDAQ=False)

C:\ProgramData\Anaconda3\lib\site-packages\get_all_tickers\get_tickers.py in get_tickers(NYSE, NASDAQ, AMEX)
     71     tickers_list = []
     72     if NYSE:
---> 73         tickers_list.extend(__exchange2list('nyse'))
     74     if NASDAQ:
     75         tickers_list.extend(__exchange2list('nasdaq'))

C:\ProgramData\Anaconda3\lib\site-packages\get_all_tickers\get_tickers.py in __exchange2list(exchange)
    136 
    137 def __exchange2list(exchange):
--> 138     df = __exchange2df(exchange)
    139     # removes weird tickers
    140     df_filtered = df[~df['Symbol'].str.contains("\.|\^")]

C:\ProgramData\Anaconda3\lib\site-packages\get_all_tickers\get_tickers.py in __exchange2df(exchange)
    132     response = requests.get('https://old.nasdaq.com/screening/companies-by-name.aspx', headers=headers, params=params(exchange))
    133     data = io.StringIO(response.text)
--> 134     df = pd.read_csv(data, sep=",")
    135     return df
    136 

~\AppData\Roaming\Python\Python38\site-packages\pandas\io\parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision)
    674         )
    675 
--> 676         return _read(filepath_or_buffer, kwds)
    677 
    678     parser_f.__name__ = name

~\AppData\Roaming\Python\Python38\site-packages\pandas\io\parsers.py in _read(filepath_or_buffer, kwds)
    452 
    453     try:
--> 454         data = parser.read(nrows)
    455     finally:
    456         parser.close()

~\AppData\Roaming\Python\Python38\site-packages\pandas\io\parsers.py in read(self, nrows)
   1131     def read(self, nrows=None):
   1132         nrows = _validate_integer("nrows", nrows)
-> 1133         ret = self._engine.read(nrows)
   1134 
   1135         # May alter columns / col_dict

~\AppData\Roaming\Python\Python38\site-packages\pandas\io\parsers.py in read(self, nrows)
   2035     def read(self, nrows=None):
   2036         try:
-> 2037             data = self._reader.read(nrows)
   2038         except StopIteration:
   2039             if self._first_chunk:

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader.read()

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory()

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._read_rows()

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._tokenize_rows()

pandas\_libs\parsers.pyx in pandas._libs.parsers.raise_parser_error()

ParserError: Error tokenizing data. C error: Expected 1 fields in line 23, saw 46

MislavSag avatar Jan 25 '21 08:01 MislavSag

I am getting the same error. Lets hope it gets fixed. @shilewenuw

sarino5 avatar Jan 25 '21 17:01 sarino5

I am seeing this same issue. I really hope it gets fixed soon!

divvision33 avatar Jan 25 '21 17:01 divvision33

Nasdaq API got updated, so the old URL is no longer available I believe.

The following is a quick implementation of the new API.

import requests
import pandas as pd

headers = {
    'authority': 'api.nasdaq.com',
    'accept': 'application/json, text/plain, */*',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36',
    'origin': 'https://www.nasdaq.com',
    'sec-fetch-site': 'same-site',
    'sec-fetch-mode': 'cors',
    'sec-fetch-dest': 'empty',
    'referer': 'https://www.nasdaq.com/',
    'accept-language': 'en-US,en;q=0.9',
}

params = (
    ('tableonly', 'true'),
    ('limit', '25'),
    ('offset', '0'),
    ('download', 'true'),
)

r = requests.get('https://api.nasdaq.com/api/screener/stocks', headers=headers, params=params)
data = r.json()['data']
df = pd.DataFrame(data['rows'], columns=data['headers'])

Possums avatar Jan 25 '21 18:01 Possums

Hi, Anyone has a workaround for this please?

jokkemoose avatar Jan 26 '21 06:01 jokkemoose

I made a quick&dirty fix in get_tickers.py. Filtering by mktcap_min, mktcap_max and sectors works for me. I didn't test regions. Github doesn't allow me to upload a .py file, so you need to remove the '.txt' ending of this one and replace the corresponding file in the package. get_tickers.py.txt Thanks to @Possums for the basics!

ErlerPhilipp avatar Jan 26 '21 12:01 ErlerPhilipp

I have the same issue.

krikru avatar Jan 26 '21 19:01 krikru

@ErlerPhilipp Great! The standard way to suggest code changes is to create a pull request, so that might be why you can't upload a .py file (it's slightly more work, though).

krikru avatar Jan 26 '21 19:01 krikru

@krikru I know but I decided against it, for now, because it's mostly untested changes and really dirty ;) If I have time, I'll create a PR.

ErlerPhilipp avatar Jan 26 '21 19:01 ErlerPhilipp

Hi, any updates? I am still getting this error

patroucheva avatar Jan 26 '21 20:01 patroucheva

@ErlerPhilipp getting the following error with the new get_tickers.py:

File "/anaconda3/lib/python3.7/site-packages/get_all_tickers/get_tickers.py", line 92, in get_tickers tickers_list.extend(__exchange2list('nyse')) File "/anaconda3/lib/python3.7/site-packages/get_all_tickers/get_tickers.py", line 162, in __exchange2list df_filtered = df[~df['Symbol'].str.contains(".|^")] File "/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py", line 2688, in getitem return self._getitem_column(key) File "/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py", line 2695, in _getitem_column return self._get_item_cache(key) File "/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py", line 2489, in _get_item_cache values = self._data.get(item) File "/anaconda3/lib/python3.7/site-packages/pandas/core/internals.py", line 4115, in get loc = self.items.get_loc(item) File "/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc return self._engine.get_loc(self._maybe_cast_indexer(key)) File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item KeyError: 'Symbol'

patroucheva avatar Jan 26 '21 22:01 patroucheva

@patroucheva, You can fix the issue by changing 'Symbol' to 'symbol' on lines 162 and 163

Coombszy avatar Jan 26 '21 22:01 Coombszy

Fix get_tickers.py.txt and full output tickers.xlsx

DimaDDM avatar Jan 26 '21 23:01 DimaDDM

I made a quick&dirty fix in get_tickers.py. Filtering by mktcap_min, mktcap_max and sectors works for me. I didn't test regions. Github doesn't allow me to upload a .py file, so you need to remove the '.txt' ending of this one and replace the corresponding file in the package. get_tickers.py.txt Thanks to @Possums for the basics!

After making the change, what were the next steps for getting the package to update? I've tried making the swap and reinstalling with the new file, but when trying to use the package in script it still renders the same error.

monkey-bidness avatar Feb 05 '21 13:02 monkey-bidness

Nasdaq API got updated, so the old URL is no longer available I believe.

The following is a quick implementation of the new API.

import requests
import pandas as pd

headers = {
    'authority': 'api.nasdaq.com',
    'accept': 'application/json, text/plain, */*',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36',
    'origin': 'https://www.nasdaq.com',
    'sec-fetch-site': 'same-site',
    'sec-fetch-mode': 'cors',
    'sec-fetch-dest': 'empty',
    'referer': 'https://www.nasdaq.com/',
    'accept-language': 'en-US,en;q=0.9',
}

params = (
    ('tableonly', 'true'),
    ('limit', '25'),
    ('offset', '0'),
    ('download', 'true'),
)

r = requests.get('https://api.nasdaq.com/api/screener/stocks', headers=headers, params=params)`
data = r.json()['data']
df = pd.DataFrame(data['rows'], columns=data['headers'])

There is a small typo on the following line:

r = requests.get('https://api.nasdaq.com/api/screener/stocks', headers=headers, params=params)'

Simply remove the extra quotation mark at the end. This works @Possums

frank-ceballos avatar Feb 13 '21 14:02 frank-ceballos

Updated the code by DimaDDM to be able to get individual exchanges rather than return all symbols get_tickers_by_region also updated but untested

get_tickers.py.txt

s-kerin avatar Feb 14 '21 12:02 s-kerin

@DimaDDM I just copied and pasted that file into get_tickers.py

Could you open a PR for it?

tpt5cu avatar Mar 12 '21 04:03 tpt5cu

Has anyone managed to get_tickers_by_region working? I can only seem to get it to return US tickers instead of the region specified.

bobdabuilder1 avatar Apr 05 '21 09:04 bobdabuilder1

hey @here, as it doesn't look like the author cares much about this. I'd suggest that somebody that does, and yet doesn't want to just use the script provided above, create their own repo/fork this with the appropriate modifications.

There's no point in creating PRs if the author is MIA.

arainboldt avatar Apr 12 '21 08:04 arainboldt

I made a quick&dirty fix in get_tickers.py. Filtering by mktcap_min, mktcap_max and sectors works for me. I didn't test regions. Github doesn't allow me to upload a .py file, so you need to remove the '.txt' ending of this one and replace the corresponding file in the package. get_tickers.py.txt Thanks to @Possums for the basics!

Hey, the "Symbol" column name shall be in all lower case "symbol"

JaisinhBhosale9712 avatar May 01 '21 17:05 JaisinhBhosale9712

Copy and paste this code to get_tickers.py in your library by searching in your PC where it is. Cheers.

import pandas as pd
from enum import Enum
import io
import requests

_EXCHANGE_LIST = ['nyse', 'nasdaq', 'amex']

_SECTORS_LIST = set(['Consumer Non-Durables', 'Capital Goods', 'Health Care',
       'Energy', 'Technology', 'Basic Industries', 'Finance',
       'Consumer Services', 'Public Utilities', 'Miscellaneous',
       'Consumer Durables', 'Transportation'])


headers = {
    'authority': 'api.nasdaq.com',
    'accept': 'application/json, text/plain, */*',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36',
    'origin': 'https://www.nasdaq.com',
    'sec-fetch-site': 'same-site',
    'sec-fetch-mode': 'cors',
    'sec-fetch-dest': 'empty',
    'referer': 'https://www.nasdaq.com/',
    'accept-language': 'en-US,en;q=0.9',
}

def params(exchange):
    return (
        ('letter', '0'),
        ('exchange', exchange),
        ('render', 'download'),
    )

params = (
    ('tableonly', 'true'),
    ('limit', '25'),
    ('offset', '0'),
    ('download', 'true'),
)

def params_region(region):
    return (
        ('letter', '0'),
        ('region', region),
        ('render', 'download'),
    )


class Region(Enum):
    AFRICA = 'AFRICA'
    EUROPE = 'EUROPE'
    ASIA = 'ASIA'
    AUSTRALIA_SOUTH_PACIFIC = 'AUSTRALIA+AND+SOUTH+PACIFIC'
    CARIBBEAN = 'CARIBBEAN'
    SOUTH_AMERICA = 'SOUTH+AMERICA'
    MIDDLE_EAST = 'MIDDLE+EAST'
    NORTH_AMERICA = 'NORTH+AMERICA'

class SectorConstants:
    NON_DURABLE_GOODS = 'Consumer Non-Durables'
    CAPITAL_GOODS = 'Capital Goods'
    HEALTH_CARE = 'Health Care'
    ENERGY = 'Energy'
    TECH = 'Technology'
    BASICS = 'Basic Industries'
    FINANCE = 'Finance'
    SERVICES = 'Consumer Services'
    UTILITIES = 'Public Utilities'
    DURABLE_GOODS = 'Consumer Durables'
    TRANSPORT = 'Transportation'


def get_tickers(NYSE=True, NASDAQ=True, AMEX=True):
    tickers_list = []
    if NYSE:
        tickers_list.extend(__exchange2list('nyse'))
    if NASDAQ:
        tickers_list.extend(__exchange2list('nasdaq'))
    if AMEX:
        tickers_list.extend(__exchange2list('amex'))
    return tickers_list


def get_tickers_filtered(mktcap_min=None, mktcap_max=None, sectors=None):
    tickers_list = []
    for exchange in _EXCHANGE_LIST:
        tickers_list.extend(__exchange2list_filtered(exchange, mktcap_min=mktcap_min, mktcap_max=mktcap_max, sectors=sectors))
    return tickers_list


def get_biggest_n_tickers(top_n, sectors=None):
    df = pd.DataFrame()
    for exchange in _EXCHANGE_LIST:
        temp = __exchange2df(exchange)
        df = pd.concat([df, temp])
        
    df = df.dropna(subset={'marketCap'})
    df = df[~df['Symbol'].str.contains("\.|\^")]

    if sectors is not None:
        if isinstance(sectors, str):
            sectors = [sectors]
        if not _SECTORS_LIST.issuperset(set(sectors)):
            raise ValueError('Some sectors included are invalid')
        sector_filter = df['Sector'].apply(lambda x: x in sectors)
        df = df[sector_filter]

    def cust_filter(mkt_cap):
        if 'M' in mkt_cap:
            return float(mkt_cap[1:-1])
        elif 'B' in mkt_cap:
            return float(mkt_cap[1:-1]) * 1000
        else:
            return float(mkt_cap[1:]) / 1e6
    df['marketCap'] = df['marketCap'].apply(cust_filter)

    df = df.sort_values('marketCap', ascending=False)
    if top_n > len(df):
        raise ValueError('Not enough companies, please specify a smaller top_n')

    return df.iloc[:top_n]['Symbol'].tolist()


def get_tickers_by_region(region):
    if region in Region:
        response = requests.get('https://old.nasdaq.com/screening/companies-by-name.aspx', headers=headers,
                                params=params_region(region))
        data = io.StringIO(response.text)
        df = pd.read_csv(data, sep=",")
        return __exchange2list(df)
    else:
        raise ValueError('Please enter a valid region (use a Region.REGION as the argument, e.g. Region.AFRICA)')

def __exchange2df(exchange):
    r = requests.get('https://api.nasdaq.com/api/screener/stocks', headers=headers, params=params)
    data = r.json()['data']
    df = pd.DataFrame(data['rows'], columns=data['headers'])
    return df

def __exchange2list(exchange):
    df = __exchange2df(exchange)
    df_filtered = df[~df['symbol'].str.contains("\.|\^")]
    return df_filtered['symbol'].tolist()

def __exchange2list_filtered(exchange, mktcap_min=None, mktcap_max=None, sectors=None):
    df = __exchange2df(exchange)
    df = df.dropna(subset={'marketCap'})
    df = df[~df['symbol'].str.contains("\.|\^")]

    if sectors is not None:
        if isinstance(sectors, str):
            sectors = [sectors]
        if not _SECTORS_LIST.issuperset(set(sectors)):
            raise ValueError('Some sectors included are invalid')
        sector_filter = df['sector'].apply(lambda x: x in sectors)
        df = df[sector_filter]

    def cust_filter(mkt_cap):
        if 'M' in mkt_cap:
            return float(mkt_cap[1:-1])
        elif 'B' in mkt_cap:
            return float(mkt_cap[1:-1]) * 1000
        elif mkt_cap == '':
            return 0.0
        else:
            return float(mkt_cap[1:]) / 1e6
    df['marketCap'] = df['marketCap'].apply(cust_filter)
    if mktcap_min is not None:
        df = df[df['marketCap'] > mktcap_min]
    if mktcap_max is not None:
        df = df[df['marketCap'] < mktcap_max]
    return df['symbol'].tolist()

def save_tickers(NYSE=True, NASDAQ=True, AMEX=True, filename='tickers.csv'):
    tickers2save = get_tickers(NYSE, NASDAQ, AMEX)
    df = pd.DataFrame(tickers2save)
    df.to_csv(filename, header=False, index=False)

def save_tickers_by_region(region, filename='tickers_by_region.csv'):
    tickers2save = get_tickers_by_region(region)
    df = pd.DataFrame(tickers2save)
    df.to_csv(filename, header=False, index=False)


if __name__ == '__main__':

    tickers = get_tickers()
    print(tickers[:5])

    tickers = get_tickers(AMEX=False)

    # default filename is tickers.csv, to specify, add argument filename='yourfilename.csv'
    save_tickers()

    # save tickers from NYSE and AMEX only
    save_tickers(NASDAQ=False)

    # get tickers from Asia
    tickers_asia = get_tickers_by_region(Region.ASIA)
    print(tickers_asia[:5])

    # save tickers from Europe
    save_tickers_by_region(Region.EUROPE, filename='EU_tickers.csv')

    # get tickers filtered by market cap (in millions)
    filtered_tickers = get_tickers_filtered(mktcap_min=500, mktcap_max=2000)
    print(filtered_tickers[:5])

    # not setting max will get stocks with $2000 million market cap and up.
    filtered_tickers = get_tickers_filtered(mktcap_min=2000)
    print(filtered_tickers[:5])

    # get tickers filtered by sector
    filtered_by_sector = get_tickers_filtered(mktcap_min=200e3, sectors=SectorConstants.FINANCE)
    print(filtered_by_sector[:5])

    # get tickers of 5 largest companies by market cap (specify sectors=SECTOR)
    top_5 = get_biggest_n_tickers(5)
    print(top_5)

justege avatar May 30 '21 14:05 justege

Same issue gt.get_tickers_filtered(mktcap_max=1)

Exception has occurred: ParserError
Error tokenizing data. C error: Expected 1 fields in line 6, saw 47
  File "C:\dev\Python\stocks\get_symbols.py", line 9, in <module>
    list_of_tickers = gt.get_tickers_filtered(mktcap_max=1)

Karlheinzniebuhr avatar Sep 04 '22 05:09 Karlheinzniebuhr

@justege: looks like lines 97 and 120 should be lower case 'symbol' vs 'Symbol', as found by @JaisinhBhosale9712 in the comment just before yours.

paulmcq avatar Sep 22 '23 16:09 paulmcq