dask-ml
Specify data type in CountVectorizer
What happened:
I have the data in the following format:
0 Satellite TV|Golf Course|Airport Shuttle|Cosme...
1 Satellite TV|Cosmetic Mirror|Safe (Hotel)|Tele...
2 Satellite TV|Cosmetic Mirror|Safe (Hotel)|Tele...
3 Satellite TV|Sailing|Cosmetic Mirror|Telephone...
4 Satellite TV|Sailing|Diving|Cosmetic Mirror|Sa...
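from dask_ml.feature_extraction.text import CountVectorizer
# item_data is presumably a Dask DataFrame with a 'properties' column of pipe-separated strings
vect = CountVectorizer(tokenizer=lambda x: x.split("|"))
tf_df = vect.fit_transform(item_data['properties'])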
What you expected to happen:
I want to use CountVectorizer to get the dataframe with the corresponding columns.
It seems I need to specify the output type somehow, but I don't see an interface on CountVectorizer for specifying metadata. The call fails with:
ValueError: Metadata inference failed in `_count_vectorizer_transform`.
You have supplied a custom function and Dask is unable to
determine the type of output that that function returns.
To resolve this please provide a meta= keyword.
The docstring of the Dask function you ran should have more information.
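To make the expected result concrete, here is a minimal plain-Python sketch of the count table I am after. It is not the dask-ml implementation and does not address the `meta=` error; it only illustrates the target output. The sample rows, `tokenize`, and `count_matrix` are hypothetical names and shortened data made up for illustration. Note that the real CountVectorizer lowercases text by default, which this sketch does not do.

```python
from collections import Counter

# Hypothetical sample rows shaped like the 'properties' column above
# (shortened; the real rows are truncated in the issue).
rows = [
    "Satellite TV|Golf Course|Airport Shuttle",
    "Satellite TV|Cosmetic Mirror|Safe (Hotel)",
    "Satellite TV|Sailing|Diving|Cosmetic Mirror",
]


def tokenize(text):
    """Split one row on '|', like the lambda passed as tokenizer."""
    return text.split("|")


def count_matrix(documents):
    """Return (sorted vocabulary, one list of token counts per document)."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted({token for c in counts for token in c})
    # Counter returns 0 for tokens that a document does not contain.
    matrix = [[c[token] for token in vocabulary] for c in counts]
    return vocabulary, matrix


columns, matrix = count_matrix(rows)
print(columns)
for row in matrix:
    print(row)
```

Each entry in `columns` is one property name and each row of `matrix` holds the counts for one item, which is the layout I would like to get back as a dataframe.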
Minimal Complete Verifiable Example:
# Put your MCVE code here
Anything else we need to know?:
Environment:
- Dask version:
dask 2021.1.1 pyhd3eb1b0_0
dask-core 2021.1.1 pyhd3eb1b0_0
dask-glm 0.2.0 py38_0
dask-ml 1.8.0 pyhd3eb1b0_0
- Python version: Python 3.8.13
- Operating System: Ubuntu 20.04
- Install method (conda, pip, source): conda
Cluster Dump State:
Sir/Madam, may I work on this issue? Could you assign it to me?