pylint
False positives on ``pandas.io.parsers.TextFileReader``
Steps to reproduce
# pylint: disable=missing-module-docstring
# pylint: enable=unsubscriptable-object,unsupported-assignment-operation,no-member
import pandas as pd
data_frame = pd.read_csv("foo.csv")
print(data_frame.shape)
for column in data_frame.columns:
    data_frame[column] = data_frame[column].astype("S")
Current behavior
repro.py:7:14: E1101: Instance of 'TextFileReader' has no 'columns' member (no-member)
repro.py:8:4: E1137: 'data_frame' does not support item assignment (unsupported-assignment-operation)
repro.py:8:25: E1136: Value 'data_frame' is unsubscriptable (unsubscriptable-object)
Strangely, the no-member error goes away if you leave out the print statement.
Expected behavior
No errors, which was the case with pylint 2.7.x.
pylint --version output
Result of pylint --version output:
pylint 2.8.3
astroid 2.5.8
Python 3.7.10 (default, Jun 4 2021, 14:48:32)
Additional dependencies:
pandas==1.2.4
Actually looking more closely at the pandas code, I think the root cause may be that it's inferring the return type as TextFileReader where actually in this case it should be a DataFrame or Series, since we're not setting iterator or chunksize.
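To make the suspected root cause concrete, here is a minimal toy sketch. It is not pandas code: `read`, `ToyReader` and `ToyFrame` are hypothetical stand-ins, and the structure (always build a reader, return it directly only when iterator or chunksize is set) is an assumed simplification. The point is only that a static inferer sees two possible return types, even though a plain call always takes the second branch.

```python
class ToyFrame:
    """Hypothetical stand-in for a DataFrame: supports subscripting."""

    def __getitem__(self, key):
        return key


class ToyReader:
    """Hypothetical stand-in for TextFileReader: no subscripting."""

    def read(self):
        return ToyFrame()


def read(path, iterator=False, chunksize=None):
    """Toy function whose return type depends on its arguments."""
    reader = ToyReader()
    if iterator or chunksize is not None:
        # Only reached when the caller asks for chunked or iterative reading.
        return reader
    # The default call path always ends up here.
    return reader.read()


default_result = read("foo.csv")
chunked_result = read("foo.csv", chunksize=10)
print(type(default_result).__name__, type(chunked_result).__name__)
```

If the inferer only ever resolves the first `return`, every later use of the result is checked against the reader type, which would match the errors reported above.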
Thank you for creating the issue; I can reproduce this. Regarding the error that goes away when the print statement is removed: could it be that there are two no-member errors, one on the print line ('TextFileReader' has no 'shape' member (no-member)) and the other on the for loop (Instance of 'TextFileReader' has no 'columns' member (no-member))?
I never see an error for shape. Possibly TextFileReader does have a shape? I haven't looked too deeply into the pandas source code. But that wouldn't explain why columns only complains after that print.
I was thinking maybe you did not notice the warning on the print line. I get 4 warnings with your example, one of them on the print line for "shape". Do you mean you get 3 warnings with the example you gave, and the "columns" warning disappears if you remove the print?
Yes.
OK, I cannot reproduce that with pandas 1.2.4 but the main problem is the false positive for no-member anyway.
In our codebase the main problem is actually the false positive on unsubscriptable-object. The no-member error can be easily worked around by setting generated-members. However as I said I believe the root cause is the same for both - the inferred type TextFileReader is not correct.
We see the same thing. Strangely enough the following snippet
#pylint: disable=missing-module-docstring,pointless-statement
import pandas as pd
df = pd.read_csv("some.csv")
df.columns
df.columns
gives Instance of 'TextFileReader' has no 'columns' member.
However if either
- the last line is commented out
- or alternatively these two lines in pandas during the df initialization are commented out (which are not executed anyway due to squeeze=False by default)
it goes away. :thinking:
Investigated a bit further. The false positive appears to have been introduced between astroid==2.5.7 and astroid==2.5.8, in https://github.com/PyCQA/astroid/pull/1009. For the specific snippet above, increasing max_inferred to >=166 removes the false positive.
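Why a higher max_inferred would help can be pictured with a toy model. This is a sketch under assumptions, not astroid's implementation: `infer_with_budget`, the sentinel string and the per-candidate costs are all invented for illustration. It assumes inference walks the candidate return values in order, counts work as it goes, and gives up with an "Uninferable" marker once the budget is exceeded.

```python
UNINFERABLE = "Uninferable"  # hypothetical sentinel, a stand-in for astroid's marker


def infer_with_budget(candidates, budget):
    """Yield candidate type names until the (made-up) work budget is exceeded."""
    spent = 0
    for name, cost in candidates:
        spent += cost
        if spent > budget:
            # Out of budget: report the rest as unknown and stop.
            yield UNINFERABLE
            return
        yield name


# Invented costs: the reader is cheap to resolve, the frame is expensive.
CANDIDATES = [("TextFileReader", 40), ("DataFrame", 90)]

print(list(infer_with_budget(CANDIDATES, budget=100)))
print(list(infer_with_budget(CANDIDATES, budget=500)))
```

Under this model a small budget leaves only the reader as a concrete candidate, while a larger one lets the frame type appear as well.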
I am seeing a similar issue with the code:
import io
import pandas
df = pandas.read_csv(io.StringIO("well\nx"))
df.loc[:, "well"] = df.well.str.replace("x", "y")
Instance of 'TextFileReader' has no 'well' member
I believe it is the same root cause, as the error disappears after any one of numerous manipulations. Maybe it is helpful as a test case once this bug is fixed.
I too have this issue. Interestingly commenting out the df.dropna(...) line removes the problem.
Edit: I have checked, and this is also due to pylint inferring the type as TextFileReader.
# test.py
import pandas as pd
df = pd.read_csv("input_filename")
df.dropna(subset=["title"], inplace=True)
have_NaN = df[["severity", "priority", "notice"]].isna().any(axis=1)
% # with dropna
% pylint test.py --disable=missing-module-docstring
************* Module test
test.py:5:11: E1136: Value 'df' is unsubscriptable (unsubscriptable-object)
--------------------------------------------------------------------
Your code has been rated at -2.50/10 (previous run: -2.50/10, +0.00)
% # after remove dropna
% pylint test.py --disable=missing-module-docstring
---------------------------------------------------------------------
Your code has been rated at 10.00/10 (previous run: -2.50/10, +12.50)
% pylint --version
pylint 2.11.1
astroid 2.8.0
Python 3.7.5 (default, Aug 13 2020, 09:55:33)
[Clang 11.0.3 (clang-1103.0.32.62)]
% python -c 'import pandas; print(pandas.__version__)'
1.3.3
Huh, I wasn't imagining it :joy: this really was a change in behavior.
When you remove inplace=True and instead assign the changed data frame from dropna to df again everything is fine.
I am using
pylint 2.9.6
astroid 2.6.6
pandas 1.3.2
EDIT: I just noticed that the error message in https://github.com/PyCQA/pylint/issues/4577#issuecomment-930846829 above is also referencing the variable name, so some of what's below is redundant.
"inplace=True" seems problematic in multiple cases. The following produces a similar issue, but interestingly enough it does NOT reference "TextFileReader" as the unsupported type but rather the variable name itself. Possibly a related but independent issue?
import pandas as pd
data_frame: pd.DataFrame = pd.read_csv("foo.csv")
data_frame.fillna("", inplace=True)
data_frame["bar"] = data_frame[["baz", "bat"]].apply(
    lambda row: f'{str(row["baz"])}-{str(row["bat"])}',
    axis=1
)
Linting the above results in the following (on my system at least ;)):
5,0,error,unsupported-assignment-operation:'data_frame' does not support item assignment
5,27,error,unsubscriptable-object:Value 'data_frame' is unsubscriptable
Replacing the "inplace" operation with data_frame = data_frame.fillna("") eliminates the error.
Version info:
pandas 1.3.4
astroid 2.9.0
pylint 2.12.2
Based on the findings in https://github.com/PyCQA/pylint/issues/4577#issuecomment-871694490 we use the workaround below which might be useful for others here facing this issue.
For now, we raise astroid.context.InferenceContext.max_inferred above the hard-coded value of 100, using e.g.
[MASTER]
# As a temporary workaround for https://github.com/PyCQA/pylint/issues/4577
init-hook = "import astroid; astroid.context.InferenceContext.max_inferred = 500"
in .pylintrc, or alternatively as a direct command line argument
pylint some_file_to_lint.py --init-hook "import astroid; astroid.context.InferenceContext.max_inferred = 500"
Thank you for the example code @anders-kiaer
With max_inferred = 100
The return types are:
[<Instance of pandas.io.parsers.readers.TextFileReader>, Uninferable]
Only one positive return type
With max_inferred = 500
The return types are:
[<Instance of pandas.io.parsers.readers.TextFileReader>, Uninferable, <Instance of pandas.core.frame.DataFrame>]
Two positive return types
What do you think, @Pierre-Sassoulas @PCManticore: should unsubscriptable-object be raised if a single type is returned together with an Uninferable return type? I can understand that pandas is sacrificing consistent return types for usability with this function. However, it looks like unsubscriptable-object isn't raised if there are multiple positive return types, as with max_inferred=500.
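The behavior described here can be summarized as a small decision rule. The sketch below is a toy model of what was observed in this thread, not pylint's actual checker code: `should_flag_unsubscriptable`, the sentinel and the two toy classes are hypothetical names. It assumes the check fires only when exactly one concrete type remains after ignoring Uninferable, and that type has no `__getitem__`.

```python
UNINFERABLE = object()  # hypothetical sentinel standing in for Uninferable


class ToyReader:
    """Hypothetical reader type without subscript support."""


class ToyFrame:
    """Hypothetical frame type with subscript support."""

    def __getitem__(self, key):
        return key


def should_flag_unsubscriptable(candidates):
    """Flag only when a single concrete, non-subscriptable type was inferred."""
    concrete = {c for c in candidates if c is not UNINFERABLE}
    if len(concrete) != 1:
        # Zero or several concrete types: too ambiguous to report.
        return False
    (only,) = concrete
    return not hasattr(only, "__getitem__")


# One concrete type plus Uninferable: the check fires.
print(should_flag_unsubscriptable([ToyReader, UNINFERABLE]))
# Two concrete types: the check stays silent.
print(should_flag_unsubscriptable([ToyReader, UNINFERABLE, ToyFrame]))
```

The open question above is whether the first case should count as ambiguous too, since the Uninferable entry may be hiding exactly the type that would make the subscript valid.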
I think an astroid brain could be created for read_csv if we don't want to change unsubscriptable-object.
Raising max_inferred really slows down this code, from 1s to 10s on my computer, so I don't think changing it is a good idea, @anders-kiaer.
Code for printing out the types:
import astroid
# Raise the inference limit (hard-coded to 100 by default) before inferring.
MAX_INFERRED = 500
astroid.context.InferenceContext.max_inferred = MAX_INFERRED
ret = astroid.extract_node("""
import pandas as pd
df = pd.read_csv("some.csv")
df #@
df.columns
df.columns
""")
inferred = ret.inferred()
print(inferred)
The same problem appears with fillna:
import pandas
dataframe = pandas.DataFrame({'A': [1,2,None]})
dataframe = dataframe.fillna(0)
print(dataframe['A'])
pylint will complain about E1136: Value 'dataframe' is unsubscriptable (unsubscriptable-object)