influxdb-python

InfluxDB-python version 5.3.0 chunk=True

Open xiandong79 opened this issue 4 years ago • 23 comments

  • InfluxDB-python version: 5.3.0
  • Python version: 3.7.4
  • Operating system version: macOS 10.14.5

msgpack.exceptions.ExtraData: unpack(b) received extra data

Traceback (most recent call last):
  File "/Users/dong/Desktop/mosaic-research/analysis/analysis.py", line 17, in <module>
    public_book = mosaic_client.public_book(exchange=exchange, instrument=instrument, ts_start=ts, ts_end=ts+save_interval, depth=1)
  File "/Users/dong/Desktop/mosaic-research/py_mosaic_client/py_mosaic_client/mosaic_client.py", line 74, in public_book
    result = self.client.query(f'SELECT * FROM "l2_book-{exchange}" WHERE time > {ts_start} AND time <= {ts_end}', chunked=True, chunk_size=10000)
  File "/Users/dong/opt/anaconda3/lib/python3.7/site-packages/influxdb/client.py", line 518, in query
    expected_response_code=expected_response_code
  File "/Users/dong/opt/anaconda3/lib/python3.7/site-packages/influxdb/client.py", line 352, in request
    raw=False)
  File "msgpack/_unpacker.pyx", line 209, in msgpack._cmsgpack.unpackb
msgpack.exceptions.ExtraData: unpack(b) received extra data.

xiandong79 avatar Apr 13 '20 14:04 xiandong79

https://github.com/influxdata/influxdb-python/commit/c903d73efcf49b4e340490072d777d8f34ac8e1c

I think it may be related to this PR

xiandong79 avatar Apr 13 '20 14:04 xiandong79

Thanks for reporting this @xiandong79, I'll investigate ASAP. I should have added a test to the dataframe_client for this.

sebito91 avatar Apr 14 '20 00:04 sebito91

I can take a look too if that helps, I haven't come across that issue though.

hrbonz avatar Apr 14 '20 02:04 hrbonz

Version 5.2.3 works well.

xiandong79 avatar Apr 14 '20 02:04 xiandong79

I'm having the same issue querying both Influx 1.7.10 and 1.7.7. Interestingly, with Influx 1.0.2 the bug is not present.

hiksuman avatar Apr 14 '20 13:04 hiksuman

There are a lot of differences between 5.2.3 and 5.3.0, which is why we bumped the minor version instead of cutting a point release.

@hrbonz if you want to take a look that would be AWESOME!

sebito91 avatar Apr 14 '20 16:04 sebito91

I am getting a different error, but seemingly from a similar place.

  • InfluxDB-python version: 5.3.0
  • Python version: 3.7.4
  • Operating system version: Ubuntu 16.04


influxdb/client.py in request(self, url, method, params, data, stream, expected_response_code, headers)
    350                 packed=response.content,
    351                 ext_hook=_msgpack_parse_hook,
--> 352                 raw=False)
    353         else:
    354             response._msgpack = None

msgpack/_unpacker.pyx in msgpack._cmsgpack.unpackb()
`UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 8: invalid continuation byte`

laurikoobas avatar Apr 16 '20 13:04 laurikoobas

I see something similar with a SHOW DIAGNOSTICS query.

  • InfluxDB 1.7.6
  • InfluxDB-python version: 5.3.0
  • Python version: 3.6.9
  • Operating system version: Ubuntu 18.04
python3
Python 3.6.9 (default, Apr 18 2020, 01:56:04) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from influxdb import client
>>> influxdb_client = client.InfluxDBClient("192.168.10.6", "8086")
>>> influxdb_client.ping()
'1.7.6'
>>> influxdb_client.query('SHOW DIAGNOSTICS')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/vagrant/.local/lib/python3.6/site-packages/influxdb/client.py", line 518, in query
    expected_response_code=expected_response_code
  File "/home/vagrant/.local/lib/python3.6/site-packages/influxdb/client.py", line 352, in request
    raw=False)
  File "msgpack/_unpacker.pyx", line 213, in msgpack._cmsgpack.unpackb
ValueError: Unpack failed: incomplete input
  • With InfluxDB-python version: 5.2.3 the bug is not present.

chaconpiza avatar May 06 '20 09:05 chaconpiza

I can confirm that queries with chunked=True do not work on 5.3.0.

yozik04 avatar Jun 10 '20 19:06 yozik04

I can confirm that queries with chunked=True do not work on 5.3.0.

xiandong79 avatar Jun 11 '20 01:06 xiandong79

Hello Team, Any workaround for this issue?

nikparmar avatar Jun 21 '20 09:06 nikparmar

Hello Team, Any workaround for this issue?

Sure, use a version <5.3.0.

yozik04 avatar Jun 21 '20 12:06 yozik04

having the same issue - any progress?

marko-asplund avatar Aug 10 '20 09:08 marko-asplund

Hi,

Having the same issue. Any solution?

  • Debian GNU/Linux 9.4 (stretch)
  • python 2.7.13
  • Influx 1.8.3
  • Influxdb 5.3.1
  • msgpack 1.0.2

msgpack.exceptions.ExtraData: unpack(b) received extra data.

Traceback (most recent call last):
  File "/code/apps/FuelChangeoverPlot.py", line 179, in exportFromDb
    data = data_fetcher.fetch_fuel_change_over_plot(start_time=rangeStart, end_time=rangeEnd)
  File "/code/db_interface/data_fetcher.py", line 35, in fetch_fuel_change_over_plot
    df_dict = db_connector.query_for_single_measurement_range(
  File "/code/db_interface/db_connector.py", line 80, in query_for_single_measurement_range
    df_dict = client.query(
  File "/usr/local/lib/python3.9/site-packages/influxdb/_dataframe_client.py", line 199, in query
    results = super(DataFrameClient, self).query(query, **query_args)
  File "/usr/local/lib/python3.9/site-packages/influxdb/client.py", line 521, in query
    response = self.request(
  File "/usr/local/lib/python3.9/site-packages/influxdb/client.py", line 358, in request
    response._msgpack = msgpack.unpackb(
  File "msgpack/_unpacker.pyx", line 202, in msgpack._cmsgpack.unpackb

AnkitSinghvi99 avatar Dec 30 '20 10:12 AnkitSinghvi99

There are actually two issues here:

  1. An unpack issue when using msgpack. I did not debug this further, but here's a workaround that works for me: use JSON instead of msgpack. This can be forced using: client = InfluxDBClient(host, port, u, p, db, headers={'Accept': 'application/json'}, gzip=True)

  2. Even with the above, DataFrameClient does not work, because DataFrameClient was not updated along with commit c903d73.
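
A minimal sketch of the JSON workaround from item 1, assuming the same constructor arguments quoted there. The helper names json_client_kwargs and make_json_client are hypothetical, and the influxdb import is deferred so the first helper can be inspected without the library installed:

```python
def json_client_kwargs(host, port, username, password, database):
    """Keyword arguments mirroring the workaround quoted above (hypothetical helper)."""
    return {
        "host": host,
        "port": port,
        "username": username,
        "password": password,
        "database": database,
        # Ask the server for JSON instead of msgpack.
        "headers": {"Accept": "application/json"},
        "gzip": True,
    }


def make_json_client(**conn):
    """Build an InfluxDBClient that requests JSON responses (hypothetical helper)."""
    # Imported lazily; requires influxdb-python to be installed.
    from influxdb import InfluxDBClient

    return InfluxDBClient(**json_client_kwargs(**conn))
```

As item 2 notes, this alone is not expected to make DataFrameClient work.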

srijan avatar Jan 28 '21 13:01 srijan

  • Debian GNU/Linux bullseye/sid
  • Python 3.9.2
  • influxdb-python master branch
  • Influxdb 1.8.5 and 1.7.3
  • msgpack 1.0.2

I run my test scripts with export MSGPACK_PUREPYTHON=1 to use the pure-Python implementation of msgpack rather than the C one, which is easier to debug.

Analysis

I looked into this issue today. It looks to me like a combination of two problems:

  • I forgot to add the stream/chunk changes to DataFrameClient because I didn't even realize it existed. I'll submit a PR with the proper changes for it.

  • If I run 'SHOW DIAGNOSTICS' with headers set to accept JSON, I get the following:

{
  "results": [
    {
      "statement_id": 0,
      "series": [
        {
          "name": "build",
          "columns": [
            "Branch",
            "Build Time",
            "Commit",
            "Version"
          ],
          "values": [
            [
              "1.7",
              "",
              "ff383cdc0420217e3460dabe17db54f8557d95b6",
              "1.7.8"
            ]
          ]
        },
        {
          "name": "config",
          "columns": [
            "bind-address",
            "reporting-disabled"
          ],
          "values": [
            [
              "127.0.0.1:8098",
              true
            ]
          ]
        },
        {
          "name": "config-coordinator",
          "columns": [
            "log-queries-after",
            "max-concurrent-queries",
            "max-select-buckets",
            "max-select-point",
            "max-select-series",
            "query-timeout",
            "write-timeout"
          ],
          "values": [
            [
              "0s",
              0,
              0,
              0,
              0,
              "0s",
              "10s"
            ]
          ]
        },
        {
          "name": "config-cqs",
          "columns": [
            "enabled",
            "query-stats-enabled",
            "run-interval"
          ],
          "values": [
            [
              true,
              false,
              "1s"
            ]
          ]
        },
        {
          "name": "config-data",
          "columns": [
            "cache-max-memory-size",
            "cache-snapshot-memory-size",
            "cache-snapshot-write-cold-duration",
            "compact-full-write-cold-duration",
            "dir",
            "max-concurrent-compactions",
            "max-index-log-file-size",
            "max-series-per-database",
            "max-values-per-tag",
            "series-id-set-cache-size",
            "wal-dir",
            "wal-fsync-delay"
          ],
          "values": [
            [
              1073741824,
              26214400,
              "10m0s",
              "4h0m0s",
              "/var/lib/influxdb/data",
              0,
              1048576,
              1000000,
              100000,
              100,
              "/var/lib/influxdb/wal",
              "0s"
            ]
          ]
        },
        {
          "name": "config-httpd",
          "columns": [
            "access-log-path",
            "bind-address",
            "enabled",
            "https-enabled",
            "max-connection-limit",
            "max-row-limit"
          ],
          "values": [
            [
              "",
              ":8096",
              true,
              false,
              0,
              0
            ]
          ]
        },
        {
          "name": "config-meta",
          "columns": [
            "dir"
          ],
          "values": [
            [
              "/var/lib/influxdb/meta"
            ]
          ]
        },
        {
          "name": "config-monitor",
          "columns": [
            "store-database",
            "store-enabled",
            "store-interval"
          ],
          "values": [
            [
              "_internal",
              true,
              "10s"
            ]
          ]
        },
        {
          "name": "config-precreator",
          "columns": [
            "advance-period",
            "check-interval",
            "enabled"
          ],
          "values": [
            [
              "30m0s",
              "10m0s",
              true
            ]
          ]
        },
        {
          "name": "config-retention",
          "columns": [
            "check-interval",
            "enabled"
          ],
          "values": [
            [
              "30m0s",
              true
            ]
          ]
        },
        {
          "name": "config-subscriber",
          "columns": [
            "enabled",
            "http-timeout",
            "write-buffer-size",
            "write-concurrency"
          ],
          "values": [
            [
              true,
              "30s",
              1000,
              40
            ]
          ]
        },
        {
          "name": "network",
          "columns": [
            "hostname"
          ],
          "values": [
            [
              "db01"
            ]
          ]
        },
        {
          "name": "runtime",
          "columns": [
            "GOARCH",
            "GOMAXPROCS",
            "GOOS",
            "version"
          ],
          "values": [
            [
              "amd64",
              2,
              "linux",
              "go1.11"
            ]
          ]
        },
        {
          "name": "system",
          "columns": [
            "PID",
            "currentTime",
            "started",
            "uptime"
          ],
          "values": [
            [
              10884,
              "2021-04-26T09:59:01.187859258Z",
              "2021-04-26T08:10:39.214602676Z",
              "1h48m21.973256582s"
            ]
          ]
        }
      ]
    }
  ]
}

When running without any headers, we get msgpack back with the following:

b'\x81\xa7results\x91\x82\xacstatement_id\x00\xa6series\x9e\x83\xa4name\xa5build\xa7columns\x94\xa6Branch\xaaBuild Time\xa6Commit\xa7Version\xa6values\x91\x94\xa31.7\xa0\xd9(ff383cdc0420217e3460dabe17db54f8557d95b6\xa51.7.8\x83\xa4name\xa6config\xa7columns\x92\xacbind-address\xb2reporting-disabled\xa6values\x91\x92\xae127.0.0.1:8098\xc3\x83\xa4name\xb2config-coordinator\xa7columns\x97\xb1log-queries-after\xb6max-concurrent-queries\xb2max-select-buckets\xb0max-select-point\xb1max-select-series\xadquery-timeout\xadwrite-timeout\xa6values\x91\x97\x00\x00\x00\x00\x83\xa4name\xaaconfig-cqs\xa7columns\x93\xa7enabled\xb3query-stats-enabled\xacrun-interval\xa6values\x91\x93\xc3\xc2\x83\xa4name\xabconfig-data\xa7columns\x9c\xb5cache-max-memory-size\xbacache-snapshot-memory-size\xd9"cache-snapshot-write-cold-duration\xd9 compact-full-write-cold-duration\xa3dir\xbamax-concurrent-compactions\xb7max-index-log-file-size\xb7max-series-per-database\xb2max-values-per-tag\xb8series-id-set-cache-size\xa7wal-dir\xafwal-fsync-delay\xa6values\x91\x9c\xb6/var/lib/influxdb/data\x00\xd2\x00\x0fB@\xd2\x00\x01\x86\xa0d\xb5/var/lib/influxdb/wal\x83\xa4name\xacconfig-httpd\xa7columns\x96\xafaccess-log-path\xacbind-address\xa7enabled\xadhttps-enabled\xb4max-connection-limit\xadmax-row-limit\xa6values\x91\x96\xa0\xa5:8096\xc3\xc2\x00\x00\x83\xa4name\xabconfig-meta\xa7columns\x91\xa3dir\xa6values\x91\x91\xb6/var/lib/influxdb/meta\x83\xa4name\xaeconfig-monitor\xa7columns\x93\xaestore-database\xadstore-enabled\xaestore-interval\xa6values\x91\x93\xa9_internal\xc3\x83\xa4name\xb1config-precreator\xa7columns\x93\xaeadvance-period\xaecheck-interval\xa7enabled\xa6values\x91\x93\xc3\x83\xa4name\xb0config-retention\xa7columns\x92\xaecheck-interval\xa7enabled\xa6values\x91\x92\xc3\x83\xa4name\xb1config-subscriber\xa7columns\x94\xa7enabled\xachttp-timeout\xb1write-buffer-size\xb1write-concurrency\xa6values\x91\x94\xc3\xd1\x03\xe8(\x83\xa4name\xa7network\xa7columns\x91\xa8hostname\xa6values\x91\x91\xa4db01\x83\xa4name\xa7runtime\xa7columns\x94\xa6GOARCH\xaaGOMAXPROCS\xa4GOOS\xa7version\xa6values\x91\x94\xa5amd64\x02\xa5linux\xa6go1.11\x83\xa4name\xa6system\xa7columns\x94\xa3PID\xabcurrentTime\xa7started\xa6uptime\xa6values\x91\x94\xd1*\x84\xc7\x0c\x05\x00\x00\x00\x00`\x86\x8e\xe5\x12J\xde\xab\xc7\x0c\x05\x00\x00\x00\x00`\x86u\x7f\x0c\xca\x93\xb4\xb21h48m22.092293879s'

Both should represent the same data, but the config-coordinator structure doesn't include all the values:

\x83\xa4name\xb2config-coordinator\xa7columns\x97\xb1log-queries-after\xb6max-concurrent-queries\xb2max-select-buckets\xb0max-select-point\xb1max-select-series\xadquery-timeout\xadwrite-timeout\xa6values\x91\x97\x00\x00\x00\x00\x83\xa4name\xaaconfig-cqs

Near the end of the string, \x97 declares a 7-entry 'fixarray', but only four zeroes (\x00) follow before an \x83 that should start the next data structure ('config-cqs'). The three string values that the JSON output shows for this row ("0s", "0s" and "10s") are absent. For this reason, I believe the bug actually exists server side. A similar problem might occur with a regular query, but I couldn't figure that out. I'm also not very comfortable with Go, so I couldn't find where this is implemented in the server.

This behavior appeared soon after my commit because 7fb5e946062dd36a84801e4a03012a3c032a70db changed the default headers to request msgpack instead of the default JSON.
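
The header-byte reading above can be checked mechanically. The following is a rough sketch, not a full msgpack decoder: it only recognises the four single-byte formats that appear in the excerpt (positive fixint, fixmap, fixarray and fixstr, as laid out in the msgpack format specification), and the helper name describe is made up for illustration. It compares the length declared by the \x97 header with the number of elements found before the next fixmap:

```python
def describe(byte):
    """Classify one msgpack header byte (only the formats seen in the excerpt)."""
    if byte <= 0x7F:
        return ("positive fixint", byte)   # the value is the byte itself
    if 0x80 <= byte <= 0x8F:
        return ("fixmap", byte & 0x0F)     # low 4 bits: number of key/value pairs
    if 0x90 <= byte <= 0x9F:
        return ("fixarray", byte & 0x0F)   # low 4 bits: number of elements
    if 0xA0 <= byte <= 0xBF:
        return ("fixstr", byte & 0x1F)     # low 5 bits: string length in bytes
    return ("other", None)


# Tail of the config-coordinator series, copied from the dump above.
excerpt = b"\xa6values\x91\x97\x00\x00\x00\x00\x83\xa4name\xaaconfig-cqs"

start = excerpt.index(b"\x97")
declared = describe(excerpt[start])[1]

# Walk the elements that follow the array header until the next fixmap begins.
found = 0
pos = start + 1
while pos < len(excerpt):
    kind, size = describe(excerpt[pos])
    if kind == "fixmap":
        break
    found += 1
    pos += 1 + (size if kind == "fixstr" else 0)

print("declared:", declared, "found:", found)
```

A mismatch between the two numbers is what makes a strict decoder lose its place in the stream.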

Summary

  1. I should push a PR to implement the fixed chunked behavior in DataFrameClient.
  2. I suspect there is a bug in the server-side msgpack implementation, but I can't help with this. I think someone with better Go knowledge should dig into that one.

hrbonz avatar Apr 26 '21 14:04 hrbonz

@sebito91

hrbonz avatar Apr 26 '21 16:04 hrbonz

I tried making the request directly from the command line with curl and still got a malformed msgpack response with the same issue:

$ curl -G 'http://localhost:8096/query' --data-urlencode q='SHOW DIAGNOSTICS'  --header "Accept: application/x-msgpack" --header "Content-Type: application/json" -u root --output response.txt
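
For comparison from Python, the same request can be assembled with the standard library. This is only a sketch: build_query_request is a hypothetical helper, authentication (the -u root part of the curl call) is left out, and the request is built but not sent:

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_query_request(base_url, query, accept):
    """Build (but do not send) a GET request to the /query endpoint."""
    url = "{}/query?{}".format(base_url, urlencode({"q": query}))
    return Request(url, headers={"Accept": accept}, method="GET")


# Same endpoint and query as the curl call, once per response format.
msgpack_req = build_query_request(
    "http://localhost:8096", "SHOW DIAGNOSTICS", "application/x-msgpack"
)
json_req = build_query_request(
    "http://localhost:8096", "SHOW DIAGNOSTICS", "application/json"
)

print(msgpack_req.full_url)
print(msgpack_req.get_header("Accept"), json_req.get_header("Accept"))
```

Passing either request to urllib.request.urlopen would send it; only the Accept header differs between the two.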

hrbonz avatar Apr 27 '21 01:04 hrbonz

@hrbonz @sebito91 Maybe I am asking a silly question here: is the above fix part of the currently released library, or will it be in a future release? If the latter, when is it expected to be released?

When I tested today, I still got the issue below: msgpack.exceptions.ExtraData: unpack(b) received extra data.

AnkitSinghvi99 avatar May 05 '21 14:05 AnkitSinghvi99

Same issue here: msgpack/_unpacker.pyx in msgpack._cmsgpack.unpackb()

ExtraData: unpack(b) received extra data.

MichielBbal avatar May 11 '21 07:05 MichielBbal

For any future readers,

  • The error persists in 5.3.1 as well as in 5.3.0.
  • The query below works and doesn't throw msgpack.exceptions.ExtraData: unpack(b) received extra data, without adding headers like {'Accept': 'application/json'} and, I believe, while still using msgpack. Please note that I am not using the DataFrameClient. In my case this query fetches around 6.67 million points and takes 403.592 seconds.
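import threading
import time
import pandas as pd
from influxdb import InfluxDBClient

outlock = threading.Lock()  # assumption: a lock shared by the threads that print results

client = InfluxDBClient(host=host, port=port, username=user, password=password, database=dbname)
start_time = time.monotonic()
res = pd.DataFrame(client.query("select * from X where time > now() - 30m", chunked=True).get_points())
end_time = time.monotonic()
with outlock:
    print("Result from {} took {}".format(host, end_time - start_time))
    print(res)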

Versions used

python --version = Python 3.7.8
influxdb.__version__ = 5.2.3

KirannBhavaraju avatar May 27 '21 09:05 KirannBhavaraju

I was still getting ExtraData: unpack(b) received extra data, but after trying @KirannBhavaraju's suggestion it worked!

Only thing I did was to remove the chunk_size=xxxx argument.

client = InfluxDBClient(blah blah)
result = client.query(q, chunked=True)

python = "^3.8"
influxdb = "5.3.1"

ErlendFax avatar Sep 07 '21 14:09 ErlendFax

Only thing I did was to remove the chunk_size=xxxx argument.

Responses will be chunked by series or by every 10,000 points, whichever occurs first. https://docs.influxdata.com/influxdb/v1.7/guides/querying_data/#chunking
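
One possible reading of the errors in this thread, offered as an assumption rather than a confirmed diagnosis: if a chunked response arrives as several separate documents back to back, a decoder that expects exactly one document will complain about extra data. msgpack.unpackb is documented to raise ExtraData in that situation. The sketch below shows the same effect with the standard-library json module on two made-up chunks (the measurement name X and the values are illustrative, and the newline-delimited layout is assumed):

```python
import json

# Two hypothetical chunks, shaped like InfluxDB 1.x query results.
chunk_1 = {"results": [{"statement_id": 0, "partial": True, "series": [
    {"name": "X", "columns": ["time", "value"], "values": [[1, 10]]}]}]}
chunk_2 = {"results": [{"statement_id": 0, "series": [
    {"name": "X", "columns": ["time", "value"], "values": [[2, 20]]}]}]}

# Assumed wire layout: one document per chunk, separated by newlines.
body = json.dumps(chunk_1) + "\n" + json.dumps(chunk_2) + "\n"


def decode_per_chunk(text):
    """Decode each non-empty line as its own document."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Decoding the whole body in one call fails on the second document.
one_shot_error = None
try:
    json.loads(body)
except json.JSONDecodeError as exc:
    one_shot_error = exc.msg

chunks = decode_per_chunk(body)
print("one-shot decode error:", one_shot_error)
print("chunks decoded one at a time:", len(chunks))
```

Decoding one document at a time is the pattern a chunk-aware client needs, whatever the serialization format.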

Kylmakalle avatar Nov 10 '21 15:11 Kylmakalle