client got "no space left on device" for 1.4.2

Open superheizai opened this issue 6 years ago • 30 comments

influxDB: 1.4.2 CentOS7

When sending data to InfluxDB, I get messages like the one below. It says there is no space left, but I checked the server and there is plenty of free space. Any ideas?

write influx error org.influxdb.InfluxDBException: engine: error writing WAL entry: write /var/lib/influxdb/wal/kafka/autogen/335/_00107.wal: no space left on device
    at org.influxdb.InfluxDBException.buildExceptionFromErrorMessage(InfluxDBException.java:154)
    at org.influxdb.InfluxDBException.buildExceptionForErrorState(InfluxDBException.java:166)
    at org.influxdb.impl.InfluxDBImpl.execute(InfluxDBImpl.java:584)
    at org.influxdb.impl.InfluxDBImpl.write(InfluxDBImpl.java:355)

superheizai avatar May 30 '18 09:05 superheizai

Same issue on Debian 8. The write fails with a 400: {"error":"partial write: write /data/influxdb/data/_series/00/0000: no space left on device dropped=1"}

JamesGalt avatar Aug 23 '18 00:08 JamesGalt

Got the same on Debian 8. Is there any workaround?

utjc02 avatar Sep 19 '18 14:09 utjc02

Same here, on PC running Debian. Disk was full, freed space, still getting errors like "error writing WAL entry: write /var/lib/influxdb/wal/kafka/autogen/335/_00107.wal: no space left on device"

A response would be nice.

godidog avatar Sep 26 '18 18:09 godidog

I got the same problem:

Exception in thread "Thread-11" org.influxdb.InfluxDBException: engine: error writing WAL entry: write /home/influxdb/wal/s2/second/409/_00001.wal: file already closed
    at org.influxdb.InfluxDBException.buildExceptionFromErrorMessage(InfluxDBException.java:154)
    at org.influxdb.InfluxDBException.buildExceptionForErrorState(InfluxDBException.java:166)
    at org.influxdb.impl.InfluxDBImpl.execute(InfluxDBImpl.java:609)
    at org.influxdb.impl.InfluxDBImpl.write(InfluxDBImpl.java:369)
    at org.influxdb.impl.InfluxDBImpl.write(InfluxDBImpl.java:382)
    at com.cnegroup.power.influxdb.InfluxDBConnection.batchInsert(InfluxDBConnection.java:201)
    at com.cnegroup.power.receiver.InfluxReceiverRunnable.saveRecordValue(InfluxReceiverRunnable.java:201)
    at com.cnegroup.power.receiver.InfluxReceiverRunnable.run(InfluxReceiverRunnable.java:79)
    at java.lang.Thread.run(Thread.java:748)

How can this be solved? I found that the disk InfluxDB is installed on was 100% full. After I expanded the disk, it began to work.

541211190 avatar Oct 22 '18 07:10 541211190

Same here. I have plenty of space on my disk, yet InfluxDB insists that there is no space left on device.

yifeikong avatar Oct 30 '18 15:10 yifeikong

Restarting InfluxDB seems to solve the problem.

yifeikong avatar Oct 30 '18 16:10 yifeikong

The problem is that a restart empties the DB cache, so all points in the cache are lost.

JamesGalt avatar Oct 30 '18 16:10 JamesGalt

Quoting: "same issue on debian 8 400: {"error":"partial write: write /data/influxdb/data/_series/00/0000: no space left on device dropped=1"}"

I have the same problem. The POST result is: {"error":"partial write: write /data/device_statistics/_series/00/0000: no space left on device dropped=495"}

Could you please tell me how to solve this?

licunzhi avatar Dec 28 '18 12:12 licunzhi

@JamesGalt Have you solved this? I would like to know how to fix it.

licunzhi avatar Dec 28 '18 13:12 licunzhi

I'm experiencing the same on InfluxDB 1.6.1 on Debian 8. Some clients send metrics without issues while another client gets the disk-full response. I can see no relevant log entries on the InfluxDB side.

kali-brandwatch avatar May 02 '19 11:05 kali-brandwatch

For me as well. Debian 9.9 with influxdb 1.7.6.

Do you need more information?

bmg-iis avatar May 09 '19 12:05 bmg-iis

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Aug 07 '19 12:08 stale[bot]

This issue has been automatically closed because it has not had recent activity. Please reopen if this issue is still important to you. Thank you for your contributions.

stale[bot] avatar Aug 14 '19 12:08 stale[bot]

I'm seeing this issue with v1.7.7. The problem is that once a disk error is encountered, the WAL's buffered writer cannot recover, even after the disk is working again. The silent failure is a problem, and it would be better to escalate the error if it cannot be handled.

foobar avatar Sep 23 '19 01:09 foobar

I saw this behaviour on CentOS with influxdb-1.7.9-1.x86_64: the disk fills up, which causes a WAL write error. After cleaning up some disk space, InfluxDB does not recover. Not only does it fail to recover, which is annoying enough, but it also keeps logging the failure to syslog, rapidly filling up the disk again...

Restart of the service "fixed" the problem.

hlovdal avatar Dec 26 '19 00:12 hlovdal

Same here on Debian GNU/Linux 9.13 (stretch) with InfluxDB 1.8.0. I just lost a week's worth of data because I hadn't noticed a short period during which the partition where InfluxDB stores its data had run out of space. The partition recovered instantly as plenty of temporary space was freed. However, InfluxDB kept complaining about no space and did not store any data persistently, yet it answered read queries properly and did not return errors to the writing clients either.

I think this sort of behavior is fundamentally broken. A database should have persistence as its primary goal. If data cannot be made persistent, writers should see exceptions by default. Also, e-mail alerts would have come in handy because I'm not permanently monitoring my syslog.

axeluhl avatar Jan 09 '21 19:01 axeluhl

I am having this problem, still. Influxdb 1.8.5 on Raspbian 10 buster

n0valis avatar May 03 '21 21:05 n0valis

I am also still having this problem. The error still exists :(

iwasjoker avatar May 10 '21 08:05 iwasjoker

I've seen this happen with data stored on a Docker volume that ran out of space. After resizing the disk, the error didn't go away until the instance was restarted. InfluxDB version 2.0.6 ('7c3ead)

ErvalhouS avatar Aug 02 '21 13:08 ErvalhouS

This error still exists in influxdb-1.8.9-1 on CentOS 7.5.

camilesing avatar Aug 19 '21 06:08 camilesing

Same happens on InfluxDB 2.0.8 in docker. If it reaches "no space" state, it won't recover till manual restart.

woloss avatar Aug 25 '21 11:08 woloss

Same here, and RAM usage jumped from 8% to 30%.

geogeim avatar Oct 11 '21 01:10 geogeim

Same error here:

    engine: error writing WAL entry: write /var/lib/influxdb2/engine/wal/0d284939d1e0cea0/autogen/602/_00219.wal: no space left on device'. Retry in: 125s.

InfluxDB is installed in Docker with an external volume. The file system was full at one point, and space has since been freed.

DamienMiras avatar Jan 29 '22 19:01 DamienMiras

Same on influxdb:2.2. It says:

    write failed (attempts 1): internal error: unexpected error writing points to database: engine: error writing WAL entry: write /var/lib/influxdb2/engine/wal/635fdc5ff6ade089/autogen/6/_00029.wal: no space left on device

giter avatar Jul 27 '22 09:07 giter

Same here on InfluxDB v1.8.10 (git:1.8 688e697c51fd):

    Sep 8 17:14:28 raspberrypi influxd-systemd-start.sh[434]: ts=2022-09-08T15:14:28.167771Z lvl=info msg="Error writing snapshot" log_id=0cmfnfT0000 engine=tsm1 error="error opening new segment file for wal (1): write /var/lib/influxdb/wal/_internal/monitor/3/_00002.wal: no space left on device"

Is there any way to solve this issue? Apparently not even with the most recent version, v2.2.

Germaier avatar Sep 08 '22 20:09 Germaier

This can happen when the disk ran low on space at some earlier point. Restarting the service resolves it.

rever67697 avatar Mar 03 '23 02:03 rever67697

From the bufio.Writer source:

    // Flush writes any buffered data to the underlying io.Writer.
    func (b *Writer) Flush() error {
        if b.err != nil {
            return b.err
        }
        ...
    }

bufio.Writer always returns the last error it encountered. If the disk was full and space is freed afterwards, Flush still returns the disk-full error.

Sniper91 avatar Mar 30 '23 06:03 Sniper91

Same here on 2.1.1, only restart of service seems to fix it

MarcoPignati avatar Apr 11 '23 11:04 MarcoPignati