"internal error: Missing parent ID on node" when uploading many files
Describe the bug
I'm trying to upload a folder with three subfolders and a total of 282 files to a space. Most of the files are uploaded fine, but some fail with an error message like this:
{
  "level": "error",
  "service": "storage-users",
  "host.name": "opencloud-75b46f489b-jkwxk",
  "pkg": "rgrpc",
  "driver": "posix",
  "error": "internal error: Missing parent ID on node",
  "path": "/var/lib/opencloud/storage/users/projects/c3e284a5-5e4d-4be5-80c6-49b31de55616/filename.ext",
  "time": "2025-10-02T07:43:14Z",
  "message": "failed to read node"
}
Steps to reproduce
- Upload a folder with many files (in my case: three subfolders, 282 files in total)
Expected behavior
All files are uploaded correctly, without errors.
Actual behavior
See Describe the bug
Setup
OpenCloud is deployed in a k3s cluster. The configuration and data storage are on a mounted NFS storage, the underlying file system is ZFS.
Additional context
Sometimes the files are still uploaded despite the error message. At other times, the OpenCloud frontend shows an "Unknown Error" message and gives me trace IDs.
Here's an example frontend error that sometimes occurs.
OpenCloud is deployed in a k3s cluster. The configuration and data storage are on a mounted NFS storage, the underlying file system is ZFS.
That is interesting. How did you mount the NFS on the node?
For OpenCloud in Kubernetes it is essential that the NFS mount uses no caching, which is the noac mount option.
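For reference, on a PersistentVolume this goes into mountOptions — a sketch with placeholder names and sizes:

```yaml
# Sketch: NFS PersistentVolume with client-side caching disabled.
# Server, path, and capacity are placeholders.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: opencloud-data-pv
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany
  mountOptions:
    - noac        # disable NFS attribute caching
  nfs:
    server: nfs.example.local
    path: /export/opencloud/data
```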
Just out of curiosity: how did you deploy OpenCloud in k3s?
The NFS shares are mounted via PersistentVolumeClaims in my YAML files. For example:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: opencloud-config-pv
spec:
  storageClassName: ""
  capacity:
    storage: 100Mi
  accessModes:
    - ReadWriteMany
  nfs:
    path: /mnt/Main/k3s/opencloud/config
    server: srv-nas-01.rysenet.local
  persistentVolumeReclaimPolicy: Retain
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: opencloud-config-pvc
spec:
  storageClassName: ""
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 100Mi
  volumeName: opencloud-config-pv
And besides the PVCs, here's what I use to deploy to k3s:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: opencloud
  labels:
    app: opencloud
spec:
  replicas: 1
  selector:
    matchLabels:
      app: opencloud
  template:
    metadata:
      labels:
        app: opencloud
    spec:
      nodeSelector:
        region: "public"
      containers:
        - name: opencloud
          image: opencloudeu/opencloud-rolling:latest
          imagePullPolicy: IfNotPresent
          command: ["/bin/sh"]
          args: ["-c", "opencloud init || true; opencloud server"]
          ports:
            - name: http
              containerPort: 9200
            - name: nats
              containerPort: 9233
          env:
            - name: OC_INSECURE
              value: "true"
            - name: OC_DOMAIN
              value: "my.domain.eu"
            - name: OC_URL
              value: "https://my.domain.eu"
            - name: PROXY_HTTP_ADDR
              value: "0.0.0.0:9200"
            - name: INITIAL_ADMIN_PASSWORD
              value: "a-super-secret-initial-password"
            - name: PROXY_ENABLE_BASIC_AUTH
              value: "true"
            - name: PROXY_TLS
              value: "false"
          volumeMounts:
            - name: opencloud-config
              mountPath: /etc/opencloud
            - name: opencloud-data
              mountPath: /var/lib/opencloud
      volumes:
        - name: opencloud-config
          persistentVolumeClaim:
            claimName: opencloud-config-pvc
        - name: opencloud-data
          persistentVolumeClaim:
            claimName: opencloud-data-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: opencloud
spec:
  selector:
    app: opencloud
  ports:
    - port: 9200
      targetPort: 9200
      name: http
    - port: 9233
      targetPort: 9233
      name: nats
  type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: opencloud-ingress
  annotations:
    traefik.ingress.kubernetes.io/router.entrypoints: websecure
spec:
  rules:
    - host: my.domain.eu
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: opencloud
                port:
                  number: 9200
TLS certificate termination is handled by my OPNsense firewall and the HAProxy plugin.
That did not answer the question about the actual NFS mount on the host.
It actually does. The rest is done automatically by the k3s node and I didn't have to manually mount the share. But here's the output of mount directly on the worker node:
srv-nas-01.rysenet.local:/mnt/Main/k3s/opencloud/data on /var/lib/kubelet/pods/05f69b13-9e26-47fe-a56c-62aa928d4ade/volumes/kubernetes.io~nfs/opencloud-data-pv type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.1.1.4,local_lock=none,addr=192.168.178.12)
I have modified the PersistentVolumes and added:
mountOptions:
  - noac
  - vers=4.2
After restarting the service, the mount options now look like this:
(rw,sync,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,acregmin=0,acregmax=0,acdirmin=0,acdirmax=0,hard,noac,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.1.1.4,local_lock=none,addr=192.168.178.12)
This should be fine now. I'll test OpenCloud's behavior and will provide feedback.
So, I'm getting different errors now. The old error still occurs, but not as often anymore. Instead, I'm getting a lot of "numerical result out of range" errors.
{
  "level": "error",
  "service": "storage-users",
  "host.name": "opencloud-67f7cff9f6-wvbw5",
  "pkg": "rgrpc",
  "driver": "posix",
  "error": "xattr.list /var/lib/opencloud/storage/users/projects/c3e284a5-5e4d-4be5-80c6-49b31de55616/filename.ext",
  "time": "2025-10-02T11:01:02Z",
  "message": "failed to read node"
}
@DamianRyse thank you for reporting.
Another issue I'm facing now is when I'm trying to rename a folder in a space:
{"level":"error","service":"storage-users","host.name":"opencloud-67f7cff9f6-wvbw5","pkg":"rgrpc","traceid":"3ebf412ad45e70384a47ed6ef14c217d","error":"node.XattrsWithReader: no data available","spaceid":"c3e284a5-5e4d-4be5-80c6-49b31de55616","nodeid":"","time":"2025-10-02T12:47:12Z","message":"error reading permissions"}
I'm fairly sure it has something to do with the data dir being an NFS mount, but I have no clue why or how to fix it.
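Since the posix driver stores its metadata in extended attributes, one thing worth checking is whether user-namespace xattrs actually work on the NFS-mounted data dir. A quick probe (a sketch using Python's Linux-only os.setxattr/os.getxattr; the OC_DATA_DIR variable is a hypothetical way to point it at the data dir):

```python
import os
import tempfile


def user_xattrs_work(directory: str) -> bool:
    """Probe whether user.* extended attributes can be written and
    read back on the filesystem containing `directory`."""
    if not hasattr(os, "setxattr"):  # xattr syscalls are Linux-only
        return False
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        os.setxattr(path, "user.xattr-probe", b"ok")
        return os.getxattr(path, "user.xattr-probe") == b"ok"
    except OSError:
        # e.g. ENOTSUP: the filesystem/mount does not support user xattrs
        return False
    finally:
        os.remove(path)


if __name__ == "__main__":
    # Hypothetical: set OC_DATA_DIR to the NFS-backed data dir to test it.
    target = os.environ.get("OC_DATA_DIR", ".")
    print("user xattrs supported:", user_xattrs_work(target))
```

If this prints False on the mount, the storage backend cannot work there regardless of caching settings.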
@butonic @rhafer any ideas?
After experimenting around more with this setup, I'd like to share my observations:
Case 1: Data dir is a default NFS share
OpenCloud's speed is somewhat okay, but it does not fully utilize a standard gigabit Ethernet connection, even though iperf3 tests showed the full line speed is available.
When uploading multiple files via the web interface, a lot of POSIX errors occurred (see my comments above). Some files fail to upload entirely, even when retried afterwards as a single-file upload.
Case 2: Data dir is a NFS share but with noac option
OpenCloud is basically unusable in this configuration. Disabling caching completely forces the client to flush every I/O operation before continuing with the next one. This results in stuttering uploads: short bursts of data transfer followed by pauses of a few seconds before the next burst. While uploads occupy the filesystem, the web interface's responsiveness is severely degraded.
Case 3: Data dir is a NFS share but with low caching times
Instead of disabling the cache completely, I've limited the attribute cache timeouts to very low values:
acregmin=1
acregmax=5
acdirmin=1
acdirmax=5
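In the PersistentVolume spec, these again go into mountOptions — a sketch with comments on what each option controls:

```yaml
mountOptions:
  - acregmin=1   # min seconds to cache file attributes
  - acregmax=5   # max seconds to cache file attributes
  - acdirmin=1   # min seconds to cache directory attributes
  - acdirmax=5   # max seconds to cache directory attributes
  - vers=4.2
```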
This did help a little, both performance- and stability-wise, but by far not as much as expected. I tried various timings to see whether they affect the file uploads, but the impact was small.
Case 4: Replaced NFS with an iSCSI block device
This is currently my best network-based storage solution for OpenCloud. Uploading many small files works flawlessly, without any of the mentioned POSIX errors. Data transfer is still very slow compared to what should be possible: uploading about 200 files of roughly 7 MB each results in an average upload speed of 25 MiB/s. Uploading a single large file (about 3 GiB in my test), on the other hand, increased the upload speed to about 60 MiB/s. A manual file copy from my local client to the block storage maxed out at about 106 MiB/s.
Case 5: Standard containerized installation on a single VM with only local storage
No issues at all. OpenCloud worked perfectly well with very good performance and no errors.
Conclusion
It's clear to me that OpenCloud has a weakness when its storage is on a network share. I haven't tried SMB, but I'd assume it would be similarly slow to NFS. I've seen a lot of xattr errors when using NFS, which I thought might be caused by a misconfiguration of my NFS share, but I haven't found anything wrong. So, for people who want to run OpenCloud on Kubernetes with a file storage server, I'd recommend going for an iSCSI solution. Even if file transfers are still slow, they at least work without I/O errors.
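For anyone going that route, an iSCSI-backed PersistentVolume looks roughly like this (a sketch with placeholder portal/IQN/capacity values; note that, unlike NFS, a block device is mounted ReadWriteOnce):

```yaml
# Sketch: iSCSI PersistentVolume. Target portal and IQN are placeholders.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: opencloud-data-pv
spec:
  capacity:
    storage: 500Gi
  accessModes:
    - ReadWriteOnce   # block devices cannot be shared read-write across nodes
  iscsi:
    targetPortal: 192.168.178.12:3260
    iqn: iqn.2025-01.local.example:opencloud
    lun: 0
    fsType: ext4
  persistentVolumeReclaimPolicy: Retain
```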
@DamianRyse thank you for the thorough comparison and your efforts.
The OpenCloud team runs OpenCloud on large network filesystems like CephFS and IBM Spectrum Scale (GPFS).
I think AWS Elastic File System (EFS) and Amazon FSx for Lustre could also be good candidates.