fix(open): 修复open系统调用在gvisor下的异常表现
symlink和权限检查的测试屏蔽了,补充了边界测试,其他点都过了
这里有些AI的review还没回应 @Vitus213
@claude review,审查功能、性能、安全、linux一致性的问题
Claude finished @fslongjin's task —— View job
PR Review for #1417: fix(open): 修复open系统调用在gvisor下的异常表现
I've completed a comprehensive review of the PR changes for functionality, performance, security, and Linux consistency. Here's my detailed feedback:
Current Progress:
- [x] Create todo list
- [x] Gather context: examine pre-fetched data and read changed files
- [x] Understand the request: analyze the PR changes and review comments
- [x] Review functionality: check if the implementation matches POSIX/Linux semantics
- [x] Review performance: identify potential bottlenecks or inefficiencies
- [x] Review security: check for security vulnerabilities
- [x] Review Linux consistency: verify behavior aligns with Linux kernel
- [x] Provide comprehensive review feedback
Summary
Overall, this PR makes significant improvements to open syscall semantics, fixing several test failures. The changes generally align with POSIX/Linux behavior, but there are some important issues that need to be addressed.
Detailed Review
✅ Positive Changes
-
O_APPEND semantics corrected (
file.rs:630-635):- File offset now moves to EOF only during write operations, not at open time
lseek(fd, 0, SEEK_CUR)returns 0 after open with O_APPEND (POSIX compliant)
-
O_TRUNC timing fixed (
open.rs:279-284):- Truncation occurs at inode level before File object creation
- Allows
O_RDONLY | O_TRUNCwith write permission to file (Linux behavior)
-
Path validation improved:
- Empty path check returns
ENOENT(open.rs:179-181) - Trailing slash validation for non-directories returns
ENOTDIR(open.rs:223-225)
- Empty path check returns
-
Directory flag validation (
open.rs:239-253):- Proper checks for
O_TRUNC,O_DIRECT, write access modes on directories - Returns appropriate error codes (
EISDIR,EINVAL)
- Proper checks for
-
Error code alignment (
file.rs:768-804):- Changed from
EPERMtoEBADFfor access mode validation (POSIX compliant) - Added invalid access mode checks (e.g.,
O_WRONLY | O_RDWR)
- Changed from
-
Code optimization (commit
bfe0389):- Removed redundant metadata retrieval in
open.rs - Eliminated unused
set_offsetmethod (dead code)
- Removed redundant metadata retrieval in
⚠️ Critical Issues
1. Race Condition in O_APPEND Writes (file.rs:630-635)
let actual_offset =
if flags.contains(FileFlags::O_APPEND) && matches!(file_type, FileType::File) {
md.size as usize // ← Race: file size read here
} else {
offset
};
// ... later at line 669-670:
let written_len = self.inode.write_at(actual_offset, actual_len, buf, ...)?;
Problem: File size is read from metadata, but another process could write to the file before write_at() executes, causing:
- Lost writes (data overwritten)
- Data corruption
Fix Needed: Implement atomic append operation or file-level locking. The file size should be re-read immediately before write_at(), or use an atomic append operation if supported by the filesystem.
2. Offset Update Before Write Completion (Partially Fixed)
Commit bfe0389 fixed the immediate offset update in O_APPEND, but the general pattern still has issues:
// Current code (lines 672-681) - CORRECT:
if update_offset {
if flags.contains(FileFlags::O_APPEND) {
self.offset.store(
actual_offset + written_len, // ← Updated after successful write
core::sync::atomic::Ordering::SeqCst,
);
} else {
self.offset.fetch_add(written_len, ...);
}
}
Status: This is now correct - offset only updates after successful write.
🔧 Code Quality Issues
3. Inconsistent Permission Checking
// readable() checks can_read()
if !mode.can_read() { // line 781-783
return Err(SystemError::EINVAL);
}
// writeable() does NOT check can_write()
// Should it check mode.can_write() for consistency?
4. Symlink Handling Ambiguity (open.rs:187-189)
let inode = inode_begin.lookup_follow_symlink2(&path, ..., follow_symlink);
Question: When follow_symlink=false and final component is a symlink, does lookup_follow_symlink2() return the symlink inode itself? If yes, the removed symlink check was unnecessary. If no, O_NOFOLLOW may not work correctly.
5. Chinese Comments in Blocklist
user/apps/tests/syscall/gvisor/blocklists/open_test:1: Uses Chinese characters. For consistency with the codebase, consider English:
# Block tests related to symlinks and permission checks, as they are not yet implemented
📊 Performance Considerations
- ✅ Redundant metadata calls fixed (was:
inode.metadata()called twice in open path) - ⚠️ O_APPEND race condition affects correctness more than performance
- Locking strategy: Consider if inode-level locking is needed for concurrent writes
🔒 Security Assessment
-
Permission checks appear correct:
- Non-O_PATH files:
cred.inode_permission()with appropriatePermissionMask - O_TRUNC: checked at inode level with write permission
- S_APPEND inode flag: prevents non-append writes
- Non-O_PATH files:
-
Path validation adequate:
- Empty paths rejected
- Trailing slash validation
- MAX_PATHLEN enforcement in
check_and_clone_cstr()
🐧 Linux Consistency
| Behavior | DragonOS | Linux | Notes |
|---|---|---|---|
| `O_RDONLY | O_TRUNC` | ✅ Allowed with write perm | ✅ Allowed with write perm |
O_APPEND offset after open |
✅ 0 | ✅ 0 | lseek(fd, 0, SEEK_CUR) |
Directory with O_TRUNC |
✅ EISDIR |
✅ EISDIR |
|
Directory with O_DIRECT |
✅ EINVAL |
✅ EINVAL |
|
| Empty path | ✅ ENOENT |
⚠️ Context-dependent | AT_EMPTY_PATH special case |
| Trailing slash non-dir | ✅ ENOTDIR |
✅ ENOTDIR |
Recommendations
High Priority
- Fix O_APPEND race condition (
file.rs:630-635):- Implement atomic append or re-read size before write
- Consider inode-level locking for concurrent writes
Medium Priority
- Verify O_NOFOLLOW behavior: Ensure
lookup_follow_symlink2()returns symlink inode whenfollow_symlink=false - Standardize permission checks: Consider adding
can_write()check inwriteable()for consistency
Low Priority
- Translate blocklist comment to English
- Add debug assertion for unreachable symlink check (as suggested by Copilot)
Conclusion
This PR significantly improves open syscall compliance with POSIX/Linux standards. The functional changes are mostly correct, but the O_APPEND race condition is a critical issue that must be fixed before merging. Addressing this will prevent data corruption in concurrent write scenarios.
The PR is mostly ready but requires the race condition fix for production readiness.
Review complete - All aspects (functionality, performance, security, Linux consistency) have been evaluated.