DragonOS icon indicating copy to clipboard operation
DragonOS copied to clipboard

fix(open): 修复open系统调用在gvisor下的异常表现

Open Vitus213 opened this issue 1 month ago • 3 comments

symlink和权限检查的测试屏蔽了,补充了边界测试,其他点都过了

Vitus213 avatar Nov 28 '25 15:11 Vitus213

这里有些AI的review还没回应 @Vitus213

fslongjin avatar Dec 01 '25 10:12 fslongjin

@claude review,审查功能、性能、安全、linux一致性的问题

fslongjin avatar Dec 05 '25 07:12 fslongjin

Claude finished @fslongjin's task —— View job


PR Review for #1417: fix(open): 修复open系统调用在gvisor下的异常表现

I've completed a comprehensive review of the PR changes for functionality, performance, security, and Linux consistency. Here's my detailed feedback:

Current Progress:

  • [x] Create todo list
  • [x] Gather context: examine pre-fetched data and read changed files
  • [x] Understand the request: analyze the PR changes and review comments
  • [x] Review functionality: check if the implementation matches POSIX/Linux semantics
  • [x] Review performance: identify potential bottlenecks or inefficiencies
  • [x] Review security: check for security vulnerabilities
  • [x] Review Linux consistency: verify behavior aligns with Linux kernel
  • [x] Provide comprehensive review feedback

Summary

Overall, this PR makes significant improvements to open syscall semantics, fixing several test failures. The changes generally align with POSIX/Linux behavior, but there are some important issues that need to be addressed.

Detailed Review

Positive Changes

  1. O_APPEND semantics corrected (file.rs:630-635):

    • File offset now moves to EOF only during write operations, not at open time
    • lseek(fd, 0, SEEK_CUR) returns 0 after open with O_APPEND (POSIX compliant)
  2. O_TRUNC timing fixed (open.rs:279-284):

    • Truncation occurs at inode level before File object creation
    • Allows O_RDONLY | O_TRUNC with write permission to file (Linux behavior)
  3. Path validation improved:

    • Empty path check returns ENOENT (open.rs:179-181)
    • Trailing slash validation for non-directories returns ENOTDIR (open.rs:223-225)
  4. Directory flag validation (open.rs:239-253):

    • Proper checks for O_TRUNC, O_DIRECT, write access modes on directories
    • Returns appropriate error codes (EISDIR, EINVAL)
  5. Error code alignment (file.rs:768-804):

    • Changed from EPERM to EBADF for access mode validation (POSIX compliant)
    • Added invalid access mode checks (e.g., O_WRONLY | O_RDWR)
  6. Code optimization (commit bfe0389):

    • Removed redundant metadata retrieval in open.rs
    • Eliminated unused set_offset method (dead code)

⚠️ Critical Issues

1. Race Condition in O_APPEND Writes (file.rs:630-635)

let actual_offset =
    if flags.contains(FileFlags::O_APPEND) && matches!(file_type, FileType::File) {
        md.size as usize  // ← Race: file size read here
    } else {
        offset
    };
// ... later at line 669-670:
let written_len = self.inode.write_at(actual_offset, actual_len, buf, ...)?;

Problem: File size is read from metadata, but another process could write to the file before write_at() executes, causing:

  • Lost writes (data overwritten)
  • Data corruption

Fix Needed: Implement atomic append operation or file-level locking. The file size should be re-read immediately before write_at(), or use an atomic append operation if supported by the filesystem.

2. Offset Update Before Write Completion (Partially Fixed)

Commit bfe0389 fixed the immediate offset update in O_APPEND, but the general pattern still has issues:

// Current code (lines 672-681) - CORRECT:
if update_offset {
    if flags.contains(FileFlags::O_APPEND) {
        self.offset.store(
            actual_offset + written_len,  // ← Updated after successful write
            core::sync::atomic::Ordering::SeqCst,
        );
    } else {
        self.offset.fetch_add(written_len, ...);
    }
}

Status: This is now correct - offset only updates after successful write.

🔧 Code Quality Issues

3. Inconsistent Permission Checking

// readable() checks can_read()
if !mode.can_read() {  // line 781-783
    return Err(SystemError::EINVAL);
}

// writeable() does NOT check can_write()
// Should it check mode.can_write() for consistency?

4. Symlink Handling Ambiguity (open.rs:187-189)

let inode = inode_begin.lookup_follow_symlink2(&path, ..., follow_symlink);

Question: When follow_symlink=false and final component is a symlink, does lookup_follow_symlink2() return the symlink inode itself? If yes, the removed symlink check was unnecessary. If no, O_NOFOLLOW may not work correctly.

5. Chinese Comments in Blocklist

user/apps/tests/syscall/gvisor/blocklists/open_test:1: Uses Chinese characters. For consistency with the codebase, consider English:

# Block tests related to symlinks and permission checks, as they are not yet implemented

📊 Performance Considerations

  1. ✅ Redundant metadata calls fixed (was: inode.metadata() called twice in open path)
  2. ⚠️ O_APPEND race condition affects correctness more than performance
  3. Locking strategy: Consider if inode-level locking is needed for concurrent writes

🔒 Security Assessment

  1. Permission checks appear correct:

    • Non-O_PATH files: cred.inode_permission() with appropriate PermissionMask
    • O_TRUNC: checked at inode level with write permission
    • S_APPEND inode flag: prevents non-append writes
  2. Path validation adequate:

    • Empty paths rejected
    • Trailing slash validation
    • MAX_PATHLEN enforcement in check_and_clone_cstr()

🐧 Linux Consistency

Behavior DragonOS Linux Notes
`O_RDONLY O_TRUNC` ✅ Allowed with write perm ✅ Allowed with write perm
O_APPEND offset after open ✅ 0 ✅ 0 lseek(fd, 0, SEEK_CUR)
Directory with O_TRUNC EISDIR EISDIR
Directory with O_DIRECT EINVAL EINVAL
Empty path ENOENT ⚠️ Context-dependent AT_EMPTY_PATH special case
Trailing slash non-dir ENOTDIR ENOTDIR

Recommendations

High Priority

  1. Fix O_APPEND race condition (file.rs:630-635):
    • Implement atomic append or re-read size before write
    • Consider inode-level locking for concurrent writes

Medium Priority

  1. Verify O_NOFOLLOW behavior: Ensure lookup_follow_symlink2() returns symlink inode when follow_symlink=false
  2. Standardize permission checks: Consider adding can_write() check in writeable() for consistency

Low Priority

  1. Translate blocklist comment to English
  2. Add debug assertion for unreachable symlink check (as suggested by Copilot)

Conclusion

This PR significantly improves open syscall compliance with POSIX/Linux standards. The functional changes are mostly correct, but the O_APPEND race condition is a critical issue that must be fixed before merging. Addressing this will prevent data corruption in concurrent write scenarios.

The PR is mostly ready but requires the race condition fix for production readiness.


Review complete - All aspects (functionality, performance, security, Linux consistency) have been evaluated.