[BUG REPORT] slabmalloc panics while allocating memory during startup

Open BrahmaMantra opened this issue 1 year ago • 7 comments

Describe the bug: During the boot phase, slabmalloc fails while allocating memory (the error is intermittent).

Please fill in your machine information:

  • OS and version: Ubuntu 22.04
  • DragonOS version: 7c28051
  • DADK version: dadk 0.1.11
  • Rust version: rustc 1.84.0-nightly (fbab78289 2024-11-04)

Steps to reproduce: boot the kernel with make run.

Screenshot: (image)

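Additional context: The problem is located at slab.rs:40. PS: I am not sure whether, once the logic here goes wrong, Box::leak(boxed_page) could lead to the problem in #1044. Maybe not? (QWQ) (image)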
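To make the Box::leak(boxed_page) concern concrete, here is a minimal user-space sketch of leaking a boxed page and checking its alignment. The `Page` type, `PAGE_SIZE` value, and helper names are hypothetical and are not taken from slab.rs, whose code is not shown in this report. The sketch only illustrates that the leaked reference is page-aligned as long as the global allocator honors the alignment requested by the type's layout.

```rust
// Hypothetical page type: 4096 bytes, aligned to 4096 bytes.
// The real page type and size used by slab.rs are assumptions here.
const PAGE_SIZE: usize = 4096;

#[repr(C, align(4096))]
struct Page {
    data: [u8; PAGE_SIZE],
}

/// Allocate a page on the heap and leak it, yielding a `&'static mut Page`.
fn alloc_leaked_page() -> &'static mut Page {
    let boxed_page = Box::new(Page { data: [0u8; PAGE_SIZE] });
    Box::leak(boxed_page)
}

/// Offset of the page's address within a PAGE_SIZE-sized region (0 when aligned).
fn misalignment(page: &Page) -> usize {
    (page as *const Page as usize) % PAGE_SIZE
}

fn main() {
    let page = alloc_leaked_page();
    let offset = misalignment(page);
    // This holds only if the global allocator respects the requested alignment.
    assert_eq!(offset, 0, "page is not aligned to page-size");
    println!("leaked page of {} bytes, offset {}", page.data.len(), offset);
}
```

In a kernel, the global allocator is custom, so an allocator bug or memory corruption could in principle hand back a misaligned pointer, which a check like this would then catch.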

System log: [ DEBUG ] (src/driver/base/device/bus.rs:300) bus 'virtio' add driver 'virtio_blk'

[ ERROR ] (src/lib.rs:115) Kernel Panic Occurred.

Location:

File: crates/rust-slabmalloc/src/sc.rs

Line: 113, Column: 9

Message:

assertion `left == right` failed: Inserted page is not aligned to page-size.

left: 2008

right: 0

Rust Panic Backtrace:

function:rust_begin_unwind() (+) 0518 address:0xffff800000247e66

function:core::panicking::panic_fmt() (+) 0027 address:0xffff80000036e27b

function:core::panicking::assert_failed_inner() (+) 0326 address:0xffff80000036e79e

function:core::panicking::assert_failed() (+) 0057 address:0xffff8000002ccec1

function:<slabmalloc::zone::ZoneAllocator as slabmalloc::Allocator>::refill() (+) 0771 address:0xffff8000002cf873

Current PCB:

ProcessControlBlock { pid: Pid(1), tgid: Pid(0), thread_pid: RwLock { lock: 0, data: UnsafeCell { .. } }, basic: RwLock { lock: 0, data: UnsafeCell { .. } }, preempt_count: 0, flags: LockFreeFlags { inner: KTHREAD }, worker_private: SpinLock { lock: false, data: UnsafeCell { .. } }, kernel_stack: RwLock { lock: 0, data: UnsafeCell { .. } }, syscall_stack: RwLock { lock: 0, data: UnsafeCell { .. } }, sched_info: ProcessSchedulerInfo { on_cpu: AtomicProcessorId { container: 0 }, inner_locked: RwLock { lock: 0, data: UnsafeCell { .. } }, sched_stat: RwLock { lock: 0, data: UnsafeCell { .. } }, sched_policy: RwLock { lock: 0, data: UnsafeCell { .. } }, sched_entity: FairSchedEntity { load: LoadWeight { weight: 0, inv_weight: 0 }, deadline: 5608661452, min_deadline: 4910421632, on_rq: Queued, exec_start: 2803056454, sum_exec_runtime: 2803056454, vruntime: 5609648730, vlag: 4910421632, slice: 750000, prev_sum_exec_runtime: 2453936534, avg: SchedulerAvg { last_update_time: 0, load_sum: 0, runnable_sum: 0, util_sum: 0, period_contrib: 0, load_avg: 0, runnable_avg: 0, util_avg: 0 }, parent: (Weak), depth: 0, self_ref: (Weak), cfs_rq: (Weak), my_cfs_rq: None, runnable_weight: 0, pcb: (Weak) }, on_rq: SpinLock { lock: false, data: UnsafeCell { .. } }, prio_data: RwLock { lock: 0, data: UnsafeCell { .. } } }, arch_info: SpinLock { lock: false, data: UnsafeCell { .. } }, sig_info: RwLock { lock: 0, data: UnsafeCell { .. } }, sig_struct: SpinLock { lock: false, data: UnsafeCell { .. } }, exit_signal: SIGCHLD, parent_pcb: RwLock { lock: 0, data: UnsafeCell { .. } }, real_parent_pcb: RwLock { lock: 0, data: UnsafeCell { .. } }, children: RwLock { lock: 0, data: UnsafeCell { .. } }, wait_queue: WaitQueue(SpinLock { lock: false, data: UnsafeCell { .. } }), thread: RwLock { lock: 0, data: UnsafeCell { .. } }, fs: SpinLock { lock: false, data: UnsafeCell { .. } }, alarm_timer: SpinLock { lock: false, data: UnsafeCell { .. } }, robust_list: RwLock { lock: 0, data: UnsafeCell { .. 
} }, nsproxy: RwLock { lock: 0, data: UnsafeCell { .. } }, cred: SpinLock { lock: false, data: UnsafeCell { .. } } }
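As a reading aid for the assertion message above: the check appears to compare the inserted page's address modulo the page size against 0, so `left: 2008` would mean the pointer sits 2008 bytes past a page boundary. The sketch below assumes a 4096-byte page size and uses a made-up address; neither value comes from the log, and the function names are hypothetical.

```rust
/// Byte offset of `addr` within its page; 0 means the address is page-aligned.
fn page_offset(addr: u64, page_size: u64) -> u64 {
    addr % page_size
}

/// Start address of the page that contains `addr`.
fn page_base(addr: u64, page_size: u64) -> u64 {
    addr - page_offset(addr, page_size)
}

fn main() {
    // Assumption: 4096-byte pages. The real constant is not shown in the log.
    const PAGE_SIZE: u64 = 4096;
    // Hypothetical address whose low 12 bits are 0x7d8 (2008 in decimal).
    let addr: u64 = 0xffff_8000_00ab_c7d8;

    assert_eq!(page_offset(addr, PAGE_SIZE), 2008);
    assert_eq!(page_base(addr, PAGE_SIZE), 0xffff_8000_00ab_c000);
    println!(
        "addr {:#x} is {} bytes into the page starting at {:#x}",
        addr,
        page_offset(addr, PAGE_SIZE),
        page_base(addr, PAGE_SIZE)
    );
}
```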

BrahmaMantra, Nov 12 '24 06:11

Could you run it with make qemu-nographic and see whether it reproduces?

fslongjin, Nov 12 '24 08:11

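> Could you run it with make qemu-nographic and see whether it reproduces?

I tried more than twenty times. It never reproduced; everything worked normally.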

BrahmaMantra, Nov 14 '24 11:11

Are you sure your local version is the latest? Could you check with git log?

fslongjin, Nov 14 '24 11:11

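(image) It is fairly recent. The latest commits do not seem to have fixed anything similar, but the bug has not shown up again in many later runs. I do not really understand why.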

BrahmaMantra, Nov 14 '24 14:11

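I ran into this problem too, and every time I reproduced it, it was related to PCI: it only shows up after PCI initialization. Could some recent change have introduced an out-of-bounds memory access? @1037827920 Could it be related to that Arc change? (That said, that code was already using some raw pointers before, which looked a bit questionable.) (image)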

fslongjin, Nov 17 '24 15:11

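Right after pulling the latest mainline, it was very easy to trigger (both with VNC and without graphics): basically one build would run fine and the next build would hit this bug. I tried rolling back to the version at #1009 and did not see the problem there. Finally, after switching back to mainline, I found I could no longer trigger the bug. Is there some specific trigger condition?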

1037827920, Nov 18 '24 04:11

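> Right after pulling the latest mainline, it was very easy to trigger (both with VNC and without graphics): basically one build would run fine and the next build would hit this bug. I tried rolling back to the version at #1009 and did not see the problem there. Finally, after switching back to mainline, I found I could no longer trigger the bug. Is there some specific trigger condition?

Same for me. Moreover, once the bug appears, it reproduces reliably as long as I only rerun without rebuilding. After make clean and a fresh build, however, it no longer reproduces.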

fslongjin, Nov 18 '24 05:11