Ruoxi
Need to add `boundary_check` arguments (plus `padding_option` for the loads) to several `tl.load` and `tl.store` calls, and also need to apply a mask to the padded part.
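Why the mask is still needed even with `padding_option` set: padded slots are filled with a constant (for example zero), and that constant can still change the result of a reduction. Below is a minimal CPU-only sketch in plain Python. It assumes the kernel computes a row-wise max over a block that runs past the end of the row. `load_block` and `block_max` are hypothetical helpers that emulate a boundary-checked load with zero padding; they are not Triton APIs.

```python
import math


def load_block(row, start, block_size, padding=0.0):
    """Emulate a boundary-checked block load (hypothetical, not a Triton API).

    Slots past the end of `row` are filled with `padding`, and a parallel
    `valid` list records which slots held real data.
    """
    block, valid = [], []
    for i in range(start, start + block_size):
        in_bounds = i < len(row)
        block.append(row[i] if in_bounds else padding)
        valid.append(in_bounds)
    return block, valid


def block_max(block, valid, use_mask=True):
    """Max over one block, optionally masking out the padded slots."""
    if use_mask:
        # Replace padded slots with -inf so they can never win the max.
        block = [x if ok else -math.inf for x, ok in zip(block, valid)]
    return max(block)


row = [-3.0, -1.0, -2.0]              # all-negative row of length 3
block, valid = load_block(row, 0, 4)  # a block of 4 overruns the row by one slot
print(block_max(block, valid, use_mask=False))  # 0.0, wrong: it comes from the padding
print(block_max(block, valid, use_mask=True))   # -1.0, correct
```

Inside an actual Triton kernel the same idea is usually expressed with `tl.where` on a mask derived from the offsets. The right fill value depends on the reduction (for example `-inf` for max, `0` for sum).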
The benchmark is halfway through. I use htop (similar to the Windows Task Manager) to monitor the processes and threads. The first row is the process and the second row is...
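The per-thread values htop displays can also be read straight from procfs, without a UI. This is a minimal sketch that assumes Linux. Each thread of a process has an entry under `/proc/<pid>/task/<tid>/`, the main thread's TID equals the PID, and field 19 of the `stat` file is the nice value. `thread_nice_values` is a hypothetical helper name.

```python
import os
import threading


def thread_nice_values(pid="self"):
    """Return {tid: nice} for every thread of a process (Linux only).

    Hypothetical helper: reads /proc/<pid>/task/<tid>/stat, the per-thread
    procfs files that process monitors such as htop read on Linux.
    """
    task_dir = f"/proc/{pid}/task"
    result = {}
    for name in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, name, "stat")) as f:
                stat = f.read()
        except OSError:
            continue  # the thread exited between listdir() and open()
        # Field 2 (comm) is wrapped in parentheses and may contain spaces,
        # so split only after the last ')'. The remainder starts at field 3
        # (state); nice is field 19, i.e. index 16 of the remainder.
        fields = stat[stat.rindex(")") + 1:].split()
        result[int(name)] = int(fields[16])
    return result


stop = threading.Event()
worker = threading.Thread(target=stop.wait)
worker.start()
nices = thread_nice_values()
stop.set()
worker.join()
# The main thread's TID equals the PID, and the worker has its own entry.
print(os.getpid() in nices, worker.native_id in nices)  # True True
```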
My guess is that `thread.TrySetPriority(ThreadPriority.Highest, parameters.Logger);` fails because Linux prevents a thread from having a higher priority than its process, and the `ThreadPriority` enum values behind `thread.Priority` don't map exactly to nice levels...
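You can check this guess outside .NET by making the same kind of request directly. The Python sketch below is only an illustration and assumes Linux. On Linux the nice value is a per-thread attribute, so `os.setpriority(os.PRIO_PROCESS, tid, ...)` with a thread ID affects just that thread. The kernel refuses a request for a lower nice value (higher priority) unless the caller has `CAP_SYS_NICE` or a large enough `RLIMIT_NICE`. `try_set_thread_nice` and `probe` are hypothetical helpers. The sketch says nothing about how .NET itself maps `ThreadPriority` to nice values.

```python
import os
import threading


def try_set_thread_nice(nice_value):
    """Try to set the calling thread's nice value; return True on success.

    Hypothetical helper. Linux only: with NPTL the nice value is per-thread,
    so passing a thread ID to setpriority(PRIO_PROCESS, ...) changes only
    that thread. Lowering the nice value (raising priority) without
    CAP_SYS_NICE or a large enough RLIMIT_NICE fails with EACCES/EPERM,
    which Python raises as PermissionError.
    """
    tid = threading.get_native_id()
    try:
        os.setpriority(os.PRIO_PROCESS, tid, nice_value)
        return True
    except PermissionError:
        return False


def probe(nice_value):
    """Run the attempt in a throwaway thread so the caller keeps its own
    priority; return (succeeded, nice value of that thread afterwards)."""
    out = {}

    def worker():
        out["ok"] = try_set_thread_nice(nice_value)
        out["nice"] = os.getpriority(os.PRIO_PROCESS, threading.get_native_id())

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return out["ok"], out["nice"]


lowered = probe(19)  # lower priority: allowed for any user
raised = probe(-10)  # higher priority: usually refused unless privileged
print(lowered, raised)
```

The kernel checks the requested value, not the starting one. With the typical default `RLIMIT_NICE` of 0, an unprivileged thread that has been moved to a higher nice value usually cannot be moved back to 0 either.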
In https://github.com/dotnet/runtime/blob/v9.0.2/src/coreclr/nativeaot/System.Private.CoreLib/src/System/Threading/Thread.NativeAot.Unix.cs, `Priority = ...` seems to be a no-op on NativeAOT on Linux, so I'm not sure how the priority change happens now.
I am aware of this behavior, but it is still weird that the thread's priority ends up lower than normal.
The training code is highly specialized to our infrastructure, so we have no plan to release a complete training script for now. However, every module is there...
Though it is not official, you can try the fine-tuning code from https://github.com/TencentARC/InstantMesh
It was killed for running out of memory (kmem was at 184 GB and the resident set at 16 GB when it crashed). I don't know why. Limiting max-depth to 4 there...
I think `total` is the number of bytes in the scanned files? It had only scanned 63M files when it crashed.
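For reference, here is a minimal depth-limited scan that counts entries and sums file sizes. Its `total_bytes` plays the role the comment above guesses `total` plays. This is a hypothetical Python sketch, not the tool being discussed. `scan` is an assumed helper, and its `max_depth` semantics mimic `find -maxdepth`, where 1 means only the root's own entries are listed. The sketch walks the tree with an explicit stack and does not follow symlinks.

```python
import os
import tempfile


def scan(root, max_depth=None):
    """Count entries and total file bytes under `root` (hypothetical helper).

    max_depth=1 lists only root's own entries; None means unlimited.
    Uses an explicit stack instead of recursion; symlinks are not followed.
    """
    entries = 0
    total_bytes = 0
    stack = [(root, 0)]  # (directory path, its depth below root)
    while stack:
        path, depth = stack.pop()
        try:
            it = os.scandir(path)
        except OSError:
            continue  # unreadable directory: skip it
        with it:
            for entry in it:
                entries += 1
                try:
                    if entry.is_dir(follow_symlinks=False):
                        # Descend only while the child is within max_depth.
                        if max_depth is None or depth + 1 < max_depth:
                            stack.append((entry.path, depth + 1))
                    elif entry.is_file(follow_symlinks=False):
                        total_bytes += entry.stat(follow_symlinks=False).st_size
                except OSError:
                    pass  # entry vanished or is unreadable
    return entries, total_bytes


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub", "deeper"))
    for rel, text in [("a.txt", "hello"), ("sub/b.txt", "abc"),
                      ("sub/deeper/c.txt", "1234567")]:
        with open(os.path.join(root, rel), "w") as f:
            f.write(text)
    full = scan(root)
    shallow = scan(root, max_depth=1)
    two_levels = scan(root, max_depth=2)
print(full, shallow, two_levels)  # (5, 15) (2, 5) (4, 8)
```

The explicit stack keeps the scanner's own memory proportional to the directories still waiting to be visited, not to every entry seen. By itself it does not limit kernel-side memory such as the dentry and inode caches.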
It helped a bit: it managed to scan 351M entries before being killed, compared to