Saaketh Narayan comments

Results: 82 comments by Saaketh Narayan

Output scale not being used with `te_gemm` in FP8

@ptrendx Oh I see. I think that since the model I have doesn't use bias terms, it would be nice to just specify `alpha`...but would I get the same result...

Last entry in the dataset is causing "Relative sample index $x is not present" error

Hmm...this does seem like a bug on our side, although I'm not sure why this would be the case. Do you have a simple script that can reproduce this shard...

Use IndexError instead of ValueError in getitem

I see, that makes sense. It wouldn't impact current workflows anyway, since an error is thrown regardless, but raising `IndexError` would be more Pythonic. If you're up for it, could...
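One reason `IndexError` is the more Pythonic choice can be shown with a minimal sketch. The `Shard` class below is hypothetical and is not Streaming's actual dataset class; it only illustrates that Python's fallback iteration over `__getitem__` stops cleanly on `IndexError`, whereas a `ValueError` would propagate out of the loop.

```python
class Shard:
    """Hypothetical fixed-size container used only for illustration."""

    def __init__(self, samples):
        self._samples = list(samples)

    def __len__(self):
        return len(self._samples)

    def __getitem__(self, idx):
        # Out-of-range access raises IndexError, matching built-in sequences.
        if not 0 <= idx < len(self._samples):
            raise IndexError(f"sample index {idx} out of range")
        return self._samples[idx]


shard = Shard(["a", "b", "c"])

# With no __iter__ defined, Python calls __getitem__(0), (1), (2), ...
# and treats the first IndexError as the end of iteration.
collected = [sample for sample in shard]

# Direct out-of-range access still surfaces the error to the caller.
try:
    shard[3]
    raised = False
except IndexError:
    raised = True
```

If `__getitem__` raised `ValueError` instead, the list comprehension above would fail with that exception rather than terminating normally.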

Estimate total shards at the beginning of data conversion

Hey @abhijithneilabraham thanks for this issue! How would you propose finding the dataset size ahead of time? MDSWriter currently has no knowledge of how large your raw dataset files are...

Support for Lance

@oceanusxiv Definitely agreed on all these points. While Lance integration isn't currently on our roadmap, if we have more interest from internal customers on this front, this would definitely be...

'File exists: "/00000_locals"' when integrated with deepspeed training scripts

Hey, this seems like there's some stale shared memory. Just once, at the start of your training job, can you add a call to `streaming.base.util.clean_stale_shared_memory()` and see if that addresses...
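A minimal sketch of the suggestion above, assuming the `streaming` package is installed and exposes `clean_stale_shared_memory` at the module path named in the comment. The wrapper name `clean_shared_memory_once` is hypothetical, and the import guard is there only so the snippet degrades gracefully where Streaming is unavailable.

```python
def clean_shared_memory_once() -> bool:
    """Run Streaming's stale shared-memory cleanup if the package is available.

    Returns True if the cleanup was invoked, False if Streaming is not installed.
    """
    try:
        # Module path taken from the comment above.
        from streaming.base.util import clean_stale_shared_memory
    except ImportError:
        return False
    clean_stale_shared_memory()
    return True


# Intended to be called a single time at the very start of the training
# script, before any StreamingDataset or dataloader is constructed.
cleaned = clean_shared_memory_once()
```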

Add analytics logging to `MosaicMLLogger`

@mvpatel2000 @angel-ruiz7 @dakinggg is this one basically done? or are there still concerns?

Cannot Load MDS Dataset

@naston Thanks for raising this. Which streaming version are you using? If you're using v0.9.1, can you try downgrading to v0.9.0 and see if this solves the issue? @XiaohanZhangCMU if...

Cannot Load MDS Dataset

Regarding: "Also, I noticed that you use local as the path to your dataset, that's not what streaming desires." @XiaohanZhangCMU @naston If the dataset is on local disk, you...