Saaketh Narayan
@ptrendx Oh I see. I think that since the model I have doesn't use bias terms, it would be nice to just specify `alpha`...but would I get the same result...
Hmm...this does seem like a bug on our side, although I'm not sure why this would be the case. Do you have a simple script that can reproduce this shard...
I see, that makes sense. It wouldn't impact current workflows anyway, since an error is thrown regardless, but raising `IndexError` would be more Pythonic. If you're up for it, could...
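To illustrate why `IndexError` is the Pythonic choice: Python's sequence protocol expects `__getitem__` to raise `IndexError` for an out-of-range index, and the legacy iteration fallback (a `for` loop over an object that defines `__getitem__` but no `__iter__`) stops only when `IndexError` is raised. A minimal sketch, using a hypothetical `ToyShardedDataset` class that is not part of the streaming API:

```python
class ToyShardedDataset:
    """Hypothetical dataset used only to illustrate IndexError semantics."""

    def __init__(self, samples):
        self._samples = list(samples)

    def __len__(self):
        return len(self._samples)

    def __getitem__(self, index):
        # Raise IndexError (not a generic exception) for out-of-range indices,
        # matching how list, tuple, and other built-in sequences behave.
        if not -len(self) <= index < len(self):
            raise IndexError(f"index {index} out of range for dataset of size {len(self)}")
        return self._samples[index]


ds = ToyShardedDataset(["a", "b", "c"])
# No __iter__ is defined, so Python calls __getitem__(0), __getitem__(1), ...
# and ends iteration cleanly when IndexError is raised.
print(list(ds))  # ['a', 'b', 'c']
```

If `__getitem__` raised some other exception type instead, that loop would propagate the error rather than terminating normally.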
Hey @abhijithneilabraham thanks for this issue! How would you propose finding the dataset size ahead of time? MDSWriter currently has no knowledge of how large your raw dataset files are...
@oceanusxiv Definitely agreed on all these points. While Lance integration isn't currently on our roadmap, if we have more interest from internal customers on this front, this would definitely be...
@josejg merge whenever you're ready
Hey, this seems like there's some stale shared memory. Just once, at the start of your training job, can you add a call to `streaming.base.util.clean_stale_shared_memory()` and see if that addresses...
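A minimal sketch of "call it once at the start of the training job", assuming the `streaming.base.util.clean_stale_shared_memory()` function named above. The wrapper functions `clean_stale_shm_once()` and `main()` are hypothetical, and the import guard exists only so the snippet runs where streaming is not installed:

```python
def clean_stale_shm_once() -> bool:
    """Run streaming's stale shared-memory cleanup once; return whether it ran."""
    try:
        from streaming.base.util import clean_stale_shared_memory
    except ImportError:
        # streaming is not installed, so there is nothing to clean up.
        return False
    clean_stale_shared_memory()
    return True


def main() -> None:
    # Do this once, before constructing any StreamingDataset or dataloader.
    clean_stale_shm_once()
    # ... build datasets and dataloaders and start training here (hypothetical) ...


if __name__ == "__main__":
    main()
```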
@mvpatel2000 @angel-ruiz7 @dakinggg is this one basically done, or are there still concerns?
@naston Thanks for raising this. Which streaming version are you using? If you're using v0.9.1, can you try downgrading to v0.9.0 and see if this solves the issue? @XiaohanZhangCMU if...
Regarding:
> Also, I noticed that you use local as the path to your dataset, that's not what streaming desires.

@XiaohanZhangCMU @naston If the dataset is on local disk, you...