Dashiell Stander

17 comments by Dashiell Stander

Potentially? The only complication that occurs to me is that, in the repo I'm working on right now, it might be necessary to also copy over all the files...

I think this is a great idea. I don't know necessarily if this would work the best (in particular it might be somewhat extraneous to using models from replicate.com), but...

I'd love to give it a shot next week, unless there's something I'm forgetting about that's a higher priority @StellaAthena

I suspect that this is an error that has to do with model parallelism. @shaunstoltz how many GPUs were you loading the model onto / what was the model parallelism...

@0x6b64 have you looked into this any more? My first reaction is indeed that those are pretty small differences, and I wonder if they may just come from non-deterministic PyTorch...

I brought this up with @Quentin-Anthony and he was skeptical of my "non-deterministic ops" theory, and he'd know much better than I, so this is definitely a bit of a...

I tried a range of versions (including with a handful of easy changes to the code) and nothing worked right away. With an updated Triton version it probably wouldn't be...

@afeb-75 can you provide a more thorough stack trace showing where the "Empty ds_version in checkpoint" error is coming from? I cannot reproduce it. When you say the "current version" of...

@kyleliang919 do you think that is related to the issue @afeb-75 described at the top of this thread? I don't see the connection.

Hey @natek-1, do you have any updates on this? It's totally alright if you haven't gotten a chance to look at it. Would it be alright if we assigned...