Ross Wightman
@lucidrains thinking about possible config designs, curious if you have a full example of what the pydantic based scheme would look like? Does it allow easy interaction with human readable...
@rom1504 for black, adding `--skip-string-normalization` is a little less opinionated and reduces diff quite a bit... But yeah, should probably focus on the major PR and some refactoring / design...
@rom1504 from this point, CLIP models could be supported better in the Hub UI by also adding a community inference pipeline. To get them natively in Transformers, that's another step....
I think we've got two 'easy' options right now, DeepSpeed Zero (PR for this #264 might be worth testing) or PyTorch native FSDP. Talking w/ someone close to TPUs &...
@CloudRR yeah, something is wrong there but really hard to say what it is. BTW that 20% graph is also low, is that actually the one in the README for...
@mitchellnw heh, I was actually thinking about this an hour ago... * saving latest as done now is a bit error prone / slightly wasteful, it's done after the numbered...
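One way to make the "latest" save less error prone is to copy the just-written numbered checkpoint and swap it in with an atomic rename, rather than serializing the state dict a second time. A minimal sketch under that assumption (the function and file names are illustrative, not open_clip's actual code):

```python
import os
import shutil


def save_latest(numbered_path: str, latest_path: str) -> None:
    # Copy the already-saved numbered checkpoint to a temp file,
    # then atomically rename it into place. A crash mid-copy can
    # never leave a half-written "latest" checkpoint behind.
    tmp_path = latest_path + ".tmp"
    shutil.copy2(numbered_path, tmp_path)
    os.replace(tmp_path, latest_path)  # atomic on POSIX filesystems
```

This avoids the double serialization cost and the window where "latest" is partially written.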
@hetong007 no immediate plans to train such a model but a possibility. open to contributions but will close this for now
@mitchellnw that's interesting, I haven't observed that before in previous runs. I went back to check across some old resumes and even in the overlap (where there were logs before...
@mitchellnw coming back to this one, I don't feel the explanation makes sense, logit scale dips should have no correlation with the end of SCI in terms of dataset randomness....
@rom1504 args.checkpoint_path is constrained to the current name by default `args.checkpoint_path = os.path.join(args.logs, args.name, "checkpoints")` but the get_latest_checkpoint fn could be used to search across multiple folders if you passed...
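A minimal sketch of what a checkpoint-search helper along these lines might look like, searching across one or more folders and returning the most recent file (the glob pattern and sorting key are assumptions for illustration, not open_clip's actual implementation):

```python
import glob
import os


def get_latest_checkpoint(root: str):
    # Search the given folder (and any subfolders) for checkpoint
    # files; the "*.pt" naming is an assumption for illustration.
    checkpoints = glob.glob(os.path.join(root, "**", "*.pt"), recursive=True)
    if not checkpoints:
        return None
    # Pick the most recently modified file as the latest checkpoint.
    return max(checkpoints, key=os.path.getmtime)
```

Passing a directory above several run folders would let it resume from the newest checkpoint across all of them.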