Megatron-DeepSpeed
Add skip iterations to `sample_idxs_to_text.py`
[WIP] Fixes: #189.
@stas00 Just pushed an MVP of the skip-based iteration index retrieval method. You can also look at the gist here for a simplified version. `range` is the user-specified indices; `skip`, the skip intervals; `index`, the actual indices that have been retrieved based on the skips.
```
(venv) jaketae:test $ python skip.py --skip 2-5 2-7 10-11 11-18 22-23 --range 1 10
range [1, 10]
skips [[2, 7], [10, 18], [22, 23]]
index [1, 8, 9, 19, 20, 21, 24, 25, 26, 27]
```
Currently, the method takes linear time (one loop). I wonder if there is a more optimized algorithm that can retrieve the indices in logarithmic time, but I don't know if that's super important.
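For reference, here is a minimal sketch of the approach (a hypothetical reconstruction, not the exact code in this PR or the gist; the names `merge_skips` and `retrieve_indices` are made up for illustration):

```python
def merge_skips(skips):
    """Merge overlapping or adjacent [start, end] skip intervals."""
    merged = []
    for start, end in sorted(skips):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def retrieve_indices(lo, hi, skips):
    """Map requested positions lo..hi onto actual indices, jumping over skips."""
    skips = merge_skips(skips)
    indices = []
    candidate = lo
    i = 0  # pointer into the merged, sorted skip list
    while len(indices) < hi - lo + 1:
        # discard intervals that end before the current candidate
        while i < len(skips) and skips[i][1] < candidate:
            i += 1
        # if the candidate falls inside an interval, jump past it
        if i < len(skips) and skips[i][0] <= candidate:
            candidate = skips[i][1] + 1
            i += 1
        indices.append(candidate)
        candidate += 1
    return indices


print(retrieve_indices(1, 10, [[2, 5], [2, 7], [10, 11], [11, 18], [22, 23]]))
# [1, 8, 9, 19, 20, 21, 24, 25, 26, 27]
```

Since the merged intervals are sorted and disjoint, the pointer only ever moves forward, so the whole retrieval is a single linear pass.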
The next step would be to use the retrieved indices to dump the appropriate data points. I've modified the `for` loop in the script where I believe the dumping is taking place, but I might have missed some details.
Thank you!
Very neat, @jaketae!
I think I/we didn't think it through - these are two totally different ranges. One is samples and the other is iterations, and they have no linear correlation, since each iteration consumes a different number of samples depending on the batch-size (BS) ramp-up stage.
So unfortunately I'm not sure how we can tackle this directly. Perhaps we do have to log the actual skipped sample ranges in the training program and then we could feed those ranges to your code in this PR.
A much more complicated solution would be to make this PR's script aware of the ramp-up and figure out the sample ranges from the iteration ranges.
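To make the non-linearity concrete, here is a hypothetical sketch of that conversion (the schedule parameters and function names are made up for illustration; the real Megatron-LM ramp-up is keyed on consumed samples rather than iterations, so this is only a simplified model):

```python
def batch_size_at(iteration, start_bs, increment, iters_per_stage, final_bs):
    """Global batch size at a given iteration under a staged linear ramp-up."""
    return min(start_bs + (iteration // iters_per_stage) * increment, final_bs)


def iters_to_samples(first_iter, last_iter, **sched):
    """Convert an (inclusive) iteration range to the sample range it consumed."""
    first_sample = sum(batch_size_at(i, **sched) for i in range(first_iter))
    consumed = sum(batch_size_at(i, **sched)
                   for i in range(first_iter, last_iter + 1))
    return first_sample, first_sample + consumed - 1


# e.g. BS ramps 32 -> 512 in steps of 32, one step every 100 iterations:
print(iters_to_samples(150, 160, start_bs=32, increment=32,
                       iters_per_stage=100, final_bs=512))
```

The same iteration window covers a different number of samples depending on where it falls in the ramp-up, which is why no fixed iteration-to-sample mapping exists.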
I'm not sure.
What do you think?
Oh, I see. I also completely missed that and assumed that we just need to retrieve the iteration indices. I guess it's a little more complicated than I had thought.
Hmm, I would obviously prefer a solution that's easier and quicker to implement. But at the same time, I don't want to change anything in the training script just for this. How complicated do you think it would be to take the ramp-up into account in this script instead of the training script?
Thinking more about it - given the transient nature of this project (I don't see Megatron-LM integrating our improvements), and given that our attempts to study the data at the points of blow-up didn't yield any actionable insights, I'm somewhat reluctant to invest much time in that direction.
So one approach is to do nothing and just park this PR until we need to resurrect it. In that case, my apologies, since you have invested a lot of time in it and wrote really neat code - but hopefully it can be re-used elsewhere.
Or perhaps we will discover that we actually really want this to work. In that case we will probably have to work out the BS ramp-up schedule and then convert iterations to samples - it shouldn't be very difficult, but my intuition is that we might not even use this tool in the end.
But it's your call: if you'd like to finish it, you can, or let it sit for a while.
Those are all fair points. I think we should table this PR for now and resurrect it in the future if need be. This also does not seem like a super high priority. I'm totally fine with my code not being used, as I thoroughly enjoyed the process.
I'll leave this PR open as I don't know if you keep a backlog of PRs of this sort, but feel free to close it!
Oh, there is no need to close it. We can just keep it if we want to resume it later.
Thank you for taking this outcome in a kind manner, Jaesung!