python icon indicating copy to clipboard operation
python copied to clipboard

rna-transcription approach: a few improvements

Open petrem opened this issue 2 months ago • 13 comments

  • Renamed chr to char in the snippets because chr is a built-in function and although shadowing it in this case may not be a problem, it is still a bad practice when it can be avoided.
  • The dictionary-join approach mentions list comprehensions, but instead it uses a generator expression. Replaced this in the explanation and expanded to give the list comprehension based implementation along with a brief comparison.
  • The overview mentions one approach is four times faster. In a brief comparison, it varies from 2.5x for a very short string and up to 60x faster for a 10^6 long one. Probably not worth going into the details, but 4x is just innacurate.

I noticed that the maketrans based approach mentions the translation is done between ASCII ordinal values. This happens to be true because the characters involved are ASCII letters, but the functions deal with unicode ordinals.

Do you think this is worth mentioning?

petrem avatar Dec 21 '25 20:12 petrem

Hi there @petrem! Thanks so much for this. I haven't taken a close look yet...but wanted to reply to your message/questions first.

Renamed chr to char in the snippets because chr is a built-in function and although shadowing it in this case may not be a problem, it is still a bad practice when it can be avoided.

Good call. Shadowing is just something to avoid, generally. Unless of course you program in a language where its SOP for some reason. I would probably go even further and just name it character - but I'm fine with char too.

The dictionary-join approach mentions list comprehensions, but instead it uses a generator expression. Replaced this in the explanation and expanded to give the list comprehension based implementation along with a brief comparison.

I might go even further here and talk a little about how generator expressions avoid excess memory by not creating values in a list in memory and instead yielding values at runtime. The downside being that once consumed, a generator cannot be "re-run". They can also (paradoxically seeming) be slower, depending on the data and the expression.

However, list comprehensions (or any concrete comprehension) do create their values in memory, which might quickly become a no-op when processing large amounts of data.

For the size of data we're talking about in this exercise, its not terribly important - but worth discussing.

The overview mentions one approach is four times faster. In a brief comparison, it varies from 2.5x for a very short string and up to 60x faster for a 10^6 long one. Probably not worth going into the details, but 4x is just inaccurate.

I don't know if you took a look at the performance article/benchmarks, but the info likely comes from there. And as you are probably well aware, accurate benchmarking is complicated and also highly dependent on hardware. The best you can do is be really honest about how you crafted the benchmark and what hardware was used. I'd love if someone did a quick once-over of the performance stuff .. and if it is just horrid, we should probably remove the mention of 4x or do the range you pointed out. But yeah, I was not really a fan of the perf stuff from this author, but was sort of in a position where I had to approve the PR.

I noticed that the maketrans based approach mentions the translation is done between ASCII ordinal values. This happens to be true because the characters involved are ASCII letters, but the functions deal with unicode ordinals.

Do you think this is worth mentioning?

Yes. If you have the energy and interest, it is well worth mentioning. Python claims to be UTF-8 first and full able to process Unicode. So students should be aware of how that happens.

BethanyG avatar Dec 21 '25 21:12 BethanyG

Just took a deeper look, and I don't have any nits at the moment beyond the one found by @BNAndras. 😄 But will wait to merge in case you have additions after my comments above. 🎉

BethanyG avatar Dec 21 '25 21:12 BethanyG

Thank you. I'll make some updates tomorrow (ASCII/unicode and perhaps take another look at the list comprehension vs generator expression).

would probably go even further and just name it character - but I'm fine with char too.

I was actually tempted to name it nucleotide, I wonder if you (plural) think that would be a good idea, too?

petrem avatar Dec 21 '25 22:12 petrem

I was actually tempted to name it nucleotide, I wonder if you (plural) think that would be a good idea, too?

I like that. 😄

BethanyG avatar Dec 21 '25 22:12 BethanyG

would probably go even further and just name it character - but I'm fine with char too.

I was actually tempted to name it nucleotide, I wonder if you (plural) think that would be a good idea, too?

Exactly, using language at the problem being solved (domain language) is definitely good for showing intent in the reading of the code.

kotp avatar Dec 21 '25 22:12 kotp

Oh boy. I messed up my branches. Sorry about that :-( I'm afraid we should discard this PR and create clean new one.

petrem avatar Dec 22 '25 22:12 petrem

Oh Nos! My empathy for you: I often screw up branches, so I know what that feels like! ~~Go ahead and open a new PR, and then link to this one so we don't lose the discussion. Ping me here, on Discord or on the forum when you are ready for review.~~

See my updated comment below.....I think I managed to fix this!

BethanyG avatar Dec 23 '25 00:12 BethanyG

Closing this per @petrem note above. Can you also please remove the codeowners/admin review? That is going to ping Jermey and Erik, and that's not really necessary here (unless you'd like them to do some sort of rollback or other admin-only action).

BethanyG avatar Dec 23 '25 00:12 BethanyG

Reopening to see if I can fix this.

BethanyG avatar Dec 23 '25 00:12 BethanyG

Alright @petrem - I've managed to drop all the other commits you picked up, and so now there are only 4 changed files. The bad news? I had to do a merge commit for the rebase, so now we have that long horrid list of commits in the history - along with you being on them as co-author. But I think we're out of the woods (and I think if we squash on merge we'll get that long list suppressed), should you want to continue with this PR rather than opening a new one. 😄

BethanyG avatar Dec 23 '25 04:12 BethanyG

Sorry to have put you trough the trouble. I don't even know how I got here which makes it worse. Usually I simply use git cli but I don't find anything suspicious in the history. Must've done something really stupid in the editor <:-|

It's still a mess (that I've created) so I'd prefer to start afresh, if that's OK? Or perhaps there's something I'm missing and it's better to continue here, in that case, that's also fine.

With my last commit I think I've implemented all changes/suggestions. I've also copied a note over from one approach to the other. This would apply to any approach, so maybe it should be moved to the introduction?

petrem avatar Dec 23 '25 08:12 petrem

Hi @petrem - apologies for not replying earlier. I am in PST, and went to bed on the early side last night. 🙂

It's still a mess (that I've created) so I'd prefer to start afresh, if that's OK? Or perhaps there's something I'm missing and it's better to continue here, in that case, that's also fine.

If you are more comfortable restarting, let's do that. I think the only issue is that this PR has quite a bit of discussion on it, so we lose that direct link. But as long as you reference & link to this PR in your new one, I think we'll be fine. 😄

I don't even know how I got here which makes it worse. Usually I simply use git cli but I don't find anything suspicious in the history. Must've done something really stupid in the editor <:-|

Having just played with the gh cli last night (it was the only way to edit the PR branch without creating a new one), I think what happed was a gh pr update-branch that was done directly from the repo (not your fork), and was missing the --rebase flag. So it pulled in the last year's worth of changes on main, and merged them to your branch as "new" commits. But I don't know for sure. 😱

My typical workflow for anything I do on the track is:

TL;DR? Always pull in all upstream changes before branching to do work. 😄

  1. Switch to my local main, and do a git fetch --all to pull down all new refs. I have remotes set for the repo (upstream) and my fork (origin).
  2. Once the refs download, I do a git rebase upstream/main, which replays my local main onto the fetched one. (e.g. it updates my main with all pulled in changes)
  3. After rebase, I check the status, and if needed, I push my local main changes to my fork (origin).
  4. Once main has been updated, I checkout and update the branch via git rebase main - although typically, I am doing the update before starting work, so instead of rebasing a branch I am creating one via git checkout -b <branch name>.
  5. Once done with branch work, I do a git push, which updates my fork (origin). From there, GH flags the change and the web interface asks if I want to PR to upstream. If I've already PR'd, the upstream branch is updated automatically.

All that doesn't guarantee I won't screw something up royally (I have a talent). And if I do mess up, oh shit, git is usually my very first stop, followed by increasingly hysterical googling. 😆

So - no harm, no foul! I appreciate all the work and am happy to have you re-do the PR. 💙

With my last commit I think I've implemented all changes/suggestions. I've also copied a note over from one approach to the other. This would apply to any approach, so maybe it should be moved to the introduction?

Yes - I'd move the note to the introduction if it applies generally. 😄 And YAY! For finishing things up. I look forward to your PR. Hope all the shenanigans didn't scare you off!

BethanyG avatar Dec 23 '25 17:12 BethanyG

Hi @BethanyG ,

To quote you, TL;DR -- I moved the note to introduction.md. If you can get rid of the commits during squash, and can do that without wasting hours, I'm fine. If not, I'll start a new branch and I can attempt to cherry-pick so that I preserve the authors/co-authors. Please let me know your preference. Thank you for your patience.

apologies for not replying earlier. I am in PST, and went to bed on the early side last night. 🙂

No need to apologise. I'm in EET (as in UTC+2), so we'll inevitably have delays.

If you are more comfortable restarting, let's do that. I think the only issue is that this PR has quite a bit of discussion on it, so we lose that direct link. But as long as you reference & link to this PR in your new one, I think we'll be fine. 😄

Oh yes. Plus the commits directly made from suggestions from all the helpful people. If you can clean it up on squash without messing up the history, and without losing hours, I'm fine. I just don't like leaving a mess after me.

Having just played with the gh cli last night (it was the only way to edit the PR branch without creating a new one), I think what happed was a gh pr update-branch that was done directly from the repo (not your fork), and was missing the --rebase flag. So it pulled in the last year's worth of changes on main, and merged them to your branch as "new" commits. But I don't know for sure. 😱

I'm not familiar with gh cli, and not too familiar with fork workflows either. So I appreciate your input.

My typical workflow for anything I do on the track is: TL;DR? Always pull in all upstream changes before branching to do work. 😄 [...]

Thanks for sharing that! I do a variation of that so at least that's reassuring. There are some differences that I will look into.

I did update my main, but as I work off a branch rather than directly on main, I thought it'll be safe. And I guess then I somehow managed to pull/rebase/merge my origin/main on the local branch.

With my last commit I think I've implemented all changes/suggestions. I've also copied a note over from one approach to the other. This would apply to any approach, so maybe it should be moved to the introduction? Yes - I'd move the note to the introduction if it applies generally. 😄

~Will do.~ Done.

And YAY! For finishing things up. I look forward to your PR. Hope all the shenanigans didn't scare you off!

I'll be fine, I've done much worse. Let me just say clinical data and production database (all was well in the end, patients safe, just aged an year or two in the hour that I fixed it :laughing: ).

petrem avatar Dec 23 '25 18:12 petrem

Alrighty. Pushing that big, green button. Cross your fingers and your toes!.

BethanyG avatar Dec 24 '25 19:12 BethanyG

Sigh. All those pulled in commits are now showing up again when I try to merge. Going to attempt to fix it again -- but we may need to use a new PR after all. 😢

BethanyG avatar Dec 24 '25 20:12 BethanyG

I can do a new branch and PR, of course, @BethanyG !

petrem avatar Dec 24 '25 22:12 petrem

I can do a new branch and PR, of course, @BethanyG !

With a little more help from @IsaacG and @kotp, we've got it fixed. So I am going to merge and RUN 😄 But let's never, EVER do this again. I learned a lot of great stuff, but am now loathe to practice it more. Let's hope none of us ever get in this particular pickle again! 😆

BethanyG avatar Dec 24 '25 22:12 BethanyG

Thank you all.

petrem avatar Dec 25 '25 09:12 petrem