python-novice-inflammation

Use numpy.concatenate instead of hstack, vstack

Open chillenzer opened this issue 1 year ago • 5 comments

Hi everybody, I was just reading through Episode 2 and was surprised by the appearance of numpy.hstack and numpy.vstack. Isn't it more useful to just introduce numpy.concatenate with an appropriate axis kwarg? I personally never use the {h,v}stack functions because they lack the generality to handle some cases in higher dimensions (and whenever I did, it took me a while to sort out which of the 3, 4 or 5 axes of my array is considered "horizontal"). Even if the tutorial (at least at that point) is only concerned with 2D data, would it hurt to give learners the exact same functionality while sneaking in the generality they might need for their own use case? One could even argue that it is simpler

  • to have only one function name to remember.
  • not to rely on them having the correct geometrical picture in mind when they could have an unambiguously enumerated axis instead.
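As a minimal sketch of the equivalence in question (the arrays `a`, `b` and `c` are made up for illustration and are not taken from the lesson), for 2D arrays hstack and vstack correspond to concatenate with axis=1 and axis=0, and concatenate additionally accepts any other axis:

```python
import numpy as np

# Two small 2D arrays, invented purely for illustration.
a = np.arange(6).reshape(2, 3)
b = np.arange(6, 12).reshape(2, 3)

# For 2D input, hstack joins along axis 1 and vstack joins along axis 0.
h_stack = np.hstack([a, b])
v_stack = np.vstack([a, b])
h_cat = np.concatenate([a, b], axis=1)
v_cat = np.concatenate([a, b], axis=0)

print(np.array_equal(h_stack, h_cat))  # the two results hold the same values
print(np.array_equal(v_stack, v_cat))
print(h_cat.shape, v_cat.shape)

# With more dimensions, the axis is simply named by its number.
c = np.zeros((2, 3, 4))
along_last = np.concatenate([c, c], axis=2)
print(along_last.shape)
```

Only the length of the chosen axis changes; all other axes must already match in length.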

Admittedly, this might just be personal preference (my own as well as that of the people I work with), so I would be interested to hear if there are rational arguments for the current way of doing it. If not, I'm happy to provide this small patch myself. Best, Julian

chillenzer, Sep 15 '22 10:09

It's an interesting point! I think numpy.hstack() and numpy.vstack() would help those who haven't grasped the idea of dimensions and axes yet. For example, putting a Lego block on top of another is easier to think about than which dimension is which.

But a note on numpy.concatenate() would be useful for the stronger students.

shermanlo77, Dec 19 '22 12:12

Okay, I see your point. When teaching this material, we had some discussions with the learners about how the data is laid out and what the provided axis actually means. This is particularly tricky for 2D arrays, where there is a coincidental symmetry between an axis argument and its complement. For example, when np.sum(..., axis=0) reduces a 5D array to a 4D array, it is pretty clear that the sum was taken along axis 0. For a 2D array reduced to 1D, the same call could be read either as "summed along axis 0" or as "only axis 0 is kept", and the resulting shapes often coincide in such situations.
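A small sketch of that ambiguity, with array shapes invented for illustration (this is one reading of the point above, not something from the lesson):

```python
import numpy as np

# 5D case: the axis that was summed over visibly disappears from the shape.
x5 = np.ones((2, 3, 4, 5, 6))
s5 = np.sum(x5, axis=0)
print(s5.shape)  # axis 0 (length 2) is gone

# Square 2D case: both choices of axis give the same result shape,
# so the shape alone cannot tell the two readings apart.
square = np.ones((3, 3))
sq0 = np.sum(square, axis=0)
sq1 = np.sum(square, axis=1)
print(sq0.shape, sq1.shape)

# Non-square 2D case: the shapes differ, which resolves the ambiguity.
rect = np.ones((2, 3))
r0 = np.sum(rect, axis=0)
r1 = np.sum(rect, axis=1)
print(r0.shape, r1.shape)
```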

I guess one could argue that the printed representation gives a reasonable intuition for 2D arrays, but this does not straightforwardly generalize to higher dimensions (at least in my head).

But I think the compromise you suggest could be okay. Shall I write up a short info box on this?

chillenzer, Dec 19 '22 12:12

That's a good point. The episode goes on to explain the difference between numpy.mean(data, axis=0) and numpy.mean(data, axis=1), so the students should know about dimensions and axes.

I think your/our suggestion on using concatenate() could be a good candidate for a pull request.

shermanlo77, Dec 19 '22 13:12

Great! I will write something up and create a PR. It doesn't have high priority though, so it might take a while to arrive. =)

chillenzer, Dec 19 '22 14:12

In fact, np.concatenate could make the whole axis concept even clearer than np.mean, because one can immediately follow the change in shape instead of manually inspecting the values, which at that point in the lesson are definitely not obvious to compare.
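Roughly what I have in mind, as a sketch with a made-up stand-in array (`data` here is a hypothetical 3 x 4 array, not the lesson's dataset):

```python
import numpy as np

# Hypothetical stand-in for the lesson's data: 3 rows, 4 columns.
data = np.arange(12, dtype=float).reshape(3, 4)

# concatenate: the chosen axis is the one whose length grows.
rows_doubled = np.concatenate([data, data], axis=0)
cols_doubled = np.concatenate([data, data], axis=1)
print(data.shape, rows_doubled.shape, cols_doubled.shape)

# mean: the chosen axis is the one that disappears, so the learner
# has to inspect values (or lengths) to see which axis was averaged.
mean_axis0 = np.mean(data, axis=0)
mean_axis1 = np.mean(data, axis=1)
print(mean_axis0.shape, mean_axis1.shape)
```

With concatenate, comparing the input and output shapes is enough to see which axis was affected.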

chillenzer, Dec 19 '22 14:12