[2.0] Stabilize the undocumented `dimensions` field of `p5.Vector`

Open GregStanton opened this issue 2 months ago • 16 comments

This issue involves two parts: a bug fix and a new API. I plan to cleanly separate these parts into actionable sub-issues.

Sub-issue 1: Bug fix

Problem

The dimensions field isn't currently documented, but it appears to users when they use the command console.log(myVector), and it may sometimes have inaccurate values, as observed here and here. Thanks to @sidwellr for the following summary:

> The gist of the comments linked above is that dimensions does not always reflect the true dimensions of the vector. For example, createVector() (with no params) creates a 3D vector [0, 0, 0] but sets dimensions to 2. And p5.Vector.sub(createVector(1, 2), createVector(1, 2, 3)) will return a 3D vector with dimensions set to 2.

Solution

These bugs should be readily fixable by implementing the property with the JS get syntax, and having the getter retrieve the current, correct value whenever a user accesses it.
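
As a minimal sketch of that fix (not p5's actual source), the property can be derived from the stored components on every read. `SketchVector` and its `append()` method are hypothetical stand-ins; the `_values` name follows the internal array mentioned later in this thread.

```javascript
// Hypothetical stand-in for p5.Vector, for illustration only.
class SketchVector {
  constructor(...values) {
    // Components live in a single array; nothing else tracks their count.
    this._values = values;
  }

  // Getter without a setter: recomputed on each access, so it cannot go stale.
  get dimensions() {
    return this._values.length;
  }

  // Hypothetical mutation that changes the number of components.
  append(value) {
    this._values.push(value);
    return this;
  }
}

const sketch = new SketchVector(1, 2);
const before = sketch.dimensions; // 2 components
sketch.append(3);
const after = sketch.dimensions; // 3 components, with no manual bookkeeping
console.log(before, after);
```

Because the value is never stored separately, there is no second copy that could disagree with the data.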

Sub-issue 2: A strong API, refined through community discussion

The original API proposal has been withdrawn and can be viewed in the details section further below, for continuity. It's been replaced with a stronger proposal that incorporates community feedback. The new .shape/.rank API would replace dimensions, and would work as follows:

| Data structure | Proposed read-only property | Example value | Data type |
| --- | --- | --- | --- |
| Vector [x, y, z] | .shape | [3] | Array |
| Vector [x, y, z] | .rank | 1 | Number |
| Matrix (3 rows, 4 cols) | .shape | [3, 4] | Array |
| Matrix (3 rows, 4 cols) | .rank | 2 | Number |

Benefits

  • Predictability: Enjoys strong precedents in both math and creative coding
    • This API is used by TensorFlow.js. The original motivation for $n$-dimensional vectors was to provide an onramp to exactly this kind of library.
    • Using an array for the shape of a vector also enjoys a strong precedent in creative coding: babylon.js, a major peer library backed by Microsoft. It uses [3] for the shape of a vector [x, y, z], rather than 3.
  • Consistency and extensibility: Offers type consistency and extends to other classes
    • Using an array for .shape (e.g., [3] for a vector) ensures the property always returns the same data type (Array), unlike a property that might return a Number for vectors and an Array for matrices.
    • This API extends naturally to matrices, tensors, and even scalars (where .shape would be [] and .rank would be 0).
  • Readability: Replace ambiguity with clarity
    • This API eliminates the confusion around the overloaded term "dimension." A vector like [2, 3, 5] might be called "3-dimensional," but as a data structure, it's 1-dimensional (like a list), while a matrix is 2-dimensional.
    • Replacing dimensions with a .shape of [3] and a .rank of 1 provides standard, unambiguous terms for these distinct concepts.
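
To make the two terms concrete, here is a rough sketch that reads the shape and rank off plain nested arrays. `shapeOf()` and `rankOf()` are hypothetical helpers, not part of p5.js or any library named above, and they assume regular (non-ragged) nesting.

```javascript
// Hypothetical helper: follows the first element at each nesting level
// and records how long each level is.
function shapeOf(data) {
  const shape = [];
  let level = data;
  while (Array.isArray(level)) {
    shape.push(level.length);
    level = level[0];
  }
  return shape;
}

// The rank is the number of axes, i.e. the length of the shape.
function rankOf(data) {
  return shapeOf(data).length;
}

const scalar = 7;
const vector = [2, 3, 5];
const matrix = [
  [1, 0, 0, 0],
  [0, 1, 0, 0],
  [0, 0, 1, 0],
];

console.log(shapeOf(scalar), rankOf(scalar)); // shape [], rank 0
console.log(shapeOf(vector), rankOf(vector)); // shape [3], rank 1
console.log(shapeOf(matrix), rankOf(matrix)); // shape [3, 4], rank 2
```

In every case the shape is an Array and the rank is a Number, which is the type consistency described above.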

Original, withdrawn renaming proposal for `dimensions`

~~This would likely be a useful feature, with precedent in other libraries, e.g. [babylon.js has `dimension`](https://doc.babylonjs.com/typedoc/interfaces/BABYLON.Tensor#dimension). But, prior to release, it may be renamed to `dimensionSize`, which works better in the context of p5. This name would be clearer, and it'd allow a consistent interface across vectors, matrices, and potentially tensors:~~

| ~~Data structure~~ | ~~Proposed property~~ | ~~Example value~~ | ~~Data type~~ |
| --- | --- | --- | --- |
| ~~Vector [x, y, z]~~ | ~~.dimensionSize~~ | ~~3~~ | ~~Number~~ |
| ~~Matrix (3 rows, 4 cols)~~ | ~~.dimensionSizes~~ | ~~[3, 4]~~ | ~~Array~~ |
| ~~Matrix (3 rows, 4 cols)~~ | ~~.dimensionCount~~ | ~~2~~ | ~~Number~~ |

~~This naming scheme prevents users from falling into type traps and ambiguity traps, as described in this comment.~~

Previous note regarding settable properties (this point has been settled)

Writability: Also, there would need to be discussion about whether to (a) protect this field from modification or (b) support modification, e.g. by padding a vector with zeros or truncating it if needed. Right now, as @sidwellr noted, "users can currently set dimensions to any value, including nonsensical ones like 2.5 and 'frog'."

Update: The shape and rank are derived properties, based on the underlying data. Directly modifying the shape doesn't allow users to specify their intent (e.g. if the new shape is bigger, how should it be filled?). For vectors and matrices, the rank is a property of the class as a whole, so it's not something that should be modified. The standard, user-friendly approach is to provide explicit methods for reshaping and resizing that allow users to clearly specify their intent. The newly proposed .shape/.rank API adheres to this pattern by making these read-only properties.
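
A small sketch of what "read-only" could look like in practice, assuming the properties are written with JavaScript get syntax and no setter. `ReadOnlyShapeVector` and `tryToSetShape()` are hypothetical names used only for this illustration.

```javascript
// Hypothetical stand-in for p5.Vector, for illustration only.
class ReadOnlyShapeVector {
  constructor(...values) {
    this._values = values;
  }

  // Derived from the data; there is deliberately no `set shape(...)`.
  get shape() {
    return [this._values.length];
  }

  // Every vector has exactly one axis.
  get rank() {
    return 1;
  }
}

function tryToSetShape(vector, newShape) {
  'use strict'; // in strict mode, assigning to a getter-only property throws
  try {
    vector.shape = newShape;
    return 'assigned';
  } catch (error) {
    return error.name;
  }
}

const point = new ReadOnlyShapeVector(1, 2, 3);
const outcome = tryToSetShape(point, [5]);
console.log(outcome);     // 'TypeError'
console.log(point.shape); // still one axis of size 3
```

In non-strict code the same assignment is silently ignored instead of throwing; either way the underlying data is unchanged.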

Tasks

  • [ ] Fix bugs with the JS get syntax
  • [ ] Reach consensus on the rename
  • [ ] Implement the rename, if accepted by the community

Edits:

  1. Added points from @sidwellr.
  2. Added table to illustrate naming scheme, and clarified its benefits.
  3. Added task list.
  4. Reorganized post to reflect community feedback.

GregStanton, Oct 15 '25 14:10

The gist of the comments linked above is that dimensions does not always reflect the true dimensions of the vector. For example, createVector() (with no params) creates a 3D vector [0, 0, 0] but sets dimensions to 2. And p5.Vector.sub(createVector(1, 2), createVector(1, 2, 3)) will return a 3D vector with dimensions set to 2.

In addition, users can currently set dimensions to any value, including nonsensical ones like 2.5 and "frog".

None of these affect operation of the vector; dimensions is not used internally.

sidwellr, Oct 15 '25 23:10

I personally prefer the term dimensions to dimensionSize. It's intuitive: a three-dimensional vector would have dimensions 3 for example. It would also work for matrices and tensors, but the result would be a list instead of a number.

sidwellr, Oct 16 '25 00:10

Thanks, @sidwellr!

> I personally prefer the term dimensions to dimensionSize. It's intuitive: a three-dimensional vector would have dimensions 3 for example. It would also work for matrices and tensors, but the result would be a list instead of a number.

Thanks for sharing your feedback, and that's a great point about intuitiveness. For a single vector, dimensions feels natural and intuitive. The challenge appears when we extend this to a matrix.

Type trap

This creates a "type trap" that could be confusing for users. For example, a user might write code that works perfectly for a vector called myData: if (myData.dimensions > 2) { ... } But if this is a generic function that operates on vectors and matrices, the same logic will silently fail when myData is a matrix: in JavaScript, [3, 4] > 2 evaluates to false, because the array is coerced to the string "3,4", which converts to NaN, and any comparison with NaN is false.

This is the kind of subtle bug that can be frustrating for beginners and experts alike. Your comment totally clarified the nature of this type trap for me. Having a clear, unambiguous naming system can help users avoid these traps.
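
The trap is easy to reproduce in plain JavaScript. The variable names below are made up for the demonstration and do not correspond to any p5.js API.

```javascript
const vectorDimensions = 3;      // a Number, as it would be for a vector
const matrixDimensions = [3, 4]; // an Array, as it would be for a matrix

const vectorCheck = vectorDimensions > 2; // true, as intended
// The array becomes the string "3,4", which converts to NaN; NaN > 2 is false.
const matrixCheck = matrixDimensions > 2;
// A one-element array happens to "work": [3] -> "3" -> 3.
const singleCheck = [3] > 2;

console.log(vectorCheck, matrixCheck, singleCheck); // true false true

// With a property that is always an Array, the caller must say which axis is meant.
const shape = [3, 4];
const firstAxisCheck = shape[0] > 2;
console.log(firstAxisCheck); // true
```

No error is thrown at any point, which is why this kind of bug is hard to spot.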

Ambiguity trap

We also need to distinguish between two different concepts:

  • What is the size of each axis? (Your "list" for a matrix)
  • How many axes are there in total? (e.g., 1 for a vector, 2 for a matrix)

Solution

To prevent users from falling into those traps, we could use the naming scheme I proposed, which gives a distinct name to each concept:

| Data structure | Proposed property | Example value | Data type |
| --- | --- | --- | --- |
| Vector [x, y, z] | .dimensionSize | 3 | Number |
| Matrix (3 rows, 4 cols) | .dimensionSizes | [3, 4] | Array |
| Matrix (3 rows, 4 cols) | .dimensionCount | 2 | Number |

This system ensures that the names are always clear and the return types are always consistent, respecting the principle of least surprise. It completely resolves the ambiguity of the term dimensions by providing distinct names for distinct concepts: the size of the dimensions (.dimensionSizes) and the number of dimensions (.dimensionCount).

Extending this to Tensors (Advanced Example)

This pattern scales perfectly. For a tensor representing a 2x3 pixel image with 4 color channels (a 2x3x4 tensor), the properties would be:

.dimensionSizes: [2, 3, 4] (an array of the size of each dimension)

.dimensionCount: 3 (the number of dimensions, often called "rank")

How does this reasoning sound to you? Thanks again for your comment. It helped me clarify the core issue of return type (Number vs. Array), which is a really important detail to get right.

GregStanton, Oct 16 '25 01:10

Also, thanks @sidwellr for your summary of current issues with the behavior of dimensions. That was really helpful. I've incorporated your comments into this issue's top post.

GregStanton, Oct 16 '25 02:10

I feel like having three similar-sounding properties might be more confusing, just because there's more to remember. Are there more distinct names we could use? As another idea, I wonder if, rather than having three different properties, we could take inspiration from the numpy shape? That always returns an array, so a 3D vector would be [3], a 2D vector would be [2], a 3x4 matrix would be [3, 4], and the three properties we currently have can be fairly easily derived from that. The main downside there is that the shape of a vector is [3] instead of 3, but if we're expecting this to mostly be looked at as a step in debugging, that could be fine?

davepagurek, Oct 16 '25 17:10

Thanks @davepagurek! This is a very productive conversation. I think your ideas have led to some possible improvements.

I can see how "size" vs. "count" might appear too similar. That's a good point. One way to rectify that is to replace dimensionCount with the usual rank.

Downsides of rank are that beginners will be less familiar with the term, and it conflicts with the meaning of "rank" that students learn in linear algebra. However, it's a standard term and is used in libraries like TensorFlow.js. Since one of the goals is to provide an onramp to those kinds of libraries, I think using rank has advantages. Also, there are ways around the naming conflict with linear algebra. Just as we will likely distinguish e.g. multiply() from matrixMultiply(), we could distinguish .rank from .matrixRank.

Here are a couple more variations on your idea to reduce the number of similar terms.

| Data structure | Proposed property | Example value | Data type |
| --- | --- | --- | --- |
| Vector [x, y, z] | .dimension | 3 | Number |
| Matrix (3 rows, 4 cols) | .dimensions | [3, 4] | Array |
| Matrix (3 rows, 4 cols) | .rank | 2 | Number |

I think "dimension" is kind of a fraught term, though, the more I think about it. The issue is that there are two interpretations: the size of each data axis (3 for a 3D vector, 3x4 for a matrix with 3 rows and 4 columns) or the number of axes (1 for any vector, 2 for any matrix). My original API was an attempt to address this in a kind of subtle way, but perhaps there are better options, like maybe the following:

| Data structure | Proposed property | Example value | Data type |
| --- | --- | --- | --- |
| Vector [x, y, z] | .shape | [3] | Array |
| Matrix (3 rows, 4 cols) | .shape | [3, 4] | Array |
| Matrix (3 rows, 4 cols) | .rank | 2 | Number |

From what I've seen, shape is a common name for what I called dimensionSizes. It's pretty far from universal, and there's perhaps a small chance of confusion with beginShape()/endShape(). My original intuition was that vector.shape seems a bit unclear by itself, at least for people unfamiliar with this naming convention (which will likely include most p5 users). The other downside, as you noted, is that a 3D vector would have a shape of [3] rather than 3. However, the fact that shape is common may negate the other concerns, which seem relatively minor.

We could perhaps explain in the docs how using an array, as in [3], allows us to extend the notion of shape to matrices. There's also a precedent in babylon.js, a major peer library backed by Microsoft, which uses [3] rather than 3. Overall, this type consistency may be a feature rather than a bug. It creates a unified, common API that matches the TensorFlow.js API, which provides a strong precedent. Using such a precedent reduces the likelihood of unforeseen API problems down the road and aligns with the goal of providing an onramp to advanced libraries. Another potential advantage is that it allows us to support scalars within the same framework, as these have a shape equal to [] and a rank equal to 0.

It's also been proposed by @sidwellr that we consider getter methods. These might have names like getShape() and getRank(). I'm doing an inventory of getter/setter methods across the full p5 API to get a better sense of what pattern we may want to move to and will post the results in #8152.

I can keep working on this, but do you have any initial thoughts about those options?

Edit: Fixed typo in first row and third column of the last table, and explicitly noted the similarity of shape to beginShape()/endShape(). Also added the variation of shape and rank that uses getter methods. Thanks to @sidwellr for pointing these issues out. Added brief clarification about how we might document shape for vectors, and a possible advantage regarding the representation of scalars. Added the babylon.js precedent for using an array to represent the shape of a vector.

GregStanton, Oct 16 '25 19:10

@GregStanton I think in your last table that you meant to change the Data type for Vector .shape to Array.

Would .shape be confusing for beginners since p5 also supports "shapes" using beginShape() and endShape()? I think the context is different enough that this isn't a problem, but wanted to make sure it was considered.

Whatever we end up calling it, I think it should be accessed with a getter function rather than being a field. That would ensure it is always correct, and eliminate the need to set it whenever the dimension might change. Having it be an array [3] rather than a number 3 makes sense in the context of potential future Matrix and Tensor classes, but it seems strange right now since those don't exist (yet).

I do think there needs to be some way to change a vector's dimension. One way is to make a matching setter function that validates the parameter and pads with zeros or truncates the vector. A more flexible way was mentioned by @davepagurek in issue #8159: convenience methods to convert between dimensions that could be used inline to match dimensions when performing various operations. These could allow specifying what to pad with or which elements to remove. A swizzle capability may also be useful. If these existed, we could make the dimensions property (whatever it is called) read-only, though it also makes sense to keep it as a simple and intuitive way to change the dimension. Perhaps another sub-issue would be helpful to discuss this; the proposed convenience methods are lost in the larger context of #8159, and they would be useful regardless of that issue's outcome.

sidwellr, Oct 20 '25 20:10

> @GregStanton I think in your last table that you meant to change the Data type for Vector .shape to Array.

Yep! Fixed it, thanks.

> Would .shape be confusing for beginners since p5 also supports "shapes" using beginShape() and endShape()? I think the context is different enough that this isn't a problem, but wanted to make sure it was considered.

Great observation. This is just the kind of thing we need to be considering. I originally was going to go with shape, but I thought there might be a minor chance of confusion, for the reason you cited. I thought I had found a different API that sidestepped this issue (dimensionSize/dimensionSizes/dimensionCount). However, it's been observed that the similarity of "size" and "count" is potentially a greater source of confusion. Without such qualifiers, "dimension" is confusing, as it has dual meanings, as noted elsewhere in this discussion. My sense is that the potential for confusion with beginShape()/endShape() is much smaller, so I'm now leaning toward the standard .shape/.rank API. But alternatives are still welcome!

> Whatever we end up calling it, I think it should be accessed with a getter function rather than being a field. That would ensure it is always correct, and eliminate the need to set it whenever the dimension might change.

I appreciate you bringing this up! It's a great catch. I'm currently doing an inventory of getter/setter APIs across the whole of p5, in order to help us to determine the best long-term pattern to move toward. Predictability of the API is key. I'll post the results in #8152.

In terms of protecting and updating data, both explicit getter methods and read-only fields are still possible. For example, if we implement .shape and .rank fields as getters internally, with JavaScript get syntax and no corresponding setter, then we can create read-only properties that retrieve their values dynamically. This ensures the value cannot be mutated by the user accidentally and that the returned value is always correct. Whether we opt for that or e.g. getShape() and getRank() depends on the overall API pattern we want to adhere to. An advantage of explicit getters and setters is that their behavior is clearer from their names, which is generally good.

On the other hand, user-facing getter methods for these properties seem atypical (e.g. in TensorFlow.js and in babylon.js, read-only fields are used), and non-standard APIs have their own costs.

> Having it be an array [3] rather than a number 3 makes sense in the context of potential future Matrix and Tensor classes, but it seems strange right now since those don't exist (yet).

Right. This was my original reasoning for introducing singular/plural versions like dimensionSize/dimensionSizes. If you look at how TensorFlow.js describes its shape and rank properties, it says that rank "defines how many dimensions the tensor contains," and shape "defines the size of each dimension of the data." Those descriptions map exactly to my dimensionCount and dimensionSizes properties. But others have suggested that API may have too many similar terms. It's also worth noting that immediately after explaining rank and shape, the TensorFlow docs explain how "dimension" can mean different things, so I agree that it can be a bit confusing.

The good news is that there's at least a precedent in babylon.js, where vectors have a dimension property that's [3] for a Vector3. I'd definitely avoid using "dimension" though, as their meaning is precisely opposite the meaning used by TensorFlow.js (TensorFlow.js uses "dimension" to refer to the number of data axes, i.e. the rank, and babylon.js uses the same term to refer to the sizes of each axis, i.e. the shape). I think we could mitigate any confusion with documentation that explains the shape concept for vectors and matrices, even if we don't have matrices yet.

> I do think there needs to be some way to change a vector's dimension... Perhaps another sub-issue would be helpful to discuss this...

Great idea.

Summary

  1. In my view, the .shape and .rank API might still be the best of the options proposed so far, although there are trade-offs. To help reach consensus on whether to use fields or explicit getter methods, I'm developing an inventory of current getter/setter APIs across p5 and will share the results in #8152.
  2. I'll create a new sub-issue of the main umbrella issue #8149 to deal with dimension mutation. This can start out with a few sub-issues. One will be dedicated to intentional dimension mutation via methods like pad() and crop(). Dedicated methods like these may be the best way to mutate the dimension, rather than doing so through a shape API or similar (dedicated methods would clarify how to fill an expanded vector, for example). A couple other sub-issues will relate to specific, side-effect mutations I've observed that are likely unintentional.
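
To illustrate why dedicated methods make the intent explicit, here is a rough sketch of what such helpers might do. The names pad() and crop() come from the discussion above, but the signatures, the default fill of 0, and the choice to return new arrays are all assumptions, not a settled p5.js API.

```javascript
// Hypothetical: returns a copy extended to `targetLength`,
// filling the new slots with `fill`.
function pad(values, targetLength, fill = 0) {
  if (!Number.isInteger(targetLength) || targetLength < values.length) {
    throw new RangeError('pad() needs an integer length >= the current length');
  }
  const extra = new Array(targetLength - values.length).fill(fill);
  return [...values, ...extra];
}

// Hypothetical: returns a copy truncated to the first `targetLength` components.
function crop(values, targetLength) {
  if (
    !Number.isInteger(targetLength) ||
    targetLength < 0 ||
    targetLength > values.length
  ) {
    throw new RangeError('crop() needs an integer length between 0 and the current length');
  }
  return values.slice(0, targetLength);
}

const padded = pad([1, 2], 3);      // [1, 2, 0]
const filled = pad([1, 2], 4, 1);   // [1, 2, 1, 1]
const cropped = crop([1, 2, 3], 2); // [1, 2]
console.log(padded, filled, cropped);
```

Unlike assigning a new shape, each call states how the data should change, and nonsensical sizes such as 2.5 are rejected.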

GregStanton, Oct 22 '25 12:10

Quick update: A strong API, refined through community discussion

I've revised the API proposal in the top post of the current issue. It now proposes a strong, precedent-backed API, refined through community discussion. Please see the top post for this much improved version, along with a clear list of its benefits.

GregStanton, Oct 29 '25 03:10

hi @GregStanton

I support the .shape API proposal. It simplifies dimension checks and makes implementations like rotate() cleaner.

Ayaan005-sudo, Oct 29 '25 13:10

Read-only properties .shape and .rank sound good to me. Thanks to all for the discussion.

sidwellr, Oct 29 '25 15:10

Hi — I'm Shubham Kahar and I support this proposal.
It improves consistency and clarity for vector users, and I think it will simplify both API usage and documentation.
Happy to help test or document once the final decision is made. 🙂

Shubhamkahar196, Nov 12 '25 16:11

Hi everyone, thanks so much for all the lively discussion of the p5.js 2.x Vector implementation! Now that 2.1 is released, we wanted to set up a more direct discussion space for p5.js 2.x Vector implementation bugfixes, documentation, and improvements. So, here is a Discord channel: https://discord.gg/gH3VcRKhen

As we discuss/unblock each of the vector issues, I will also follow up on those issues as a comment. So if you prefer to participate only (or primarily) on GitHub, that still also works!

ksen0, Nov 14 '25 09:11

Hi @GregStanton,

I'm planning to work on #8214 (fixing heading() dimension quirk) and wanted to confirm my understanding of the .shape/.rank API proposal.

My understanding:

  • .shape returns [3] for a 3D vector (Array type)
  • .rank returns 1 for any vector (Number type)
  • Both should be read-only properties implemented with getter syntax

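For #8214 implementation: I would use .shape[0] to check if a vector is 2D before allowing heading() to execute. (Note that .shape.length is the rank, which is 1 for every vector, so it can't be used for this check.)

Example check:

// shape is [n] for an n-component vector, so the component count is shape[0]
if (this.shape[0] !== 2) {
  throw new Error('heading() can only be called on 2D vectors');
}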

Anirudh-x, Nov 15 '25 12:11

Hi @GregStanton and team,

I've created PR #8259 implementing the .shape and .rank API as proposed.

This provides the foundation for fixing #8214 and #8215. Would appreciate your feedback!

Thanks!

Anirudh-x, Nov 15 '25 13:11

Although several people have expressed support for using .shape and .rank, this has not yet been officially decided, so the PR by @Anirudh-x was closed. But just a comment: This issue is for the API that users will use to determine the dimension of a p5.Vector object. Internally, methods that actually implement p5.Vector have access to the _values array which contains the vector values. They don't need to use .shape (or whatever we finally decide on), but should always use this._values.length. This includes #8214 and #8215 which do not need to wait for this issue to be resolved. Indeed, if we do decide to use .shape, it should be implemented with return [this._values.length]; rather than trying to track the dimensions in a separate value that has a risk of not matching the actual dimensions.
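
A rough sketch of that split between the public property and internal checks. `HeadingSketchVector` is a hypothetical stand-in, and restricting heading() to 2D vectors is only the behavior proposed earlier in this thread for #8214, not a confirmed decision.

```javascript
// Hypothetical stand-in for p5.Vector, for illustration only.
class HeadingSketchVector {
  constructor(...values) {
    this._values = values;
  }

  // Public, derived property: built from the data on every access.
  get shape() {
    return [this._values.length];
  }

  // Internal code reads the storage directly instead of going through .shape.
  heading() {
    if (this._values.length !== 2) {
      throw new Error('heading() can only be called on 2D vectors');
    }
    const [x, y] = this._values;
    return Math.atan2(y, x);
  }
}

const up = new HeadingSketchVector(0, 1);
console.log(up.shape);     // one axis of size 2
console.log(up.heading()); // a quarter turn, in radians
```

The internal check does not depend on the name of the public property, so work on #8214 and #8215 stays valid whichever API is chosen.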

sidwellr, Nov 15 '25 17:11