jdk icon indicating copy to clipboard operation
jdk copied to clipboard

8360070: AccessibleText.getBeforeIndex returns null for last character

Open kumarabhi006 opened this issue 5 months ago • 12 comments

AccessibleText.getBeforeIndex method returns null for last character due to the wrong boundary value condition check. This method returns null when the passed index parameter is equal to text's length which is incorrect. getBeforeIndex method should return null only if the passed index parameter is less than 0 and greater than the text's length.

After modifying the condition check, expected character is returned. Test is added to verify the check,


Progress

  • [x] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • [x] Change must not contain extraneous whitespace
  • [x] Commit message must refer to an issue

Issue

  • JDK-8360070: AccessibleText.getBeforeIndex returns null for last character (Bug - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/25941/head:pull/25941
$ git checkout pull/25941

Update a local copy of the PR:
$ git checkout pull/25941
$ git pull https://git.openjdk.org/jdk.git pull/25941/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 25941

View PR using the GUI difftool:
$ git pr show -t 25941

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/25941.diff

Using Webrev

Link to Webrev Comment

kumarabhi006 avatar Jun 23 '25 17:06 kumarabhi006

:wave: Welcome back abhiscxk! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

bridgekeeper[bot] avatar Jun 23 '25 17:06 bridgekeeper[bot]

@kumarabhi006 This change is no longer ready for integration - check the PR body for details.

openjdk[bot] avatar Jun 23 '25 17:06 openjdk[bot]

@kumarabhi006 The following label will be automatically applied to this pull request:

  • client

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

openjdk[bot] avatar Jun 23 '25 17:06 openjdk[bot]

getBeforeIndex method should return null only if the passed index parameter is less than 0 and greater than the text's length.

I am not sure about the statement above. I think the check should take care of the direction, which is -1 in your case. This is actually properly handled by the code below(in t he same method you changed):

                    if (index + direction < model.getLength() &&
                        index + direction >= 0) {
                        return model.getText(index + direction, 1);
                    }

The code you added also affects WORD and SENTENCE cases. I suggest covering those with a test as well.

mrserb avatar Jun 23 '25 23:06 mrserb

I am not sure about the statement above. I think the check should take care of the direction, which is -1 in your case

I think the check is to ensure the passed index parameter is to verify the boundary of text length.

This is actually properly handled by the code below(in t he same method you changed):

If the index is equal to the the text length, the returned character from the getBeforeIndex should return the last character i.e. 6 but the existing condition evaluates to true due to index equals to model.getLength() and method returns null. So, even the direction is handled correctly below, execution doesn't reach to that point.

kumarabhi006 avatar Jun 24 '25 11:06 kumarabhi006

@mrserb The code you added also affects WORD and SENTENCE cases. I suggest covering those with a test as well.

Yes, that affects the WORD and SENTENCE cases. I have extended the test to cover them. Before the fix the returned value for WORD and SENTENCE is null that seems incorrect as well.

I think the return string in case of WORD should be Test6 and SENTENCE should be Test4 Test5.

kumarabhi006 avatar Jun 24 '25 11:06 kumarabhi006

I think the check is to ensure the passed index parameter is to verify the boundary of text length.

Yes, but the index passed to the method is an index within the text, so it should be from 0 to length - 1. You can take a look at the opposite case when the index is outside the range, for example at the beginning of the text and getAfterIndex:

result = at.getAfterIndex(AccessibleText.CHARACTER, -1); verifyResult("T", result);

Is that expectation correct or not? I think we should first decide whether this is actually a bug or not.

mrserb avatar Jun 24 '25 23:06 mrserb

I think the check is to ensure the passed index parameter is to verify the boundary of text length.

Yes, but the index passed to the method is an index within the text, so it should be from 0 to length - 1. You can take a look at the opposite case when the index is outside the range, for example at the beginning of the text and getAfterIndex:

result = at.getAfterIndex(AccessibleText.CHARACTER, -1); verifyResult("T", result);

Is that expectation correct or not? I think we should first decide whether this is actually a bug or not.

I got your point now. The spec says that index an index within the text and that means the value should range from 0 to length - 1. And that means, getBeforeIndex won't be able to fetch the last character and getAfterIndex won't be able to fetch the first character.

So, if someone wants to retrieve the first or last character, they shouldn't rely on getAfterIndex and getBeforeIndex method respectively, rather use getAtIndex method to get the first and last character with index passed as 0 and length -1 respectively.

Right ?

kumarabhi006 avatar Jun 25 '25 04:06 kumarabhi006

The spec says

What spec in particular?

aivanov-jdk avatar Jun 25 '25 15:06 aivanov-jdk

The spec says

What spec in particular?

https://docs.oracle.com/en/java/javase/24/docs/api/java.desktop/javax/accessibility/AccessibleText.html#getBeforeIndex(int,int)

It says, index - an index within the text for the getBeforeIndex method description

kumarabhi006 avatar Jun 25 '25 15:06 kumarabhi006

So, if someone wants to retrieve the first or last character, they shouldn't rely on getAfterIndex and getBeforeIndex method respectively, rather use getAtIndex method to get the first and last character with index passed as 0 and length -1 respectively.

Right ?

That is my understanding.

mrserb avatar Jun 25 '25 17:06 mrserb

@aivanov-jdk @azuev-java As per the discussion with @mrserb here, it seems this is not an issue with the code. I would like to know about your opinion ?

If this is not an issue to fix, can we update the javadoc for the APIs for better understanding ?

Something like getBeforeIndex shouldn't be used to retrieve the last character or word or sentence and similar for getAfterIndex and getAtIndex APIs.

kumarabhi006 avatar Jul 07 '25 15:07 kumarabhi006

Something like getBeforeIndex shouldn't be used to retrieve the last character or word or sentence and similar for getAfterIndex and getAtIndex APIs.

It might be useful

mrserb avatar Jul 11 '25 07:07 mrserb

I modified Abhishek's test:

            System.out.println(" i  bef   at  aft");
            for (int i : new int[] {0, 1, 2, 8, 9, 10}) {
                System.out.printf("%2d%5s%5s%5s\n",
                                  i,
                                  at.getBeforeIndex(CHARACTER, i),
                                  at.getAtIndex(CHARACTER, i),
                                  at.getAfterIndex(CHARACTER, i));
            }

This way, I created a table of returned values:

Text: "0123456789"
Text Length: 10
 i  bef   at  aft
 0 null    0    1
 1    0    1    2
 2    1    2    3
 8    7    8    9
 9    8    9 null
10 null null null

I'd expect to see 9 as the return value of getBeforeIndex for the index of 10. This would make the table symmetrical.

Therefore, I'm in favour for Abhishek's proposed fix.


The description of the methods in AccessibleText is vague. None of the three methods specifies what the index means and what values are accepted for it.

According to the text model, index 10 is still valid, it's the index between the last character and the implied line break, so tf.getDocument().getText(10, 1) returns \n.

Having this in mind, the specification of the getBeforeIndex, getAtIndex, getAfterIndex methods should be updated to explicitly specify the valid values for the index parameter.

aivanov-jdk avatar Jul 13 '25 20:07 aivanov-jdk

I got your point now. The spec says that index an index within the text and that means the value should range from 0 to length - 1. And that means, getBeforeIndex won't be able to fetch the last character and getAfterIndex won't be able to fetch the first character.

Here i disagree, the "within the text" does not imply that the position behind the last character is not within the text. Otherwise with the caret is at the end of the text it is impossible to request last word or character before the caret position which is one of the valid use cases of this accessibility method. I think the fix is valid.

azuev-java avatar Jul 14 '25 16:07 azuev-java

I'd expect to see 9 as the return value of getBeforeIndex for the index of 10. This would make the table symmetrical.

Why do you expect that? The spec states that the index should be "an index within the text." That index is also used in other methods, such as getCharacterBounds(), etc. If we fixed it only for text at the end, it would make the behavior even less symmetric, since for a negative value, the model will always throw an exception.

According to the text model, index 10 is still valid, it's the index between the last character and the implied line break, so tf.getDocument().getText(10, 1) returns \n.

The last character is implementation detail, it is not part of the "users data" this is the reason why it is excluded from the Document.getLength(). So passing length as a last character might cause an exception for custom documents. We also have the getCharCount() methods which returns "the number of characters (valid indices)".

Also, why are we only talking about JTextComponent? There are other same implementations of AccessibleText for example JLabel.

Here i disagree, the "within the text" does not imply that the position behind the last character is not within the text. Otherwise with the caret is at the end of the text it is impossible to request last word or character before the caret position which is one of the valid use cases of this accessibility method. I think the fix is valid.

It means exactly that, and it is also impossible to request the first character after the caret position if it is at the start of the document. For both cases getAtIndex() should work.

mrserb avatar Jul 15 '25 03:07 mrserb

I agree with @mrserb: there already is getAtIndex and the documentation is clear about that the index should be within the text. From my side I'd add that changing this behavior may break backward-compatibility with client's custom AWT/Swing components.

OnePatchGuy avatar Jul 15 '25 10:07 OnePatchGuy

It means exactly that, and it is also impossible to request the first character after the caret position if it is at the start of the document. For both cases getAtIndex() should work.

Ok, let's step away from the documentation and try to focus on the use case of this method. Method is supposed to expose navigation within the text component to the assistive technology devices and programs. One of such programs (not the only one) is a text narrator. These narrators are meant to be used on text components and they have special shortcuts to narrate the previous and next word or character compared to the current caret position. And if you say that at the end of the text when caret is behind the last symbol the "narrate previous word before cursor" should say "null" or just say nothing - well, i would like to know what is your logic behind it because for me it is clearly wrong. The documentation needs to be corrected as well as implementation - first of all it does not take into account bidirectional text and i think that might be a problem - but the fix here doing the right thing regarding physical meaning of this interface in the context of accessibility.

azuev-java avatar Jul 21 '25 07:07 azuev-java

Ok, let's step away from the documentation and try to focus on the use case of this method. Method is supposed to expose navigation within the text component to the assistive technology devices and programs. One of such programs (not the only one) is a text narrator. These narrators are meant to be used on text components and they have special shortcuts to narrate the previous and next word or character compared to the current caret position.

Those narrators should not use the AccessibleText interface blindly. Instead, they should check the specification and decide which methods to call and when. The method in question cannot return the first or last word in some scenarios, so relying on it without understanding its limitations could lead to incorrect behavior.

The documentation needs to be corrected as well as implementation - first of all it does not take into account bidirectional text and i think that might be a problem - but the fix here doing the right thing regarding physical meaning of this interface in the context of accessibility.

The proposed solution is not backward compatible, may cause exceptions in 3p code, and does not follow the pattern used in other components that implement AccessibleText.

mrserb avatar Jul 21 '25 07:07 mrserb

I'd expect to see 9 as the return value of getBeforeIndex for the index of 10. This would make the table symmetrical.

Why do you expect that? The spec states that the index should be "an index within the text." That index is also used in other methods, such as getCharacterBounds(), etc. If we fixed it only for text at the end, it would make the behavior even less symmetric, since for a negative value, the model will always throw an exception.

I think the specification for the index parameter — “an index within the text” — is vague, it's open to interpretation.

We should make it specific. For example, is 0 within the text? It's not within, it's right before.

The same goes for the index of 10 in my example above — it's right after the text.

According to the text model, index 10 is still valid, it's the index between the last character and the implied line break, so tf.getDocument().getText(10, 1) returns \n.

The last character is implementation detail, it is not part of the "users data" this is the reason why it is excluded from the Document.getLength().

Exactly! It's not part of the text itself, yet the index at Document.getLength() is a valid position.

I see no reason why accessible API would consider such a position invalid.

Ok, let's step away from the documentation and try to focus on the use case of this method. Method is supposed to expose navigation within the text component to the assistive technology devices and programs. One of such programs (not the only one) is a text narrator. These narrators are meant to be used on text components and they have special shortcuts to narrate the previous and next word or character compared to the current caret position.

Those narrators should not use the AccessibleText interface blindly. Instead, they should check the specification and decide which methods to call and when. The method in question cannot return the first or last word in some scenarios, so relying on it without understanding its limitations could lead to incorrect behavior.

In an ideal world… And again, “an index within the text” is a vague definition which one developer may interpret one way whereas another one may think it's okay. This bug, JDK-8360070, is the proof for that.

As far as I can see, the implementation of getIndexAtPoint can return Document.getLength() as an index, which becomes inconsistent: the index is considered valid in some cases but it's invalid in other cases.

I think the proposed fix is backward compatible: it extends the range of valid indexes for a method call; reducing the range would be more problematic.

At the same time, I agree that all implementations of AccessibleText need updating to be consistent.

The specification for AccessibleText needs updating, too, to avoid any confusion: the valid range for the index parameter has to be explicitly defined such as > 0 && < model.getLength().

aivanov-jdk avatar Jul 23 '25 19:07 aivanov-jdk

At the same time, I agree that all implementations of AccessibleText need updating to be consistent.

I checked with the implementation of getAtIndex, getBeforeIndex and getAfterIndex APIs and these UI components (JSpinner, JPasswordField, JTextComponent, JLabel, AbstractButton, ProgressMonitor classes) implement them. JSpinner, JPasswordField internally depends on the JTextComponent for the actual implementation as the Accessible Text object is an instance of Accessible JTextComponent. Similarly, ProgressMonitor depends on the JLabel class implementation.

JLabel and AbstractButton support these methods only if the text is html and not plain text.

I tried with html based JLabel and analyze the result returned by the getAtIndex API. If a JLabel has the html text <html>This is <b>Bold</b> text</html>, then the result shown by getAtIndex for CHARACTER is:

  Index 0 : 

  Index 1 : T
  Index 2 : h
  Index 3 : i
  Index 4 : s
  Index 5 :  
  Index 6 : i
  Index 7 : s
  Index 8 :  
  Index 9 : B
  Index 10 : o
  Index 11 : l
  Index 12 : d
  Index 13 :  
  Index 14 : t
  Index 15 : e
  Index 16 : x
  Index 17 : t

This seems wrong to me, I expect the character at index 0 should be T.

The array layout from the AbstractDocument class after html tags removal shows image

I don't understand why there is a newline character at the start of the JLabel's text after removing the html tags.

getCharCount method return the number of characters (valid indices) as 18, this includes the newline character at index 0 and trims the last newline character.

I think the new line character shouldn't be considered from accessibility POV, otherwise, the results returned from get*Index APIs looks incorrect to me.

@azuev-java What are your thoughts on it ?

@aivanov-jdk As per my opinion, updating Accessible Text implementation for JLabel and AbstractButton can be taken up as a separate issue.

kumarabhi006 avatar Aug 06 '25 07:08 kumarabhi006

I think the new line character shouldn't be considered from accessibility POV, otherwise, the results returned from get*Index APIs looks incorrect to me.

@azuev-java What are your thoughts on it ?

@aivanov-jdk As per my opinion, updating Accessible Text implementation for JLabel and AbstractButton can be taken up as a separate issue.

I agree this seems weird to me. I have no objections from taking this newly found bug under a new bug id.

The newline character at the start of the message could be required for modelling of HTML in the Document, however, off the top of my head, I can't think of any reason why this would be required.

What I also find weird is that all these component implement AccessibleText from scratch rather than using an implementation that exists in for a text component. Yes, I understand that it's a nested class and therefore it can't be easily extended, yet there could be an internal class which stores the reference for a text component and or text model and implements the methods on top of that. The implementations for other components could use composition in delegate its implementation of AccessibleText to one instance. This would avoid code duplication, and resolving a bug, such as this one, would automatically propagate into other components.

aivanov-jdk avatar Aug 06 '25 14:08 aivanov-jdk

Ok, let's step away from the documentation and try to focus on the use case of this method. Method is supposed to expose navigation within the text component to the assistive technology devices and programs.

I agree with @azuev-java here, we should focus on the purpose of the API and how this API is supposed to be used.

I think it is reasonable to expect that Document.getLength() is a valid index for text APIs, and getBeforeIndex should return the last character of the text in this case.

I said it a few times now, “an index within the text” is a vague condition, we should update the specification so that it is clear which indexes are valid.

aivanov-jdk avatar Aug 06 '25 14:08 aivanov-jdk

I said it a few times now, “an index within the text” is a vague condition, we should update the specification so that it is clear which indexes are valid.

Updated the spec.

kumarabhi006 avatar Aug 07 '25 11:08 kumarabhi006

There are similar APIs getTextSequenceAt, getTextSequenceBefore and getTextSequenceAfter for AccessibleExtendedText interface that invokes getSequenceAtIndex to return the AccessibleTextSequence and I think the fix is required for those method as well. I also suspect getTextBounds method too that it may not return correct bound for index = length of text in case of CHARACTER and WORD. I will test and update in next commit.

kumarabhi006 avatar Aug 07 '25 12:08 kumarabhi006

I think the specification for the index parameter — “an index within the text” — is vague, it's open to interpretation. We should make it specific. For example, is 0 within the text? It's not within, it's right before.

I'm not sure I agree with this argument. The indexes mentioned in all these specs are character indexes. Index 0 refers to the first character in the text, and the last valid index is length - 1. I think some confusion arises from misinterpreting the "index within the text" in the spec and the caret position returned by getCaretPosition. As specified, the caret position is “the zero-based offset of the caret” and it is located between characters. So this value cannot be used directly as a character index in other methods.

I said it a few times now, “an index within the text” is a vague condition, we should update the specification so that it is clear which indexes are valid.

The phrase “an index within the text” appears around 30 times in the spec, and all the relevant methods are implemented consistently to support the index range [0, length - 1]. We could clarify the wording, but why change it to allow character positions outside the text?

I agree with @azuev-java here, we should focus on the purpose of the API and how this API is supposed to be used.

The proposed change would make all existing subclasses, including custom ones, non-compliant with the spec. For example, our own JPasswordField would no longer behave as specified.

What exactly are we trying to fix here? What is the issue that cannot be addressed by using the correct methods for the first or last character positions?

mrserb avatar Aug 13 '25 00:08 mrserb