cpython gh-119033: Deduplicate history entries in new REPL

The main check is in the last line of the change:

... and ret != self.history[-1] ...

But we also need to deduplicate when copying from transient_history, i.e. the temporary changes that may have been made to the history entries while navigating between them. Once any new line is entered, this transient history is committed to the main, "permanent" history.

Transient history is never deduplicated in place, only at the time it is written to permanent history.
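The two rules above (skip a new line that repeats the last entry; deduplicate edited entries only when transient history is committed) can be sketched with a toy model. This is a minimal illustration, not the actual pyrepl code. Only the names `history` and `transient_history` come from the description above. `ToyHistory`, `edit`, `finish` and `_collapse_around` are hypothetical. The sketch also assumes that a committed edit is compared only with its immediate neighbours, so older repetitions elsewhere in the file are left alone.

```python
from dataclasses import dataclass, field


@dataclass
class ToyHistory:
    """Hypothetical stand-in for the REPL's history state (not pyrepl's real class)."""

    history: list[str] = field(default_factory=list)
    # Edits made while browsing: index into `history` -> edited text.
    transient_history: dict[int, str] = field(default_factory=dict)

    def edit(self, index: int, text: str) -> None:
        # Browsing edits stay transient and are never deduplicated in place.
        self.transient_history[index] = text

    def _collapse_around(self, i: int) -> None:
        """Remove repeats created by the edited entry at index i."""
        h = self.history
        # Drop following entries identical to the edited one.
        while i + 1 < len(h) and h[i + 1] == h[i]:
            del h[i + 1]
        # If the preceding entry is identical, drop the edited entry itself.
        if i > 0 and h[i - 1] == h[i]:
            del h[i]

    def finish(self, ret: str) -> None:
        """Commit transient edits, then append the line that was entered."""
        edited = [i for i in self.transient_history if i < len(self.history)]
        for i in edited:
            self.history[i] = self.transient_history[i]
        self.transient_history.clear()
        # Highest index first: every deletion happens at or above the index
        # being processed, so the lower indices still to visit stay valid.
        for i in sorted(edited, reverse=True):
            self._collapse_around(i)
        # The main check: skip the new line if it repeats the last entry.
        if ret and (not self.history or ret != self.history[-1]):
            self.history.append(ret)
```

For example, with `history=["a", "b", "c"]`, editing index 1 to `"a"` and then entering `"d"` leaves `["a", "c", "d"]`. Entering `"d"` a second time leaves the history unchanged.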

Issue: gh-119033

May 20 '24 22:05 optim-ally

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

May 20 '24 22:05 bedevere-app[bot]

In general, I think this is a worthwhile change but I have two reservations.

First, I worry that it changes what the user will see in F2. What I mean is: say you call a function repeatedly. The F2 output will then be different, because the repeated calls are collapsed into one. I think the output there should be 1:1 to what the user typed in the current session.

The other thing is performance when we're dealing with a large, already saved history. Can you test how this behaves with a history file containing 1,000,000 entries?

May 22 '24 04:05 ambv

Editing my history file to contain 1,000,000+ entries didn't have a noticeable effect on speed with deduplication. This is to be expected, since comparisons are only made against the local history, meaning only new repetitions will be deduplicated. Existing history entries from before this change may still contain repetitions.

Good point about F2 behaviour. If we want that to maintain a 1:1 record then "deduplication" could instead be achieved at navigation time, skipping over identical lines when the arrow keys are pressed.
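Navigation-time deduplication can be sketched as two index functions that skip runs of identical lines while leaving the stored history untouched. `previous_index` and `next_index` are hypothetical names, not pyrepl functions. The sketch assumes position `len(history)` stands for the fresh input line and that both directions land on the latest entry of a run.

```python
def previous_index(history: list[str], i: int) -> int:
    """Position reached by pressing Up from position i, skipping repeats.

    Position len(history) stands for the fresh input line below the last
    entry. Moving up lands on the latest entry of a run of identical lines.
    """
    current = history[i] if i < len(history) else None
    j = i - 1
    while j >= 0 and history[j] == current:
        j -= 1
    return j if j >= 0 else i  # nothing different above: stay put


def next_index(history: list[str], i: int) -> int:
    """Position reached by pressing Down from position i, skipping repeats."""
    n = len(history)
    if i >= n:
        return n  # already on the fresh input line
    j = i + 1
    while j < n and history[j] == history[i]:
        j += 1
    # Land on the latest entry of the next run, matching previous_index().
    while j + 1 < n and history[j + 1] == history[j]:
        j += 1
    return j
```

With `["1", "2", "2", "3"]`, pressing Up from the input line visits indices 3, 2 and 0, so the user sees 3, 2, 1. F2 still shows all four entries.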

May 22 '24 15:05 optim-ally

> Good point about F2 behaviour. If we want that to maintain a 1:1 record then "deduplication" could instead be achieved at navigation time, skipping over identical lines when the arrow keys are pressed.

I was going to suggest this as well. Makes sense to me to do deduplication during navigation.

May 22 '24 15:05 lysnikolaou

What about history search? Should the "next" search result skip over contiguous identical lines as well?

May 22 '24 16:05 optim-ally

Yes, because in the context of search, "consecutive occurrences of the same line" can't easily be distinguished from the search not working well ;)
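Applying the same rule to history search means a run of identical lines yields a single hit, while non-contiguous repeats are still found. Below is a minimal sketch of a backward substring search. `search_backward` is a hypothetical helper, not pyrepl's search implementation, and plain substring matching is an assumption.

```python
def search_backward(history: list[str], query: str, start: int) -> int:
    """Index of the next entry above `start` containing `query`, or -1."""
    for j in range(start - 1, -1, -1):
        # Earlier copies in a run of identical lines are skipped: only the
        # latest line of each run can be a hit.
        if j + 1 < len(history) and history[j + 1] == history[j]:
            continue
        if query in history[j]:
            return j
    return -1
```

With `["print(x)", "x = 1", "print(x)", "print(x)", "y = 2"]`, searching for `"print"` from the input line finds index 3. The next search skips index 2 and finds index 0.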

May 22 '24 16:05 ambv

I've hit a snag with the navigation-skipping method of deduplication. Let's say we have the following history:

1
2
2
3
    <-- current position

To a user navigating with the arrow keys, the history will appear to have three items: 1, 2, 3, in that order. Now let's navigate up twice to the 2, change it to a 4, then navigate back down before hitting Enter. The change will be committed from the "transient" history to the primary history:

1
2
4  <-- changed entry
3 
   <-- current position

The apparent history is now 1, 2, 4, 3. (My implementation changes the latest consecutive occurrence, but other approaches have the same problem.) If we repeat this, changing the 2 to a 4 again, the apparent history will finally be 1, 4, 3.
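The snag can be reproduced by computing the apparent history (consecutive duplicates collapsed) before and after each edit. This illustrative sketch edits the latest occurrence of the run, as described above. The helper `apparent` is hypothetical.

```python
from itertools import groupby


def apparent(history: list[str]) -> list[str]:
    """What a user browsing with the arrow keys sees: runs collapsed."""
    return [line for line, _run in groupby(history)]


history = ["1", "2", "2", "3"]
print(apparent(history))  # ['1', '2', '3']

# Edit the latest "2" (index 2) into "4" and commit it.
history[2] = "4"
print(apparent(history))  # ['1', '2', '4', '3']

# Do it again: the latest "2" is now at index 1.
history[1] = "4"
print(apparent(history))  # ['1', '4', '3']
```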

I think this behaviour is very unintuitive for a user, but it is possibly unavoidable if we want to preserve duplicates in the history file.

UPDATE: I chatted with Lys and we decided that this outcome is probably the least bad of all the options. I'll stick with it for now unless there are any objections.

May 22 '24 19:05 optim-ally

> I think the [F2 output] should be 1:1 to what the user typed in the current session.

A 1:1 version of the history is already lost, since history entries can be edited, even in the old REPL. We could do with two history files: one with the lines exactly as typed and one with edits and deduplication 😕

May 22 '24 19:05 optim-ally