Support `exclude` parameter for SkyPilotBackend `_experimental_pull_from_s3`
It is currently possible to exclude large objects, such as trajectories, when pulling models from S3 through the LocalBackend. We should support the same for the SkyPilotBackend.
I'm confused by why we're using a different query for ascending and descending order - could you help me understand the thinking here?
A high DR implies that the user wants to give higher priority to those cards. However, when sorting purely by ascending R, Anki would be doing the opposite: it would give lower priority to high DR decks.
One could argue that this is what the user chose by selecting the ascending R option. But, the main reason for selecting that option is to allow the most overdue cards to be shown first. The de-prioritization of high DR decks is an unintended side-effect.
Using relative retrievability helps to select the most overdue cards while still prioritizing high DR ones.
For the descending R option, both "forces" are in the same direction. So, using relative R is actually harmful.
Things can probably be made less confusing by renaming the options in the UI. But, I don't have concrete suggestions. Possible wordings can be "Most overdue first" and "Freshly due first".
Ascending retrievability selects the cards with the most relative overdueness first. Descending retrievability selects the cards with the least relative overdueness first. If this request is accepted, cards from different presets will no longer mix well.
I am not sure what you mean by not mixing well.
If you have two subdecks with different DR, sorting by pure R will select the cards first from the subdeck where DR is higher. Cards from different subdecks will stop mixing. If someone wants to prefer cards from some subdecks over others, they can choose the "Deck, then due date" sort, although it is less flexible. This applies to the case without a backlog.
In the case of backlog, "pure R" will make the situation worse by choosing cards from the subdeck with a suboptimal, high DR. By contrast, "descending R" will select cards whose R is as close as possible to the set DR. This can be useful with a very large backlog, provided DR is optimal. At the moment, FSRS does not allow you to set DR to its optimal value, since DR cannot be set below 70%.
> sorting by pure R will select the cards first from the subdeck where DR is higher. Cards from different subdecks will stop mixing.
You are right. I considered this previously but I thought this was not so significant. I will reconsider this.
> In the case of backlog, "pure R" will make the situation worse by choosing cards from the subdeck with a suboptimal, high DR.
Yes, I agree that high DR is suboptimal.
Also, the problem mentioned on the Forums is valid too: high-DR cards sit on the steep part of the forgetting curve and their R falls faster. So, if the sorting is by relative R, they will quickly get buried under the low-DR cards because the low-DR cards remain close to their DR for a longer period (resulting in a high relative R). Now, these high-DR cards will never be shown until the user overcomes all of their backlog.
Not very relevant here, but, for the same reason, descending R or descending relative R sort order is not good for low-stability cards (e.g. newly introduced ones), since they, too, lose their R faster than the other cards.
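To illustrate the burying effect, here is a minimal Python sketch. It assumes an FSRS-style power forgetting curve with decay 0.2 and a relative R equal to minus the ratio of elapsed time to the interval that reaches DR. Both formulas, the stability of 100 days, and the 10 days of overdueness are assumptions made for illustration, not values taken from the Anki source.

```python
DECAY = 0.2                       # assumed value of w20
FACTOR = 0.9 ** (-1 / DECAY) - 1  # chosen so that R == 0.9 when elapsed == stability

def retrievability(elapsed_days, stability):
    """Assumed FSRS-style power forgetting curve."""
    return (1 + FACTOR * elapsed_days / stability) ** -DECAY

def interval_for(dr, stability):
    """Days until retrievability falls to the desired retention."""
    return stability / FACTOR * (dr ** (-1 / DECAY) - 1)

def relative_r(r, dr):
    """Assumed relative R: minus (elapsed time / interval that reaches DR)."""
    return -(r ** (-1 / DECAY) - 1) / (dr ** (-1 / DECAY) - 1)

STABILITY = 100.0   # hypothetical, identical for both cards
DAYS_OVERDUE = 10   # hypothetical

results = {}
for dr in (0.95, 0.80):
    interval = interval_for(dr, STABILITY)
    r = retrievability(interval + DAYS_OVERDUE, STABILITY)
    results[dr] = (interval, r, relative_r(r, dr))
    print(f"DR={dr:.2f} interval={interval:6.1f}d R={r:.3f} relative R={results[dr][2]:.2f}")
```

Under these assumptions, ten days of overdueness moves the DR = 0.95 card to a relative R of about -1.24, while the DR = 0.80 card only reaches about -1.03, so the high-DR card sinks much faster in a descending relative R sort.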
For these 3 reasons, I think descending R is not good for dealing with backlogs. Ascending R is probably a better choice as it doesn't have any of the above 3 problems. But, it has its own problems like prioritizing cards that are almost completely forgotten (extremely low R). Anyway, this is not the main topic here.
Currently, it seems that my solution produces more problems than it solves. So, I will try to think of an alternative solution.
So, I did some thorough analysis.
Problem solved by the above approach (using pure R desc instead of relative R):
- High DR cards now get priority (as the user desires, even if that's not optimal for quickly overcoming the backlog), which prevents them from being buried forever under the relatively stable queue of low DR cards
Problems introduced by this PR:
- Cards from subdecks with different DR will not mix (unless the DRs are very close or the cards get significantly overdue)
- Cards with no memory states (e.g. due to moving cards between decks) will show up before all the other cards as they have no R (this was not a problem before because relative R could be estimated from interval and elapsed days)
I tried many other formulas for giving priority to high DR cards, but none of them resulted in effective mixing of cards with different DR. For example, I tried R/(1-DR), R + (DR / (1 - DR)) and R * (1 + DR).
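As a rough illustration of why these formulas fail to mix cards, the following Python sketch applies each of them to a set of hypothetical (R, DR) pairs (the same values as in the table below) and sorts in descending order of the score. The card list and helper names are illustrative only.

```python
# Hypothetical cards as (R, DR) pairs; illustrative values only.
cards = [
    (0.90, 0.90), (0.89, 0.90), (0.88, 0.90), (0.87, 0.90),
    (0.80, 0.80), (0.79, 0.80), (0.78, 0.80), (0.77, 0.80),
    (0.94, 0.95), (0.93, 0.95), (0.92, 0.95), (0.91, 0.95), (0.90, 0.95),
]

# The three candidate priority formulas mentioned above.
formulas = {
    "R / (1 - DR)": lambda r, dr: r / (1 - dr),
    "R + DR / (1 - DR)": lambda r, dr: r + dr / (1 - dr),
    "R * (1 + DR)": lambda r, dr: r * (1 + dr),
}

def dr_order(score):
    """Return the DR of each card after sorting by score, highest first."""
    ranked = sorted(cards, key=lambda card: score(*card), reverse=True)
    return [dr for _, dr in ranked]

orders = {name: dr_order(score) for name, score in formulas.items()}
for name, order in orders.items():
    print(f"{name:>18}: {order}")
```

For these values, every formula sorts the cards into solid DR blocks (0.95 first, then 0.9, then 0.8) with no interleaving at all.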
But, the current sorting is too problematic. For example, consider this: (sorted in descending order of relative R)
| R | DR | Relative R |
|---|---|---|
| 0.9 | 0.9 | -1.00 |
| 0.8 | 0.8 | -1.00 |
| 0.79 | 0.8 | -1.10 |
| 0.89 | 0.9 | -1.14 |
| 0.78 | 0.8 | -1.20 |
| 0.94 | 0.95 | -1.24 |
| 0.88 | 0.9 | -1.29 |
| 0.77 | 0.8 | -1.31 |
| 0.87 | 0.9 | -1.45 |
| 0.93 | 0.95 | -1.50 |
| 0.92 | 0.95 | -1.77 |
| 0.91 | 0.95 | -2.06 |
| 0.9 | 0.95 | -2.37 |
(Relative R is calculated assuming a decay, i.e. w20, of 0.2, the default value.)
Here, the cards with DR = 0.95 have the least priority, which is exactly the opposite of what a user would expect.
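For anyone who wants to check the numbers, here is a minimal Python sketch that reproduces the table. The `relative_r` formula is an assumption: it was chosen because it matches the values above with decay = 0.2, and it has not been checked against the actual query in Anki.

```python
DECAY = 0.2  # w20 value assumed in the note under the table

def relative_r(r, dr, decay=DECAY):
    """Assumed formula: minus (elapsed time / interval that reaches DR)."""
    return -(r ** (-1 / decay) - 1) / (dr ** (-1 / decay) - 1)

# (R, DR) pairs from the table, grouped by DR.
cards = [
    (0.90, 0.90), (0.89, 0.90), (0.88, 0.90), (0.87, 0.90),
    (0.80, 0.80), (0.79, 0.80), (0.78, 0.80), (0.77, 0.80),
    (0.94, 0.95), (0.93, 0.95), (0.92, 0.95), (0.91, 0.95), (0.90, 0.95),
]

# Sort in descending order of relative R, as in the table.
rows = sorted(
    ((r, dr, relative_r(r, dr)) for r, dr in cards),
    key=lambda row: row[2],
    reverse=True,
)

print("R     DR    Relative R")
for r, dr, rel in rows:
    print(f"{r:.2f}  {dr:.2f}  {rel:.2f}")
```

Under this formula, a card exactly at its DR always scores -1, and each further drop in R costs a high-DR card much more than a low-DR card, which is why the DR = 0.95 rows end up at the bottom.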
So, I think that we should fix this problem even if that means we have to introduce two new minor issues (mentioned at the beginning of this comment).
Any suggestions for resolving this issue while still allowing the cards to shuffle would be greatly appreciated. Otherwise, I think that this is ready to merge.
> Also, the problem mentioned on the Forums is valid too: high-DR cards sit on the steep part of the forgetting curve and their R falls faster. So, if the sorting is by relative R, they will quickly get buried under the low-DR cards because the low-DR cards remain close to their DR for a longer period (resulting in a high relative R). Now, these high-DR cards will never be shown until the user overcomes all of their backlog.
And that's a good thing. In the case of backlog, it is better to focus on decks with a more optimal DR.
> Not very relevant here, but, for the same reason, descending R or descending relative R sort order is not good for low-stability cards (e.g. newly introduced ones), since they, too, lose their R faster than the other cards.

It can also be useful for some users: save what is easiest to save, and reduce the number of cards in the backlog, which is more psychologically comfortable. Some of the low-stability cards may be leeches, and it is not very useful for them to have high priority during a backlog.
> Problem solved by the above approach (using pure R desc instead of relative R): High DR cards now get priority (as the user desires, even if that's not optimal for quickly overcoming the backlog), which prevents them from being buried forever under the relatively stable queue of low DR cards
But there is no problem here. If the user wants to prioritize cards with high DR in the case of a backlog, they can use Ascending retrievability.
I don't mind both options, but IMO the option label should really reflect what it does. If you tell me it's "by decreasing R," I want decreasing R.
By assuming "what people truly wanted by selecting that", you're making assumptions for them that might not be true and could cause a lot of confusion for many other people.
If "relative overdueness" has to be supported, let's call it that, but don't hide it behind an altered "Decreasing/Increasing R".
Call the thing the thing.
Thank you @JSchoreels for the input. I have changed my opinion on this. I now think that we should have 3 options:
- Descending R (purely based on R)
- Ascending R (purely based on R)
- Relative overdueness (same as current ascending R)
I am not sure what name to give to the current descending R if we want to preserve it too. Suggestions are welcome.
Keep in mind that pure R-based sorts are FSRS-only. They won't be available to SM-2 users.
I will update this PR after https://github.com/ankitects/anki/pull/4424 is merged because that PR will create conflicts here.