actual
actual copied to clipboard
[WIP] fix(#2562): Prevent transaction deduplication for imported transactions
Hello,
After looking into the issue, it seems that the transaction deduplication is due to the two transactions having the same amount and account and the second transaction happened within 7 days of the first one.
After reading the import transaction code, the fuzzy matching matching uses the 7 days method to try to match similar transactions and merge them.
Proposed solution: prevent the fuzzy matching from matching imported transactions (where import_id is not null) to prevent this behavior.
@MatissJanis what do you think ?
Deploy Preview for actualbudget ready!
Name | Link |
---|---|
Latest commit | b54fab0f365b31865a71917aca38ccb4685e09b3 |
Latest deploy log | https://app.netlify.com/sites/actualbudget/deploys/663b56fa5374fd000809d798 |
Deploy Preview | https://deploy-preview-2618.demo.actualbudget.org |
Preview on mobile | Toggle QR Code...Use your smartphone camera to open QR code link. |
To edit notification comments on pull requests, go to your Netlify site configuration.
Bundle Stats — desktop-client
Hey there, this message comes from a GitHub action that helps you and reviewers to understand how these changes affect the size of this project's bundle.
As this PR is updated, I'll keep you updated on how the bundle size is impacted.
Total
Files count | Total bundle size | % Changed |
---|---|---|
9 | 4.71 MB | 0% |
Changeset
No files were changed
View detailed bundle breakdown
Added
No assets were added
Removed
No assets were removed
Bigger
No assets were bigger
Smaller
No assets were smaller
Unchanged
Asset | File Size | % Changed |
---|---|---|
static/js/indexeddb-main-thread-worker-e59fee74.js | 13.5 kB | 0% |
static/js/resize-observer.js | 18.37 kB | 0% |
static/js/usePreviewTransactions.js | 790 B | 0% |
static/js/BackgroundImage.js | 122.29 kB | 0% |
static/js/AppliedFilters.js | 21.16 kB | 0% |
static/js/narrow.js | 59.81 kB | 0% |
static/js/wide.js | 262.38 kB | 0% |
static/js/ReportRouter.js | 1.23 MB | 0% |
static/js/index.js | 3 MB | 0% |
Bundle Stats — loot-core
Hey there, this message comes from a GitHub action that helps you and reviewers to understand how these changes affect the size of this project's bundle.
As this PR is updated, I'll keep you updated on how the bundle size is impacted.
Total
Files count | Total bundle size | % Changed |
---|---|---|
1 | 1.2 MB → 1.2 MB (+24 B) | +0.00% |
Changeset
File | Δ | Size |
---|---|---|
packages/loot-core/src/server/accounts/sync.ts |
📈 +24 B (+0.11%) | 22.13 kB → 22.16 kB |
View detailed bundle breakdown
Added
No assets were added
Removed
No assets were removed
Bigger
Asset | File Size | % Changed |
---|---|---|
kcab.worker.js | 1.2 MB → 1.2 MB (+24 B) | +0.00% |
Smaller
No assets were smaller
Unchanged
No assets were unchanged
Hi, I took a look at this PR (context: I'm the author of #2562). I understand MatissJanis' point, and I believe the issue could be approached in a different way to avoid breaking existing use cases.
Do not completely exclude transactions that have a imported_id != NULL
from the deduplication logic. Instead, skip the deduplication of two transactions only if both transactions have an imported_id
set and the IDs are different.
(On a side note, I believe reconciled transactions should be excluded from deduplication altogether, instead.)
I think this would both solve the issue I reported and avoid breaking the other use case. Hopefully it makes sense.
Do not completely exclude transactions that have a
imported_id != NULL
from the deduplication logic. Instead, skip the deduplication of two transactions only if both transactions have animported_id
set and the IDs are different.
I think this would work. Can't come up with any counter arguments for this at the moment.
Instead, skip the deduplication of two transactions only if both transactions have an
imported_id
set and the IDs are different.
This sounds good to me. When I import from an OFX file, what I'm expecting is an option to disable deduplication heuristics, not to disable deduplication entirely.
@ttlgeek Just wondering if you were going to continue this work as suggested by the comments above. If not I can give it a go. It would be my first contribution so would probably take longer, but got to start somewhere. Let me know what your plans are.
@pmoon00
@ttlgeek Just wondering if you were going to continue this work as suggested by the comments above. If not I can give it a go.
Looks like for whatever reason he's not coming back to it anytime soon. Possibly he's deployed his forked branch to his own instance and is happy with its' behaviour.
It would be my first contribution so would probably take longer, but got to start somewhere.
This PR / branch is probably still good basis for your work btw - Take the work here and then apply @MatissJanis + @jacopo-j 's comments (Should mainly be a change to the SQL query that Matiss commented on) and you shouldn't have to do too much to get approved.
In fact, I may also have a go at this over my lunch break now
@jacopo-j re-reading your comment
Do not completely exclude transactions that have a
imported_id != NULL
from the deduplication logic.
Instead, skip the deduplication of two transactions only if both transactions have an
imported_id
set and the IDs are different.
I think @ttlgeek 's PR already does what you're suggesting, it's just not obvious from the default diff. Expanding the context of the diff yields the following:
https://github.com/actualbudget/actual/blob/e5933ad8cde43802160f9f35237a4898b172610c/packages/loot-core/src/server/accounts/sync.ts#L380-L402
To rephrase your specification:
Do not completely exclude transactions that have a
imported_id != NULL
from the deduplication logic. Instead, skip the deduplication of two transactions only if both transactions have animported_id
set and the IDs are different.
if tx_in has imported_id:
if tx_in.imported_id corresponds to existing_tx:
dedup using IDs
exit
# In the above, both transactions have imported ID, but the IDs match, so no skip
# The following logic must hold for the the rest of this flow:
# tx_in has no imported_id
# OR
# tx_in's imported_id does not match (ids are different)
if an existing_tx has no imported_id:
dedup using heuristics
exit
# (
# tx_in has no imported_id
# OR
# tx_in.imported_id does not match (ids are different)
# )
# AND existing_tx has no imported_id
This behaviour is consistent with @ttlgeek 's PR.
I was trying to poke holes in the logic, but couldn't find any issues. Thanks for correcting me!
This morning I thought about it and had a feeling I may be overlooking something. It's definitely tricky inverting the logic between what the SQL statements are doing and the idea of "when to skip".
I think possibly there's an issue with this being the skip case:
# (
# tx_in has no imported_id
# OR
# tx_in.imported_id does not match (ids are different)
# )
# AND existing_tx has no imported_id
Because that includes tx_in.imported_id == NULL && existing_tx.imported_id == NULL
which is NOT a skip case.
Your instinct to double check is valid.
I think the change is small, however, we can just add another condition to ttlgeek's original SQL modification.
In terms of style, I think we should have linebreaks that delineate the logic more clearly in the SQL statements, and comments,
`SELECT id, is_parent, date, imported_id, payee, category, notes, reconciled FROM v_transactions
WHERE date >= ? AND date <= ? AND amount = ? AND account = ?
-- Only perform it if either ID is null
AND ( imported_id IS NULL OR ? IS NULL)`, // tx_in.imported_id should be last parameter
I was trying to poke holes in the logic, but couldn't find any issues. Thanks for correcting me!
This morning I thought about it and had a feeling I may be overlooking something.
It's definitely tricky inverting the logic between what the SQL statements are doing and the idea of "when to skip".
I think possibly there's an issue with this being the skip case:
# ( # tx_in has no imported_id # OR # tx_in.imported_id does not match (ids are different) # ) # AND existing_tx has no imported_id
Because that includes
tx_in.imported_id == NULL && existing_tx.imported_id == NULL
which is NOT a skip case.Your instinct to double check is valid.
I think the change is small, however, we can just add another condition to ttlgeek's original SQL modification.
In terms of style, I think we should have linebreaks that delineate the logic more clearly in the SQL statements, and comments,
`SELECT id, is_parent, date, imported_id, payee, category, notes, reconciled FROM v_transactions WHERE date >= ? AND date <= ? AND amount = ? AND account = ? -- Only perform it if either ID is null AND ( imported_id IS NULL OR ? IS NULL)`, // tx_in.imported_id should be last parameter
Would it be clearer if the AND imported_id IS NULL
predicate is only included if the tx_in.imported_id
is set/exists?
I'm not sure what the conventions around dynamic SQL predicates are in this project though.
Would it be clearer if the
AND imported_id IS NULL
predicate is only included if thetx_in.imported_id
is set/exists?I'm not sure what the conventions around dynamic SQL predicates are in this project though.
Different branching logic overall might be preferable and more readable, however my assumption is that dynamic SQL should be avoided. It's more likely to lead to bugs and could potentially become a vector for injection vulns (not very likely in this case).
It's true that this statement slightly violates DRY with the code branch that checks that tx_in.imported_id
is null, but I don't think it's a big problem.
AND ( imported_id IS NULL OR ? IS NULL)
Hi @MatissJanis @pmoon00 I've taken a stab at this in my own PR #2770 which bases off of @ttlgeek 's branch (though without all the master -> feature merge commits because I think those are ugly).
The only difference is the logic of the SQL statement, which now allows the dedup to occur if an existing transaction OR the input transaction are null. There's also a formatting difference in the SQL statement (I've re-ordered it and added linebreaks + comments to logically separate the fuzzy logic from the null conditioning )
Closing in favour of https://github.com/actualbudget/actual/pull/2770