Same-day transactions in incorrect order, CSV importer

Open tbm opened this issue 4 years ago • 4 comments

I have two transactions in a CSV file with the same date (the CSV file is in descending order) and when I import it they are in the wrong order.

I looked at the CSV importer and extract() correctly sets is_ascending to False, reverses the transactions and returns the entries in the correct order.

So something else must mess up the order. Where should I look?

Test case

bean-extract config.py fidor.csv

Result:

2020-09-30 * "Aktivitätsbonus"
  Assets:Current:Fidor  5.00 EUR

2020-09-30 * "Kontofuehrung"
  Assets:Current:Fidor  -5.00 EUR

2020-10-19 * "Gutschrift; Absender: Martin Michlmayr"
  Assets:Current:Fidor  1.00 EUR

The first two transactions should be swapped.

fidor.csv

Datum;Beschreibung;Beschreibung2;Wert
19.10.2020;Gutschrift;Absender: Martin Michlmayr;1,00
30.09.2020;Aktivitätsbonus;;5,00
30.09.2020;Kontofuehrung;;-5,00

config.py

import os
import sys

sys.path.append(os.path.dirname(__file__))

import fidor

CONFIG = [
    fidor.FidorImporter('Assets:Current:Fidor', r'^fidor\.'),
]

fidor.py

"""
Importer for Fidor
"""

import csv
import os
import re

from beancount.core.number import D
import beancount.ingest.importers
from beancount.ingest.importers.csv import Col


class FidorImporter(beancount.ingest.importers.csv.Importer):
    """
    Importer for Fidor
    """

    def __init__(self, account, file_pattern):
        class FidorDialect(csv.Dialect):
            delimiter = ";"
            quoting = csv.QUOTE_NONE
            escapechar = '\\'
            doublequote = False
            skipinitialspace = True
            lineterminator = '\r\n'

        self.file_pattern = file_pattern
        fidor_dialect = FidorDialect()
        super().__init__({
            Col.DATE: 'Datum',
            Col.NARRATION: 'Beschreibung',
            Col.NARRATION2: 'Beschreibung2',
            Col.AMOUNT: 'Wert',
        },
                         account,
                         'EUR', [
                             '^Datum;Beschreibung;Beschreibung2;Wert$',
                         ],
                         csv_dialect=fidor_dialect,
                         dateutil_kwds={'dayfirst': True})

    def identify(self, file):
        if file.mimetype() != "text/csv":
            return False

        if re.search(self.file_pattern, os.path.basename(file.name)):
            return True

        return False

    def parse_amount(self, string):
        """Parse an amount written with a decimal comma (e.g. '5,00') into a Decimal."""
        return D(string.replace(',', '.'))

tbm • Nov 10 '20 07:11

That's because extract.py sorts your entries by date, SORT_ORDER, and lineno. See here and here. For an input file in descending order, entries with the same date and sort order therefore stay in file order, so the chronologically later entries appear before the earlier ones in the output.
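
To illustrate, here is a minimal sketch that sorts stand-in records by (date, lineno), which is how I read the sort described above. The `Row` type is hypothetical and only mimics the fields that matter; it is not beancount's entry type, and SORT_ORDER is left out because all three entries are transactions.

```python
from collections import namedtuple
from datetime import date

# Hypothetical stand-in for an extracted entry; lineno is the CSV line number.
Row = namedtuple("Row", "date lineno narration")

# fidor.csv is in descending order, so newer rows have smaller line numbers.
rows = [
    Row(date(2020, 10, 19), 2, "Gutschrift"),
    Row(date(2020, 9, 30), 3, "Aktivitätsbonus"),
    Row(date(2020, 9, 30), 4, "Kontofuehrung"),
]

# Sorting by (date, lineno) keeps same-day rows in file order,
# which for a descending file is reverse chronological order.
by_date_lineno = sorted(rows, key=lambda r: (r.date, r.lineno))
print([r.narration for r in by_date_lineno])
```

The printed order matches the bean-extract result in the report: "Aktivitätsbonus" comes before "Kontofuehrung" even though it is the newer of the two rows in the file.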

You can use the "new way" of running the importers, which is to call beancount.ingest.scripts_utils.ingest() from your import script config.py, and pass it a hook function that re-sorts the entries after extraction by any key you like.
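
A rough sketch of such a hook follows. It assumes that a hook receives a list of (filename, entries) pairs plus the existing entries and returns a list of the same shape, and that each entry carries its CSV line number in meta["lineno"]; please check both against your beancount version. The `Entry` type and the function name are hypothetical stand-ins so the example runs without beancount installed.

```python
from collections import namedtuple
from datetime import date

# Hypothetical stand-in for a beancount entry (date plus meta["lineno"]).
Entry = namedtuple("Entry", "date meta narration")


def sort_descending_file_hook(new_entries_list, existing_entries=None):
    """Reorder each file's entries by date, then by descending line number.

    Assumes new_entries_list is a list of (filename, entries) pairs.
    """
    result = []
    for filename, entries in new_entries_list:
        ordered = sorted(entries, key=lambda e: (e.date, -e.meta["lineno"]))
        result.append((filename, ordered))
    return result


# Entries in the order reported in the issue (same-day rows in file order).
entries = [
    Entry(date(2020, 9, 30), {"lineno": 3}, "Aktivitätsbonus"),
    Entry(date(2020, 9, 30), {"lineno": 4}, "Kontofuehrung"),
    Entry(date(2020, 10, 19), {"lineno": 2}, "Gutschrift"),
]
fixed = sort_descending_file_hook([("fidor.csv", entries)])
print([e.narration for e in fixed[0][1]])
```

Negating the line number only makes sense for files known to be in descending order, so a hook like this would have to be applied per importer or per file rather than globally.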

iuvbio • Mar 10 '21 17:03

This issue should be moved to beangulp.

dnicolodi • May 23 '21 16:05

@tbm One interesting issue is what to do when the CSV file contains transactions covering only one day. In this case it is not possible to infer the ordering from the file content. This is a problem for CSV files that have a balance column: the wrong final balance is picked. I think the only robust solution is adding an ordering parameter to the importer.
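
To make the single-day case concrete, here is a small sketch, not the importer's actual code. It guesses the order by comparing the first and last date, gives up when they are equal, and otherwise relies on an explicit setting. The function names and the `ascending` parameter are hypothetical.

```python
from datetime import date
from decimal import Decimal


def infer_ascending(dates):
    """Guess file order from first and last date; None if it cannot be told."""
    if not dates or dates[0] == dates[-1]:
        return None
    return dates[0] < dates[-1]


def final_balance(rows, ascending=None):
    """Pick the balance of the chronologically last row.

    rows is a list of (date, balance) pairs in file order. The `ascending`
    parameter is hypothetical: it stands for an explicit importer option.
    """
    if ascending is None:
        ascending = infer_ascending([d for d, _ in rows])
    if ascending is None:
        raise ValueError("cannot infer ordering; pass ascending explicitly")
    return rows[-1][1] if ascending else rows[0][1]


# A descending file covering a single day: the newest row comes first.
rows = [(date(2020, 9, 30), Decimal("5.00")), (date(2020, 9, 30), Decimal("0.00"))]
print(final_balance(rows, ascending=False))
```

Without the explicit argument this sketch raises an error for the single-day file instead of silently picking one end, which is the failure mode that an ordering parameter is meant to avoid.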

dnicolodi • Jun 10 '21 10:06

@dnicolodi I'm fine with this being an option instead of beangulp guessing.

tbm • Jun 14 '21 06:06