parso icon indicating copy to clipboard operation
parso copied to clipboard

Write to SQLite database

Open etiennekintzler opened this issue 5 years ago • 5 comments

Is it possible to offer the possibility to write to SQLite database ?

I started to write code for this purpose using sqlite-jdbc. The SQLite's schema is based on the column type (java.lang.Number or java.lang.String). Also I had to tweak DataWriterUtil.processEntry function so it can return, in the case of date, a format that is compatible with SQLite date and time format (ISO 8601 string, see paragraph 2.2 https://www.sqlite.org/datatype3.html ).

etiennekintzler avatar Apr 13 '19 12:04 etiennekintzler

Hey @etiennekintzler parso was designed as a very small library to parse sas7bdat data sets. we want to support the simplest output format which is CSV just for convenience. so from our perspective output to other formats is out of scope - this should be handled separately using custom processor that reads line by line (in java object) and then puts this line to sqlite database.

let me know what you think

printsev avatar May 22 '19 15:05 printsev

Hello @printsev, This is what I end up doing: using the method getRowValues from DataWriterUtil to write into a SQLite database. But since the class DataWriterUtil is final and I needed to tweak processEntry I had to copy and paste the whole class which was cumbersome.

While I perfectly understand your wish of keeping the library small, the output to a sqlite database was a good idea for the following reasons:

  • it requires only minimal code (opening connection to sqlite db, writing schema, writing output from getRowValues into the db)
  • sqlite database offers a good substitute to SAS database as it is a file format and can be queried without loading all the base in RAM.
  • sqlite db has good support with standard data analysis/statistical software such as R and Python (csv text format as well but needs parsing and cannot be queried from directly).

etiennekintzler avatar Jun 01 '19 08:06 etiennekintzler

Hello @printsev Would it be possible to make the class DataWriterUtil not final so I can inherit from it and redefined processEntry so it can output date in ISO 8601 date format (which is needed to have a proper SQLite date type) ? Thank you in advance

etiennekintzler avatar Jul 30 '19 21:07 etiennekintzler

Hi @etiennekintzler If I understand you correctly, are you going to use CSVDataWriterImpl class to write data into SQLite? If that's right then probably we can add the ability to set date formatting when creating CSVDataWriterImpl (via constructor or builder). What do you think?

Yana-Guseva avatar Oct 31 '19 12:10 Yana-Guseva

Hello @Yana-Guseva , I am not going to use CSVDataWriterImpl class to write into SQLite. What I use to write data into SQLite is the method getRowValues from the class DataWriterUtil. This method then calls the method processEntry from the same class. The processEntry will do some undesirable formatting when the entry is of class "Date" :

 if (entry.getClass() == Date.class) {
                valueToPrint = convertDateElementToString((Date) entry,
                        (SimpleDateFormat) columnFormatters.computeIfAbsent(column.getId(),
                                e -> getDateFormatProcessor(column.getFormat(), locale)));

What I would like is to be able to use my own SimpleDateFormat (that will be ISO 8601 compliant) instead of the one defined by :

(SimpleDateFormat) columnFormatters.computeIfAbsent(column.getId(),
                                e -> getDateFormatProcessor(column.getFormat(), locale))

So, what could be done is to create a new method processEntry that include as argument a SimpleDateFormat object that will be passed in convertDateElementToString :

private static String processEntry(Column column, Object entry, Locale locale,
                                       Map<Integer, Format> columnFormatters, 
                                       SimpleDateFormat dateFormat) throws IOException {
        if (!String.valueOf(entry).contains(DOUBLE_INFINITY_STRING)) {
            String valueToPrint;
            if (entry.getClass() == Date.class) {
                valueToPrint = convertDateElementToString((Date) entry, dateFormat);
            } else {
                if (TIME_FORMAT_STRINGS.contains(column.getFormat().getName())) {
                    valueToPrint = convertTimeElementToString((Long) entry);
                } else if (PERCENT_FORMAT.equals(column.getFormat().getName())) {
                    valueToPrint = convertPercentElementToString(entry,
                            (DecimalFormat) columnFormatters.computeIfAbsent(column.getId(),
                                    e -> getPercentFormatProcessor(column.getFormat(), locale)));
                } else {
                    valueToPrint = String.valueOf(entry);
                    if (entry.getClass() == Double.class) {
                        valueToPrint = convertDoubleElementToString((Double) entry);
                    }
                }
            }

            return valueToPrint;
        }
        return "";
    }

Note that the same could be done with the percent formatting. The DecimalFormat could be provided by the user. Since Locale object is use as parameter only convertDateElementToString and convertPercentElementToString we could omit this parameter. The new function could write as follow:

private static String processEntry(Column column, Object entry, 
                                       Map<Integer, Format> columnFormatters, 
                                       SimpleDateFormat dateFormat, 
                                       DecimalFormat decimalFormat) throws IOException {
        if (!String.valueOf(entry).contains(DOUBLE_INFINITY_STRING)) {
            String valueToPrint;
            if (entry.getClass() == Date.class) {
                valueToPrint = convertDateElementToString((Date) entry, dateFormat);
            } else {
                if (TIME_FORMAT_STRINGS.contains(column.getFormat().getName())) {
                    valueToPrint = convertTimeElementToString((Long) entry);
                } else if (PERCENT_FORMAT.equals(column.getFormat().getName())) {
                    valueToPrint = convertPercentElementToString(entry, decimalFormat);
                } else {
                    valueToPrint = String.valueOf(entry);
                    if (entry.getClass() == Double.class) {
                        valueToPrint = convertDoubleElementToString((Double) entry);
                    }
                }
            }

            return valueToPrint;
        }
        return "";
    }

What do you think of this approach ?

etiennekintzler avatar Nov 01 '19 15:11 etiennekintzler