beanquery
offer markdown output from beanquery
It would be super cool to be able to run:
bean-query --format=markdown
and then continue to convert that to whatever format you need (using pandoc or whatnot).
Formatting output is probably not a concern for beancount/beanquery, and it is possible to just take the *.csv, and convert to markdown (or whatever).
However, I can think of two reasons why formatting inside beanquery might be useful:
- beanquery/beancount knows things about a column (data type), which is lost when saved to CSV. For example, beanquery knows that something is a currency, and should thus typically be right aligned.
- by tapping into an external printer of tabular data (such as tabulate), beancount/beanquery could offload the entire printing of tabular results (also text).
It would be great to get some feedback on whether the maintainers in principle consider this in scope for beanquery/beancount. (Otherwise I'll hack around it to reconstruct pretty Markdown, which I'd prefer to avoid for the reasons stated above.)
I am not sure I understand everything you're proposing. On one hand, you say that post-processing the CSV output to obtain the formatting you require does not work because the CSV does not expose the data type of columns. On the other hand, you say that beanquery could offload the rendering of the query results to an external library. However, I don't know any table rendering library that understands the Beancount data types (most do not even support Decimal properly).
I have nothing against adding more output formats. However, I'm not going to work on a Markdown formatter, among other reasons, because I don't particularly like Markdown. Also, AFAIK there are many Markdown dialects, and I suspect that most of them do not agree on the syntax for table formatting. Adding a dependency like tabulate to handle the formatting is not really appealing.
That said, results set formatting in beanquery is in principle pluggable, and it should not be difficult to fix it to be properly pluggable. Therefore, it should be possible to add Markdown rendering as a sort of plugin to beanquery.
Alternatively, the current beanquery master exposes a DB-API like interface. Plugging that into a generic table handling library is trivial. For example I use something like
import beanquery
import petl
db = beanquery.connect('beancount:/path/to/example.bean')
table = petl.fromdb(db, 'SELECT date, payee, position')
print(table)
to get transaction data into a Petl table. This, in principle, already gives you some sort of export to Markdown via Pandas:
petl.todataframe(table).to_markdown()
However, I suspect it does not do anything sensible with Beancount specific column types. Unfortunately, AFAIK Pandas insists that it can import only from databases supported by SQLAlchemy rather than from any DB-API compatible database driver.
Sorry for being a bit confused in my ticket, and thanks for your thoughtful response.
Some more (hopefully) clarifying comments:
On the other hand, you say that beanquery could offload the rendering of the query results to an external library. However, I don't know any table rendering library that understands the Beancount data types (most do not even support Decimal properly).
I understand that there won't be an external library which knows beancount data types.
What I meant was to let bean-query tell the rendering library how its data types should be formatted.
For example, all Postings etc. should be right aligned, and "." should be the decimal marker (or whatever).
I'd imagine this could be done with tabulate.
Alternatively, would it make sense to interface at a higher level of abstraction and let bean-query export rich rectangular data (using, say, pandas dataframes)? That way, the whole presentation (table rendering) issue would become (rightfully?) pandas' problem. If the column data types shipping with pandas are not sufficient, it could also be extended.
I understand that adding external libraries such as pandas or tabulate to bean-query would add bloat.
Would it be possible to add, say, pandas or tabulate as an optional dependency (extras_require?), so that you can only export to pandas if you have it installed?
Also, AFAIK there are many Markdown dialects, and I suspect that most of them do not agree on the syntax for table formatting.
Agreed.
I thought that would make something like tabulate attractive, because it writes to various Markdown formatters and a gazillion other formats.
Alternatively, the current beanquery master exposes a DB-API like interface. Plugging that into a generic table handling library is trivial.
This is amazing 😻.
However, as you point out, the richer column types of beancount would probably still be lost.
As I think about this more, the conflicting goals seem to be:
- there should not be more dependencies to bean-query
- the formatting (or even data container, i.e. pandas, polars) shouldn't be a concern for bean-query. It's a user concern.
- users want some API to tap into the richer types of beancount (so I don't have to hunt for which columns are numbers, for example).
Could beancount/bean-query thus:
- Export its rich data types to some flat file format with richer types than CSV (say, JSON with custom schema?)
I understand that there won't be an external library which knows beancount data types.
What I meant was to let bean-query tell the rendering library how its data types should be formatted. For example, all Postings etc. should be right aligned, and "." should be the decimal marker (or whatever). I'd imagine this could be done with tabulate.
I don't understand how these two statements can go together: if the external library does not know about the beancount data types, how can it format them correctly? The only thing that could be done would be to pass anything that is not a base Python type as a string, but this inherently gives up on any kind of fancy formatting or aligning.
Alternatively, would it make sense to interface at a higher level of abstraction and let bean-query export rich rectangular data (using, say, pandas dataframes)? That way, the whole presentation (table rendering) issue would become (rightfully?) pandas' problem. If the column data types shipping with pandas are not sufficient, it could also be extended.
This would just be a more convoluted way to bump into the same limitation. AFAIK, Pandas does not have a way to extend how DataFrames are rendered. Even if there were hooks to extend the rendering mechanism, it would still be beanquery's responsibility to provide the rendering implementation for the custom types. Thus, I don't see how going this route simplifies anything. Pandas uses tabulate for some rendering, thus it would just be a more tortuous route to get to the same dead end.
I understand that adding external libraries such as pandas or tabulate to bean-query would add bloat. Would it be possible to add, say, pandas or tabulate as an optional dependency (extras_require?), so that you can only export to pandas if you have it installed?
I don't think a library function is required. Importing a query result into a Pandas DataFrame is one line of code:
import beanquery
import pandas
conn = beanquery.connect('beancount:test.beans')
curs = conn.execute('''SELECT date, position''')
data = pandas.DataFrame(curs.fetchall(), columns=[c.name for c in curs.description])
However, this is not a way to get rendering for free: Pandas does not know how to render that in a meaningful way.
2. the formatting (or even data container, i.e. pandas, polars) shouldn't be a concern for bean-query. It's a user concern.
I'm not quite sure what you mean. Clearly formatting is a beanquery concern. beanquery implements two formatters: a text-based one, with a few options, and the CSV one. I already expressed interest in adding more. However, adding a Markdown one is definitely not on my agenda.
As already stated, beanquery implements the Python DB-API as closely as possible. This is what most other Python database adapters implement. I'm sure you can very easily import the data as exposed by this API into whatever container you like.
3. users want some API to tap into the richer types of beancount (so I don't have to hunt for which columns are numbers, for example).
The DB-API like interface returns a description of the query results set, including column names and data types:
import beanquery
import pandas
conn = beanquery.connect('beancount:test.beans')
curs = conn.execute('''SELECT date, position''')
print(curs.description)
print(curs.description[0].datatype)
What else do you need?
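As a rough illustration of how the description could be used to avoid "hunting" for numeric columns, here is a minimal sketch. It assumes only that each entry of curs.description exposes the name and datatype attributes used above. The Column namedtuple and the numeric_columns helper are hypothetical stand-ins for illustration and are not part of beanquery.

```python
import datetime
from collections import namedtuple
from decimal import Decimal

# Hypothetical stand-in for the entries of ``curs.description``; only the
# ``name`` and ``datatype`` attributes are assumed to exist.
Column = namedtuple('Column', 'name datatype')


def numeric_columns(description):
    """Return the names of the columns whose datatype is int or Decimal."""
    return [
        col.name
        for col in description
        # Guard against datatypes that are not plain classes, and skip bool,
        # which is a subclass of int in Python.
        if isinstance(col.datatype, type)
        and issubclass(col.datatype, (int, Decimal))
        and not issubclass(col.datatype, bool)
    ]


description = [
    Column('date', datetime.date),
    Column('number', Decimal),
    Column('payee', str),
]
print(numeric_columns(description))
```

With a real cursor, the same helper would presumably be called as numeric_columns(curs.description). Beancount-specific types such as positions are not treated as numbers here and would need their own handling.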
Could beancount/bean-query thus: Export its rich data types to some flat file format with richer types than CSV (say, JSON with custom schema?)
I'm interested in adding a JSON format. However, this does not seem to be the most straightforward way to solve your problem: reading back the JSON into Beancount objects would be a lot of work.
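To make the "JSON with a schema" idea concrete, the following is a minimal sketch and not an existing beanquery format. It assumes a description whose entries expose name and datatype, and rows given as tuples. The Column stand-in, the dump_json name, and the schema/rows layout are all made up for illustration. Non-JSON values (dates, Decimals, Beancount objects) are simply written as strings, which is exactly why reading them back into Beancount objects would need extra work.

```python
import datetime
import io
import json
from collections import namedtuple
from decimal import Decimal

# Hypothetical stand-in for the entries of a query result description.
Column = namedtuple('Column', 'name datatype')


def dump_json(description, rows, file):
    """Write the rows together with a small schema naming each column type."""
    payload = {
        'schema': [
            {'name': col.name, 'type': col.datatype.__name__}
            for col in description
        ],
        'rows': [list(row) for row in rows],
    }
    # default=str converts values json cannot encode (date, Decimal, ...)
    # to their string form, so Decimal digits are preserved as text.
    json.dump(payload, file, default=str)


buf = io.StringIO()
dump_json(
    [Column('date', datetime.date), Column('number', Decimal)],
    [(datetime.date(2024, 1, 2), Decimal('12.50'))],
    buf,
)
print(buf.getvalue())
```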
I implemented proper pluggable query result renderers in #166. For adding Markdown rendering, you just need to add a markdown module into the beanquery.render namespace package and implement a render(descr, rows, file, **kwargs) function. You can look at beanquery.render.csv and beanquery.render.text for examples of how this function should work.
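For illustration, a renderer with that signature might look roughly like the sketch below. It is not the implementation from this repository. It assumes descr entries expose name and datatype, that rows are sequences of values, and it targets the pipe-table syntax used by GitHub-flavored Markdown, which other dialects may not accept. The Column namedtuple exists only to exercise the function here. Values are formatted with plain str(), so Beancount-specific types get no special treatment.

```python
import io
from collections import namedtuple
from decimal import Decimal

# Hypothetical stand-in for description entries, used only for the demo below.
Column = namedtuple('Column', 'name datatype')


def render(descr, rows, file, **kwargs):
    """Write rows as a pipe table; int and Decimal columns are right-aligned."""

    def cell(value):
        # Render missing values as empty cells and escape the column separator.
        return '' if value is None else str(value).replace('|', '\\|')

    def line(cells):
        return '| ' + ' | '.join(cells) + ' |'

    print(line(cell(col.name) for col in descr), file=file)
    # In pipe-table syntax, '---:' marks a right-aligned column.
    print(line('---:' if col.datatype in (int, Decimal) else '---'
               for col in descr), file=file)
    for row in rows:
        print(line(cell(value) for value in row), file=file)


buf = io.StringIO()
render(
    [Column('payee', str), Column('number', Decimal)],
    [('Cafe', Decimal('4.50')), (None, Decimal('10'))],
    buf,
)
print(buf.getvalue(), end='')
```

Such a function would live in a markdown module inside the beanquery.render namespace package, as described above. How extra keyword arguments are passed in is not covered by this sketch, so they are accepted and ignored.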
Markdown support is not going to be implemented in this repo. However, the pluggable rendering mechanism makes it trivial to provide such functionality from another package.