Datamancer
rename docs and order
I have a header line that starts with '#' so I want to do:
var df = readCsv(tsv, sep='\t').rename(f{"mode" <- "#mode"})
This works, but the docs say to use ~, which does not work.
Another issue is that it changes the column ordering by adding the new column at the end (as expected using OrderedTable), but I would expect it to keep the column order and only change the name.
I know it's early days, but just wanted to flag this as I saw it.
Thanks for the dataframe lib!
Good catch, that is definitely a typo in the docs. It should be <- as you noted :)
Another issue is that it changes the column ordering by adding the new column at the end (as expected using OrderedTable), but I would expect it to keep the column order and only change the name.
It should be possible to keep the order by reconstructing the OrderedTable with the replaced key(s). Since the columns have reference semantics, recreating the table does not involve any expensive copying of the data.
Thanks for the dataframe lib!
You're welcome!
I have a header line that starts with '#' so I want to do:
First of all, you can just use the header argument of readCsv:
var df = readCsv(tsv, sep='\t', header = "#")
(Note: it takes a string, but the implementation currently only looks at its first character.)
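To illustrate what header = "#" achieves, here is a minimal sketch in Nim of stripping a comment prefix from the header line before splitting it into column names. This is not Datamancer's actual code: parseHeaderLine is a hypothetical proc, and it deliberately mimics the limitation mentioned in the note by comparing only the first character of header.

```nim
import std/strutils

proc parseHeaderLine(line, header: string, sep: char): seq[string] =
  ## Hypothetical sketch (not Datamancer's code): drop a leading comment
  ## prefix from the header line, then split it into column names.
  var body = line
  # Only the first char of `header` is checked, mirroring the
  # limitation described in the thread.
  if header.len > 0 and body.len > 0 and body[0] == header[0]:
    body = body[1 .. ^1]
  result = body.split(sep)

echo parseHeaderLine("#mode\tvalue", "#", '\t')  # @["mode", "value"]
```

With a multi-character prefix such as "##", this sketch would strip only one '#', which is why a single-character header is the safe choice for now.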
This works, but the docs say to use ~, which does not work.
@HugoGranstrom is almost right. It's not actually a typo, but a leftover from the first, fully runtime-based data frame implementation. That one had formulas based on ~ without any f{} macro, so one could write:
let fn = "mode" ~ "#mode"
and pass such a thing to the procs. I threw out the non-scoped formulas when I rewrote the data frame implementation, because they seemed too foreign. Better to encapsulate them.
I really need to go through the Datamancer code, fix up the docstrings and add runnable examples everywhere. I haven't announced a first Datamancer release yet because the docs are still in their current state.
For reference, the old DF + formula implementation lives here: https://github.com/Vindaar/ggplotnim/blob/fixFormulaImpl/src/ggplotnim/dataframe/fallback/formula.nim and here are some examples of what was possible with it: https://github.com/Vindaar/ggplotnim/blob/fixFormulaImpl/tests/tests.nim#L207-L224
Another issue is that it changes the column ordering by adding the new column at the end (as expected using OrderedTable), but I would expect it to keep the column order and only change the name.
Yup. So far I haven't paid much attention to the order of columns, aside from making sure they stay in the order in which they were inserted. It feels like bad style to depend on the order of columns. The order should only matter so as not to confuse people when printing or viewing the columns, or when writing them to a file.
It seems misguided to provide access to columns by index, as pandas allows, but maybe I'm missing something.
I do agree, though, that the column order should stay the same when renaming or mutating existing columns. (For mutating I'm not sure off the top of my head whether the order is actually kept, but it should be even now.)
Again, @HugoGranstrom has a good point. In principle we can construct a new table and assign the existing columns to it, since they are ref objects and no data needs to be copied.
Maybe the standard library could grow a replace procedure for OrderedTable, though. It seems like a useful thing to have in general.
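A minimal sketch of the rebuild approach, assuming a simplified column type: the Column ref object below is a hypothetical stand-in for Datamancer's real column type, and renameKey is a hypothetical helper, not an existing std/tables or Datamancer proc. Iterating an OrderedTable yields its entries in insertion order, so the renamed key ends up at the old position, and because the values are refs only the references are copied.

```nim
import std/tables

type
  # Hypothetical stand-in for a data frame column. Being a `ref`,
  # assigning it to a new table copies only the reference, not the data.
  Column = ref object
    data: seq[float]

proc renameKey[A, B](t: OrderedTable[A, B], oldKey, newKey: A): OrderedTable[A, B] =
  ## Hypothetical helper: returns a new table in which `oldKey` is
  ## replaced by `newKey` at the same position. Assumes `newKey` is not
  ## already a key of `t` (unless it equals `oldKey`).
  doAssert newKey == oldKey or newKey notin t, "newKey already exists"
  result = initOrderedTable[A, B]()
  # `pairs` on an OrderedTable iterates in insertion order.
  for k, v in t.pairs:
    if k == oldKey:
      result[newKey] = v
    else:
      result[k] = v

# Usage: rename "#mode" to "mode" without moving it to the end.
var cols = initOrderedTable[string, Column]()
cols["#mode"] = Column(data: @[1.0, 2.0])
cols["value"] = Column(data: @[3.0, 4.0])
let renamed = renameKey(cols, "#mode", "mode")
for name in renamed.keys:
  echo name   # prints "mode", then "value"
```

The rebuild is O(n) in the number of columns, not in the number of rows, which is why it stays cheap even for large data frames.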
This should hopefully be addressed once #15 is merged. Feel free to provide feedback. Otherwise I'll close this issue in a couple of weeks. :)
edit: Now live here: https://scinim.github.io/Datamancer/dataframe.html
Hi, I was trying to open a CSV file and got the following error:
Error: unhandled exception: /home/carlos/.nimble/pkgs/datamancer-0.2.5/datamancer/io.nim(518, 14) row + skippedLines == lineCnt - 1
Bad file. Please report an issue. [AssertionDefect]
I used the following line of code:
var df = readCsv("base_Mdiciembre.csv", sep=';')
I'm not sure what to do.
As already mentioned on discord/matrix, could you please provide the first few lines of the CSV file so I can reproduce the problem?
(you could have opened a new issue for this, btw)