
feature request: default implementation for list of structs

Open skyqrose opened this issue 8 months ago • 3 comments

I have a list of structs representing row-major tabular data, and I want it to satisfy the Table protocol so that I can pass it to Explorer.DataFrame.new().

This almost works automatically, except that it's specifically disabled for struct rows (but not map rows) in init_row(): https://github.com/elixir-ecto/table/blob/5704327dfaeb8dce0dd731aa2996e731f6e869c6/lib/table/reader/enumerable.ex#L54-L56

Having tabular data as a list of structs seems like it should be a common case. Should this library support it?

A workaround would be to define my own custom implementation of the protocol. I can't define an implementation for List because it's already defined, and I don't want to overwrite the default in all cases, just for lists of structs. But I guess I could define a new struct that wraps the list, and then define the implementation for that.

My ideal implementation would keep the column order from the struct definition, instead of sorting the keys like it does for maps.

skyqrose, Jun 26 '25 20:06
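Because the map path sorts the keys, one way to keep the declared field order is to record it explicitly. A minimal sketch, assuming a hypothetical Trip struct and OrderedRows helper (neither comes from the thread or the Table library): the field list lives in one module attribute that feeds both defstruct and a columns/0 function, so nothing depends on map key ordering.

```elixir
defmodule Trip do
  # Hypothetical row struct; this list is the intended column order.
  @fields [:route, :departure, :passengers]

  defstruct @fields

  # Expose the declared order so callers never rely on map key ordering.
  def columns, do: @fields
end

defmodule OrderedRows do
  # Hypothetical helper: returns {columns, rows}, where each row is a list of
  # values in the same order as `columns`.
  def from_structs(structs, columns) do
    rows = Enum.map(structs, fn row -> Enum.map(columns, &Map.fetch!(row, &1)) end)
    {columns, rows}
  end
end

trips = [
  %Trip{route: "39", departure: "08:15", passengers: 12},
  %Trip{route: "1", departure: "08:20", passengers: 30}
]

{columns, rows} = OrderedRows.from_structs(trips, Trip.columns())
IO.inspect(columns)  # prints [:route, :departure, :passengers]
IO.inspect(rows)     # prints [["39", "08:15", 12], ["1", "08:20", 30]]
```

Sorting these keys alphabetically, as happens for map rows, would instead give [:departure, :passengers, :route].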

I think the issue is precisely what we would convert your structs to. We would likely need to introduce another protocol, as otherwise the chance of false positives could be too high?

josevalim, Jun 26 '25 20:06
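The opt-in idea can be sketched as a separate protocol. A minimal illustration, assuming a hypothetical TableRow protocol, Point struct, and RowCheck helper (none of these exist in the Table library): a struct is readable as a row only if it explicitly implements the protocol, so structs that never opted in, such as Regex, raise instead of being guessed at.

```elixir
# Hypothetical opt-in protocol; nothing like this exists in Table today.
defprotocol TableRow do
  @doc "Returns the row's column names, in order."
  def columns(row)

  @doc "Returns the row's values, in the same order as columns/1."
  def values(row)
end

defmodule Point do
  # Hypothetical struct that chooses to be a table row.
  defstruct [:x, :y]
end

# Point opts in explicitly and decides its own columns and their order.
defimpl TableRow, for: Point do
  def columns(_point), do: [:x, :y]
  def values(point), do: [point.x, point.y]
end

defmodule RowCheck do
  # Hypothetical helper: true only for values that implement TableRow.
  # With no fallback implementation, other structs raise Protocol.UndefinedError.
  def opted_in?(row) do
    TableRow.columns(row)
    true
  rescue
    Protocol.UndefinedError -> false
  end
end

IO.inspect(RowCheck.opted_in?(%Point{x: 1, y: 2}))  # prints true
IO.inspect(RowCheck.opted_in?(~r/foo/))             # prints false
```

The trade-off is that every row struct needs an explicit implementation, in exchange for never misreading an opaque struct as data.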

I don't see the problem. Do you mean that if a struct represents something besides a row of data, you could accidentally read it as a table when you didn't mean to, so you'd need a new TableRow protocol for structs to opt in to? But that could already happen for maps.

If the issue was guessing which columns and values to extract from the struct, I'd expect all the fields to be used as columns, the same as if you did Enum.map(rows, &Map.from_struct/1).
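A minimal sketch of that conversion, assuming a hypothetical Stop struct: Map.from_struct/1 drops the :__struct__ key, so each row becomes a plain map that takes the map-row path instead of the rejected struct path. The resulting list of maps can then be handed to Explorer.DataFrame.new() like any other list of map rows.

```elixir
defmodule Stop do
  # Hypothetical row struct, used only for illustration.
  defstruct [:name, :lat, :lon]
end

rows = [
  %Stop{name: "Park St", lat: 42.356, lon: -71.062},
  %Stop{name: "Downtown Crossing", lat: 42.355, lon: -71.06}
]

# Map.from_struct/1 removes :__struct__, leaving plain maps with every field.
maps = Enum.map(rows, &Map.from_struct/1)

# Every field becomes a column. Sorting the keys, as happens for map rows,
# loses the defstruct order (:name, :lat, :lon).
columns = maps |> hd() |> Map.keys() |> Enum.sort()
IO.inspect(columns)  # prints [:lat, :lon, :name]
```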


And to document the workarounds I tried for anyone looking for answers here:

I wrote a wrapping struct that implemented Table.Reader:
defmodule SpareExporter.TableOfStructs do
  # columns: the struct fields to read, in the desired column order
  # list: the structs themselves, one per row
  defstruct [:columns, :list]

  defimpl Table.Reader do
    def init(table) do
      metadata = %{columns: table.columns}

      # Row-major output: one list of values per struct, ordered like :columns
      rows =
        Enum.map(table.list, fn row ->
          Enum.map(table.columns, fn key ->
            Map.fetch!(row, key)
          end)
        end)

      {:rows, metadata, rows}
    end
  end
end
def to_dataframe(list_of_structs, columns) do
  %SpareExporter.TableOfStructs{columns: columns, list: list_of_structs}
  |> Explorer.DataFrame.new()
end

But I didn't like it. It turned out a lot cleaner to do the conversion to column-major data myself, which Explorer ingests much more easily:

def to_dataframe(list_of_structs, columns) do
  # One {column, values} pair per field; Explorer accepts this keyword list directly
  columns
  |> Enum.map(fn key ->
    {key, Enum.map(list_of_structs, fn row -> Map.fetch!(row, key) end)}
  end)
  |> Explorer.DataFrame.new()
end

skyqrose, Jun 26 '25 21:06
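A variation on the wrapping struct's init/1 above: assuming, as the Table.Reader return type appears to allow, that the rows element of {:rows, metadata, rows} may be any enumerable, Stream.map can replace the outer Enum.map so rows are only built as the consumer reads them. A stdlib-only sketch, using a hypothetical LazyStructRows module and Fare struct so it runs without Table:

```elixir
defmodule LazyStructRows do
  # Hypothetical helper mirroring the {:rows, metadata, rows} shape returned
  # by a Table.Reader init/1; it does not depend on the Table library.
  def init(list, columns) do
    metadata = %{columns: columns}

    # Stream.map defers the per-row work until the rows are enumerated.
    rows =
      Stream.map(list, fn row ->
        Enum.map(columns, &Map.fetch!(row, &1))
      end)

    {:rows, metadata, rows}
  end
end

defmodule Fare do
  # Hypothetical row struct for the example.
  defstruct [:zone, :price]
end

fares = [%Fare{zone: "1A", price: 2.4}, %Fare{zone: "2", price: 6.75}]
{:rows, meta, rows} = LazyStructRows.init(fares, [:zone, :price])

IO.inspect(meta)                # prints %{columns: [:zone, :price]}
IO.inspect(Enum.to_list(rows))  # prints [["1A", 2.4], ["2", 6.75]]
```

For small lists the eager version is just as good; the lazy one only matters when the list is large or the consumer may not read every row.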

It could happen for maps, but given they don't have any structure, it is reasonable to guess. If we did it for structs, many of the common structs in Elixir would be guessed wrong, such as regexes, decimals, Explorer's own dataframes and series, and many more. These structs are meant to be opaque, and doing this would go directly against that.

josevalim, Jun 26 '25 21:06
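To make the false-positive concern concrete, here is a stdlib-only illustration of what a generic "struct fields become columns" rule would see for a Regex, one of the structs mentioned above. Map.from_struct/1 exposes its internal fields as would-be columns; the exact field set is an implementation detail that may differ between Elixir versions, which is part of why it shouldn't be guessed at.

```elixir
# A generic "struct fields = columns" rule applied to an opaque struct.
regex = ~r/foo/

# Map.from_struct/1 exposes Regex internals (such as :source); the full set
# of fields is an implementation detail.
row = Map.from_struct(regex)

# These keys would become the "columns" of a one-row table.
IO.inspect(row |> Map.keys() |> Enum.sort())
```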