crystal-pg icon indicating copy to clipboard operation
crystal-pg copied to clipboard

Add support for COPY data transfers

Open 17dec opened this issue 1 year ago • 2 comments

17dec avatar Dec 20 '23 18:12 17dec

After a good night's sleep I realized a few things:

  • I initially forgot to consume @frame_size on IO.close, so closing after a partial read was bugged.
  • Mandating an IO.close when reading felt like fragile API design, so it now automatically consumes the final frames when the reader has been drained. IO.close is now optional.
  • The protocol allows us to allocate perfectly-sized buffers to read COPY data on a row-by-row basis, but the bare IO interface didn't expose that functionality. I've added a remaining_row_size method to support that use case. (I thought of implementing a more performant specialization of IO.gets on top, but that's only correct for TEXT formats)
  • With the COPY-out infrastructure in place, adding COPY-in support is actually pretty trivial as well. I don't really have a use case for that yet, but implemented anyway.

That should be all for now, hopefully I got everything this time. :)

17dec avatar Dec 21 '23 06:12 17dec

I've tried this in a crystal project for fast import from a csv file to postgresql. It worked as expected with copying, but I think there should be an example on how to use that, I can make a PR after this is merged. My usage looks like this:

DB.connect("postgres://#{postgres_user}:#{postgres_password}@#{postgres_host}:#{postgres_port}/#{postgres_db}") do |db|
  io = db.exec_copy "COPY public.legal_unit_region_activity_category_stats_current(tax_reg_ident,name,employees,physical_region_code,primary_activity_category_code) FROM STDIN"
  csv = CSV.new(File.open(import_filename), headers: true, separator: ',', quote_char: '"')
  while csv.next
    sql_text = [csv["tax_reg_ident"],
                csv["name"],
                csv["employees"],
                csv["physical_region_code"],
                csv["primary_activity_category_code"],
    ].map do |v|
      case v
      when ""  then nil
      else          v
      end
    end.join("\t")
    puts "Uploading #{sql_text}" if verbose
    io.puts sql_text
  end
  puts "Waiting for processing" if verbose
  io.close
  db.close
end

I think there are possible improvements, such as having a IO#write_row that does the join and the puts, but I think those can come later. For efficiency there could also be a binary version, but that would require using a PostgreSQL binary encoding.

jhf avatar Jan 23 '24 20:01 jhf