crystal-pg
Add support for COPY data transfers
After a good night's sleep I realized a few things:
- I initially forgot to consume `@frame_size` on IO.close, so closing after a partial read was bugged.
- Mandating an IO.close when reading felt like fragile API design, so it now automatically consumes the final frames once the reader has been drained. IO.close is now optional (see the read-side sketch after this list).
- The protocol allows us to allocate perfectly-sized buffers to read COPY data on a row-by-row basis, but the bare IO interface didn't expose that functionality. I've added a `remaining_row_size` method to support that use case. (I thought of implementing a more performant specialization of IO.gets on top, but that's only correct for TEXT formats.)
- With the COPY-out infrastructure in place, adding COPY-in support is actually pretty trivial as well. I don't really have a use case for that yet, but I implemented it anyway.
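To make the read side concrete, here is a minimal sketch of draining a COPY-out transfer. It assumes (this is my assumption, not documented API) that the same `exec_copy` entry point returns a readable IO for a `COPY ... TO STDOUT` statement; the connection string and table name are placeholders.

```crystal
# Hedged sketch, not the shard's documented API: assumes exec_copy also
# returns a readable IO for "COPY ... TO STDOUT", and that the TEXT format
# is used so IO#gets yields one row per line.
require "db"
require "pg"

DB.connect("postgres://localhost/mydb") do |db|
  reader = db.exec_copy "COPY my_table TO STDOUT"
  while line = reader.gets
    fields = line.split('\t') # TEXT format: tab-separated, NULL encoded as \N
    puts fields.inspect
  end
  # No explicit close needed once the reader is drained: the remaining COPY
  # frames are consumed automatically, as described above.
end
```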
That should be all for now, hopefully I got everything this time. :)
I've tried this in a Crystal project for a fast import from a CSV file into PostgreSQL. It worked as expected with copying, but I think there should be an example of how to use it; I can make a PR after this is merged. My usage looks like this:
DB.connect("postgres://#{postgres_user}:#{postgres_password}@#{postgres_host}:#{postgres_port}/#{postgres_db}") do |db|
io = db.exec_copy "COPY public.legal_unit_region_activity_category_stats_current(tax_reg_ident,name,employees,physical_region_code,primary_activity_category_code) FROM STDIN"
csv = CSV.new(File.open(import_filename), headers: true, separator: ',', quote_char: '"')
while csv.next
sql_text = [csv["tax_reg_ident"],
csv["name"],
csv["employees"],
csv["physical_region_code"],
csv["primary_activity_category_code"],
].map do |v|
case v
when "" then nil
else v
end
end.join("\t")
puts "Uploading #{sql_text}" if verbose
io.puts sql_text
end
puts "Waiting for processing" if verbose
io.close
db.close
end
I think there are possible improvements, such as having an `IO#write_row` that does the join and the puts, but those can come later (a rough sketch follows below). For efficiency there could also be a binary version, but that would require using PostgreSQL's binary COPY encoding.
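For what it's worth, a helper along those lines could look something like this. `write_copy_row` is a hypothetical name, not part of crystal-pg; it only handles the TEXT format with a naive NULL encoding, and embedded tabs, newlines, and backslashes would still need escaping.

```crystal
# Hypothetical helper, not part of crystal-pg: encodes one row in COPY TEXT
# format (tab-separated fields, nil written as \N) and writes it to the COPY IO.
def write_copy_row(io : IO, values : Enumerable)
  io.puts values.map { |v| v.nil? ? "\\N" : v.to_s }.join('\t')
end

# In the loop above it would replace the map/join/puts:
#   write_copy_row(io, [csv["tax_reg_ident"], csv["name"], ...])
```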