parquet-go

Improving error messages

Open leesei opened this issue 3 years ago • 5 comments

I encountered an "index out of range [0] with length 0" error in PagesToChunk() in PagesToChunk.go. The cause was that I passed the wrong struct to NewParquetWriterFromWriter(), which left the pages empty.

The same goes for the "invalid memory address or nil pointer dereference" error (#421). The cause there was my use of a non-native type (time.Time) in the object interface.

These are indeed my errors and easily fixable. But I would recommend improving the error messages to improve the user experience.
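To illustrate the kind of improvement meant here, below is a minimal sketch (standard library only) of turning a panic into an error that carries context. The helper name safeCall and the operation label are hypothetical and not part of parquet-go; the closure is only a stand-in for a write call that hits empty pages.

```go
package main

import "fmt"

// safeCall runs fn and converts any panic into an error that names the
// operation, so the caller gets context instead of a bare runtime crash.
// Hypothetical helper for illustration; not a parquet-go function.
func safeCall(op string, fn func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("%s: recovered from panic: %v", op, r)
		}
	}()
	return fn()
}

func main() {
	// Stand-in for a write that indexes into an empty page list.
	err := safeCall("flush row group", func() error {
		var pages []int
		_ = pages[0] // panics at run time: the slice is empty
		return nil
	})
	fmt.Println(err)
}
```

A wrapper like this does not say which field was wrong, but it at least tells the user which step failed and lets the caller handle it as an ordinary error.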

leesei avatar Dec 02 '21 08:12 leesei

+1 on this. I'm trying to adopt this library into https://www.benthos.dev and currently I'm struggling to get even basic examples to work. The error message I'm getting right now is "interface {} is nil, not string". I'm not sure how I could recommend someone use this in my service when I know they're going to get zero meaningful feedback when things aren't working.

Jeffail avatar Jan 06 '22 17:01 Jeffail

@Jeffail We've been using this in production for 2 years now, moving billions of records every day to parquet files on hdfs. I can definitely say that it's performing very well and we don't encounter any issues. Internal library messages should in any case be "translated" to user-readable benthos-ish messages.

If you wish I can show you how we do it and get you started. And I'll be happy to see if we can benefit from Benthos...

PM me for details

eldadts avatar Jan 06 '22 18:01 eldadts

Thanks @xtrimf, I'm mostly concerned about how I'd be able to specify to a user which field it is that's causing the error, as otherwise they'd have no way of knowing whether the problem is that their schema has a typo in it or the data is incorrect. Ideally I'd want to expose the specific row that causes the error on a write flush but for starters I just want the field name.
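As a rough sketch of what field-level feedback could look like, the check below walks a struct with reflection before any write and names the first field it cannot handle. The allow-list supportedKinds and the function checkFields are assumptions made for illustration; they do not reflect parquet-go's actual type rules or API.

```go
package main

import (
	"fmt"
	"reflect"
	"time"
)

// supportedKinds is an illustrative allow-list, NOT the library's real rules.
var supportedKinds = map[reflect.Kind]bool{
	reflect.Bool: true, reflect.Int32: true, reflect.Int64: true,
	reflect.Float32: true, reflect.Float64: true, reflect.String: true,
}

// checkFields returns an error naming the first struct field whose type is
// not in supportedKinds. A pointer to a supported kind is accepted.
func checkFields(v interface{}) error {
	t := reflect.TypeOf(v)
	if t == nil {
		return fmt.Errorf("schema object is nil")
	}
	if t.Kind() == reflect.Ptr {
		t = t.Elem()
	}
	if t.Kind() != reflect.Struct {
		return fmt.Errorf("schema object must be a struct, got %s", t.Kind())
	}
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		ft := f.Type
		if ft.Kind() == reflect.Ptr {
			ft = ft.Elem()
		}
		if !supportedKinds[ft.Kind()] {
			return fmt.Errorf("field %q: unsupported type %s", f.Name, f.Type)
		}
	}
	return nil
}

// event mimics the time.Time mistake described earlier in the thread.
type event struct {
	Name      string
	CreatedAt time.Time
}

// record contains only kinds from the allow-list.
type record struct {
	ID   int64
	Name *string
}

func main() {
	fmt.Println(checkFields(event{}))
	fmt.Println(checkFields(record{}))
}
```

This only covers schema mistakes; pointing at the specific row that fails on a flush would need support inside the writer itself.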

Jeffail avatar Jan 06 '22 19:01 Jeffail

You are sending a struct/json/slice to be written in a parquet rowGroup. I'm not sure identifying the failing field is possible (I didn't verify), as the whole object gets encoded, but I could be wrong.

For generic use, when the source data is unknown, we parse the data, decide its type on the fly, and build the object schema accordingly, so there is never a mismatch. But we mainly use DBs as the source, so it's easy to get the types beforehand.
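A minimal sketch of that idea, assuming rows arrive as map[string]interface{}: inspect each value, pick a Parquet physical type name for it, and fail with the field name when no type can be decided. The functions inferType and inferSchema are hypothetical, and the mapping is a simplification for illustration, not our production code or a parquet-go API.

```go
package main

import (
	"fmt"
	"sort"
)

// inferType maps a decoded Go value to a Parquet physical type name.
// The mapping is a simplified assumption for illustration.
func inferType(field string, v interface{}) (string, error) {
	switch v.(type) {
	case nil:
		return "", fmt.Errorf("field %q: value is nil, cannot infer type", field)
	case bool:
		return "BOOLEAN", nil
	case int32:
		return "INT32", nil
	case int, int64:
		return "INT64", nil
	case float32:
		return "FLOAT", nil
	case float64:
		return "DOUBLE", nil
	case string:
		return "BYTE_ARRAY", nil
	default:
		return "", fmt.Errorf("field %q: unsupported Go type %T", field, v)
	}
}

// inferSchema builds a field-name to type-name map from one sample row.
// Keys are visited in sorted order so the reported error is deterministic.
func inferSchema(row map[string]interface{}) (map[string]string, error) {
	keys := make([]string, 0, len(row))
	for k := range row {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	schema := make(map[string]string, len(row))
	for _, k := range keys {
		typ, err := inferType(k, row[k])
		if err != nil {
			return nil, err
		}
		schema[k] = typ
	}
	return schema, nil
}

func main() {
	schema, err := inferSchema(map[string]interface{}{"id": int64(1), "name": "a"})
	fmt.Println(schema, err)

	_, err = inferSchema(map[string]interface{}{"name": nil})
	fmt.Println(err)
}
```

Inferring from a single row is fragile when a value is nil, which is why getting the types from the DB beforehand is the easier path.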

eldadts avatar Jan 06 '22 19:01 eldadts

+1, the error reporting isn't good enough. Schema errors give no indication of which field was at fault. I had to put debug messages in to find the problem. Same with writing errors. At the very minimum it needs to point the dev at where the problem might lie.

crodwell avatar Sep 01 '23 11:09 crodwell