csvkit Allow disabling/forcing type inference for certain columns only

A mapping?

--types int,varchar

A .csvt?

Mar 01 '12 01:03 onyxfish

A CSVT might be a little exotic, but it might be the most robust solution if you do the same tasks over and over. The example I had so far was that csvsql made a date field out of something that was a varchar, and I couldn't really get it to do what I wanted.

Could you just specify which fields to not guess, defaulting to varchar?

CSVT would also make you specify all columns, right? That would be daunting on a big dataset, probably defeat the purpose.

I think I like the "Don't guess on this column" option most.

Mar 01 '12 01:03 mikejcorey

The latter is certainly a possibility although I'm inclined to implement a more general solution if one exists. I like .csvt because 1) it's still CSV and 2) it's an existing (albeit, as you say, exotic) convention. The somewhat annoying thing about it is that I'll be mandating a pretty specific list of supported Python types, which aren't going to match any other type system out there in the world.

Internally csvkit normalizes to:

NoneType, bool, int, float, datetime.datetime, datetime.time, datetime.date and unicode

Mar 01 '12 01:03 onyxfish

Am I right that you'd have to specify all columns if you went the .csvt route?

Mar 01 '12 01:03 mikejcorey

That's true, that is def. a downside. Maybe a

--no-infer a,b,c

syntax is better after all.

It's also worth keeping in mind that for type coercion things can really only be cast "down", i.e. int -> unicode. If you were to try to use a csvt to specify a more granular type the thing would just blow up anyway.
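
To illustrate the "cast down" point, here is a minimal sketch in plain Python 3 (so `str` stands in for `unicode`). It does not use csvkit's internals, and the helper name `try_cast` is hypothetical. Any inferred value converts to text, while forcing text into a narrower type fails as soon as one value does not fit.

```python
def try_cast(values, target):
    """Cast every value to `target`; return None if any value does not fit.

    Hypothetical helper for illustration only, not part of csvkit.
    """
    try:
        return [target(v) for v in values]
    except (TypeError, ValueError):
        return None


# Casting "down" to text always succeeds.
print(try_cast([1, 2, 3], str))          # ['1', '2', '3']

# Casting to a more specific type fails on the first value that does not fit.
print(try_cast(["1", "2", "n/a"], int))  # None
```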

Mar 01 '12 01:03 onyxfish

Yeah, I think that's OK -- it's more important to me that something fails over to generic rather than specific. So if I have to CAST (blah) AS INTEGER, that's no big deal.

Supporting .CSVT might be a nice feature as well, but would not really solve my particular problem, which I think I'd come across more often if my main use is to quickly start playing with some data.

In any case, csvsql is really cool. Navicat is obviously good at CSV imports, but still requires some configuration guesswork. It's a huge timesaver and probably nearly eliminates the need for certain types of users in our organization to even use Navicat, which would definitely save us some money.

Mar 01 '12 01:03 mikejcorey

That's wonderful to hear. I'll look at hacking in a way to force values to strings sometime soon (possibly tonight, though I'm down other rabbit holes at the moment). Thanks for the feedback!

Mar 01 '12 01:03 onyxfish

Great, thanks! No rush from me, just wanted to say something while I was thinking of it.

Mar 01 '12 02:03 mikejcorey

Noting that there's some discussion of possible solutions in the referenced issues above.

Jan 25 '16 17:01 jpmckinney

So I think the simplest satisfactory solution for the reported feature request is to allow --no-inference to accept column names, e.g.:

--no-inference a,b,c
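
A rough sketch of what that behavior could look like, using only the Python standard library. This is not csvkit's code: the function names `infer` and `read_rows`, the `no_inference` parameter, and the choice of candidate types are all assumptions made for illustration. Inference is done per cell here for brevity, whereas a real implementation would presumably decide per column.

```python
import csv
import io
from datetime import datetime


def infer(value):
    """Guess a type for one cell: int, then float, then ISO date, else text."""
    casts = (int, float, lambda v: datetime.strptime(v, "%Y-%m-%d").date())
    for cast in casts:
        try:
            return cast(value)
        except ValueError:
            continue
    return value


def read_rows(text, no_inference=()):
    """Parse CSV text, leaving the columns named in `no_inference` as text."""
    skip = set(no_inference)
    reader = csv.DictReader(io.StringIO(text))
    return [
        {name: (cell if name in skip else infer(cell)) for name, cell in row.items()}
        for row in reader
    ]


data = "a,b,c\n2012-03-01,7,1.5\n"
rows = read_rows(data, no_inference=["a"])
print(rows)  # [{'a': '2012-03-01', 'b': 7, 'c': 1.5}]
```

With column `a` excluded, the date-like value stays a string while `b` and `c` are still inferred as `int` and `float`, which matches the original report of csvsql turning a varchar into a date.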

Jan 28 '17 18:01 jpmckinney

Is there a realistic plan to do this?

Jun 27 '22 16:06 mingfang

There is no time planned to work on this issue. It remains open.

Jun 28 '22 14:06 jpmckinney
