q icon indicating copy to clipboard operation
q copied to clipboard

Inconsistent results depending on the contents of the datafile.

Open barrycforever opened this issue 8 years ago • 6 comments

Hi,

I think this is a great idea! I tried to use "q", though I got inconsistent results with different text files:

C:\Users\barryc\Desktop>"C:\Program Files (x86)\q-TextAsData\q.exe" -H -d, "select Timestamp,Entity,max(Value) from .\textfile1.csv where Unit='%' and Entity='CHBARRYCAPP1' group by Entity, Timestamp" 1/10/2017,CHBARRYCAPP1,15.52

C:\Users\barryc\Desktop>"C:\Program Files (x86)\q-TextAsData\q.exe" -H -d, "select Timestamp,Entity,max(Value) from .\textfile2.csv where Unit='%' and Entity='CHBARRYCAPP1' group by Entity, Timestamp" 1/10/2017,CHBARRYCAPP1,7.33

C:\Users\barryc\Desktop>

The max value in the file is 15.52 , though it returns 7.33 when a line is added. Please see the second csv file.

Thanks for your help on this!, Barry

PS I've uploaded the files. I had to add a "txt" extension because this page wouldn't take csv files:

textfile1.csv.txt textfile2.csv.txt

barrycforever avatar Mar 03 '17 04:03 barrycforever

It seems q gets confused when integers and floats are mixed in the same column and it falls back to string interpretation in this case (7 is the biggest first character in your example and hence 7.33 the "maximum" string). To force numeric interpretation you can just modify your query a little bit though:

select Timestamp,Entity,max(Value+0) from .\textfile2.csv where Unit='%' and Entity='CHBARRYCAPP1' group by Entity, Timestamp

bitti avatar Mar 03 '17 05:03 bitti

A little disappointing, but the workaround gave the right result in the small csv and the original csv too. Thanks!

barrycforever avatar Mar 03 '17 16:03 barrycforever

hi, sorry for the late reply.

As @bitti wrote (thanks...), this is indeed an issue related to not autodetecting the column type as float. This is related to an old issue I've opened a long time ago but never got to solve ( #57 ).

I will take a deeper look to see how this can be fixed asap.

Thanks, Harel

Btw, obviously not a good enough workaround, but writing 205.0 in the additional row would have worked around the problem as well.

In general, checking which types q detected on a certain file can be done by adding -A to the command line, as follows:

$ q -d , -H -A "select * from textfile1.csv.txt"
Table for file: textfile1.csv.txt
  `Value` - float
  `Timestamp` - text
  `MetricId` - text
  `Unit` - text
  `Description` - text
  `Entity` - text
  `EntityId` - text
  `IntervalSecs` - int
  `Instance` - text
$ q -d , -H -A "select * from textfile2.csv.txt"
Table for file: textfile2.csv.txt
  `Value` - text
  `Timestamp` - text
  `MetricId` - text
  `Unit` - text
  `Description` - text
  `Entity` - text
  `EntityId` - text
  `IntervalSecs` - int
  `Instance` - text
$

harelba avatar Mar 04 '17 18:03 harelba

Thanks for the hint on previewing the interpretation of data. I should have realized that nothing comes for free. :)

Is there any way to provide a hint of the datatype (or force the interpretation of a datatype)? Thanks!

barrycforever avatar Mar 06 '17 19:03 barrycforever

Sorry, it's not possible to provide a hint for the datatype.

harelba avatar Mar 10 '17 07:03 harelba

Okay, thanks.

barrycforever avatar Mar 20 '17 22:03 barrycforever