
[Bug] db.univar: fails on Windows due to unix style sort being used

Open neteler opened this issue 1 year ago • 10 comments

v.db.univar fails because db.univar fails. ... On Windows, there is a sort executable that doesn't work like the Unix one. So, on Windows, it doesn't fall back to the Python implementation.

Originally posted by @echoix in https://github.com/OSGeo/grass/issues/3743#issuecomment-2143947719

We need a different implementation for:

https://github.com/OSGeo/grass/blob/f59851043fbd7cb52288b7000d7e76824ae63ab7/scripts/db.univar/db.univar.py#L83

neteler avatar Jun 06 '24 12:06 neteler

Maybe just add not sys.platform.startswith('win') and to the condition.
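A minimal sketch of that guard, assuming the check sits in front of whatever test currently decides to call the external sort. The helper name use_external_sort and its parameters are hypothetical, and shutil.which only stands in for the script's own program lookup:

```python
import shutil
import sys


def use_external_sort(platform=None, sort_path=None):
    """Return True only when a Unix-style external sort should be tried.

    Hypothetical helper: on Windows the answer is always False, so the
    caller goes straight to the in-process Python sort.
    """
    if platform is None:
        platform = sys.platform
    if platform.startswith("win"):
        # Windows ships its own sort.exe, which does not understand -n.
        return False
    if sort_path is None:
        # Look the program up on PATH when the caller did not supply one.
        sort_path = shutil.which("sort")
    return bool(sort_path)
```

Passing platform and sort_path explicitly is only there to make the decision easy to test without touching the real environment.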

wenzeslaus avatar Jun 06 '24 13:06 wenzeslaus

However, this only occurs if the console is cmd (like the console that launches GRASS from OSGeo4W). I wasn't able to find the sort command through PowerShell. So if the console were Git Bash or an MSYS2 environment, or had a similar special path, the Linux-style sort that supports -n might be found.

echoix avatar Jun 06 '24 13:06 echoix

If it's a fallback implementation, does it make sense to just try it in a try-catch? Is it too expensive on Windows, and do we get a performance penalty on repeated calls?

echoix avatar Jun 06 '24 13:06 echoix

If it's a fallback implementation, does it make sense to just try it in a try-catch?

But if the sort command just does other things, won't we potentially spend all the time in the subprocess just to get garbage that we can't tell from the correct result?

wenzeslaus avatar Jun 06 '24 15:06 wenzeslaus

And why do we need to use the Linux-only sort command in the first place? What makes it special?

echoix avatar Jun 06 '24 16:06 echoix

If it's a fallback implementation, does it make sense to just try it in a try-catch?

But if the sort command just does other things, won't we potentially spend all the time in the subprocess just to get garbage that we can't tell from the correct result?

I see your point. The file-not-found error occurs when the Windows sort.exe tries to find a file named -n and cannot find it. We wouldn't have the error if we placed a file named -n.

echoix avatar Jun 06 '24 16:06 echoix

We wouldn't have the error if we placed a file named -n.

That would be the biggest hack... I would just come up with an easy fix for the bug like checking sys.platform and move on. The code needs a complete re-evaluation/re-implementation.
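For reference, an in-process numeric sort that does not depend on any external program could look like the sketch below. It assumes the input holds one number per line and fits in memory; the function name sort_numeric_file is hypothetical and this is not the existing db.univar code:

```python
def sort_numeric_file(infile, outfile):
    """Sort a one-number-per-line text file numerically, in memory.

    Hypothetical helper: blank lines are skipped, values are parsed as
    floats, and the number of values written is returned.
    """
    with open(infile) as src:
        values = [float(line) for line in src if line.strip()]
    values.sort()
    with open(outfile, "w") as dst:
        dst.writelines(f"{value}\n" for value in values)
    return len(values)
```

The obvious limitation is memory: a very large column would need a chunked or external merge sort instead.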

wenzeslaus avatar Jun 06 '24 21:06 wenzeslaus

What I meant is that we can't tell that sort doesn't support -n from the file-not-found error alone.

The Windows sort doesn't seem to have an option that does the same thing as the Linux one.
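One way around relying on the error message would be to probe the program once with a tiny known input and compare the output. This is only a sketch under that assumption; sort_supports_numeric is a hypothetical name, and the probe uses the standard subprocess module rather than any GRASS helper:

```python
import subprocess


def sort_supports_numeric(sort_cmd="sort"):
    """Return True if `sort_cmd -n` orders a known sample numerically.

    Hypothetical probe: a lexical sort would give 10, 2, 9, and a sort
    that treats -n as a file name exits with an error, so both cases
    return False.
    """
    sample = "10\n9\n2\n"
    try:
        result = subprocess.run(
            [sort_cmd, "-n"],
            input=sample,
            capture_output=True,
            text=True,
            timeout=10,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Program missing, not executable, or hung waiting for input.
        return False
    return result.returncode == 0 and result.stdout.split() == ["2", "9", "10"]
```

Because the sample is three lines long, the cost is one short subprocess call; the result could be cached so that repeated calls do not pay for it again.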

echoix avatar Jun 06 '24 21:06 echoix

sort in Windows:

SORT [/R] [/+n] [/M kilobytes] [/L locale] [/REC recordbytes]
  [[drive1:][path1]filename1] [/T [drive2:][path2]]
  [/O [drive3:][path3]filename3]
  /+n                         Specifies the character number, n, to
                              begin each comparison.  /+3 indicates that
                              each comparison should begin at the 3rd
                              character in each line.  Lines with fewer
                              than n characters collate before other lines.
                              By default comparisons start at the first
                              character in each line.
  /L[OCALE] locale            Overrides the system default locale with
                              the specified one.  The "C" locale yields
                              the fastest collating sequence and is
                              currently the only alternative.  The sort
                              is always case insensitive.
  /M[EMORY] kilobytes         Specifies amount of main memory to use for
                              the sort, in kilobytes.  The memory size is
                              always constrained to be a minimum of 160
                              kilobytes.  If the memory size is specified
                              the exact amount will be used for the sort,
                              regardless of how much main memory is
                              available.

                              The best performance is usually achieved by
                              not specifying a memory size.  By default the
                              sort will be done with one pass (no temporary
                              file) if it fits in the default maximum
                              memory size, otherwise the sort will be done
                              in two passes (with the partially sorted data
                              being stored in a temporary file) such that
                              the amounts of memory used for both the sort
                              and merge passes are equal.  The default
                              maximum memory size is 90% of available main
                              memory if both the input and output are
                              files, and 45% of main memory otherwise.
  /REC[ORD_MAXIMUM] characters
                              Specifies the maximum number of characters
                              per record (default: 4096, maximum 65535).
  /R[EVERSE]                  Reverses the sort order, i.e. sorts
                              from Z to A, then from 9 to 0.
  [drive1:][path1]filename1   Specifies the file to be sorted.  If it is
                              not specified, the standard input is
                              sorted.  Specifying the file is
                              faster than redirecting the standard input
                              to this file.
  /T[EMPORARY]
    [drive2:][path2]          Specifies the path where the temporary file
                              is created, if needed.  By default the
                              system temporary directory is used.
  /O[UTPUT]
    [drive3:][path3]filename3 Specifies the file in which the sorted data
                              is to be stored.  If it is not
                              specified, the standard output is used.
                              Specifying the file is faster than
                              redirecting the standard output to this file.

hellik avatar Jun 15 '24 13:06 hellik

https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/sort

hellik avatar Jun 15 '24 13:06 hellik