pyemu
Delim whitespace
Buggin out when reading whitespace-delimited files with multiple spaces. Added an option for sep='w' to trigger delim_whitespace=True in read_csv. Replaced sep='w' in mult2model with a single space. Lots of bonus auto-formatting courtesy of PyCharm.
Coverage remained the same at 78.122% when pulling 41c4329ebb37bc96b94e0a09790fb0b8a94bd68c on delim_whitespace into 29b5a75689a2bd8ff63d39cc3960b6eeff3cb1ec on develop.
I could do with looking into this a bit. I thought we were supporting multiple delims already (with some "cheap" assumptions relating to file extensions if sep was not passed). Anyway, I wonder if passing mfile_sep="\s+" is sufficient?
Seems to be equivalent. I was unaware of '\s+' syntax. How will '\s+' be represented/written in the mult2model table?
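A quick way to check that equivalence, as a minimal sketch rather than anything taken from pyemu: the sample text and variable names below are made up, and the only library call is pandas' `read_csv` with a regex separator. A raw string is used so the backslash stays literal.

```python
import io
import pandas as pd

# Made-up sample: columns separated by a varying number of spaces.
sample_text = "k   i  j   value\n1   1  1   0.5\n1  2    3   1.25\n"

# sep=r"\s+" treats any run of whitespace as a single delimiter.
ws_df = pd.read_csv(io.StringIO(sample_text), sep=r"\s+")

print(ws_df.columns.tolist())
print(ws_df.shape)
```

The regex form avoids needing a separate sentinel such as sep='w', since the separator string itself carries the "one or more whitespace characters" meaning.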
On Fri, Aug 12, 2022 at 10:39 AM Brioch Hemmings wrote:
> I could do with looking into this a bit. I thought we were supporting multiple delims already (with some "cheap" assumptions relating to file extensions if sep was not passed). Anyway, I wonder if passing mfile_sep="\s+" is sufficient?
If the file extension is not '.csv' and sep=None (which I think is the default), the model file should be read with sep='\s+', as long as fmt='free' (which I think is also the default). Unless I am missing something (highly probable), I think the "whitespace" delimited model file should "work" by default. If your extension is '.csv', you should just be able to override the default sep (',') with sep='\s+'. (I think)
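The default described above could be sketched roughly as follows. This is a hypothetical helper written for illustration, not pyemu's actual code: the name `resolve_sep`, its signature, and the `None` return for non-free formats are all assumptions.

```python
import os

def resolve_sep(filename, sep=None, fmt="free"):
    """Hypothetical helper (not pyemu's code): pick a separator for a model file."""
    if sep is not None:
        # An explicit sep overrides any extension-based guess.
        return sep
    if fmt != "free":
        # Assumption: non-free (e.g. fixed-width) files have no delimiter.
        return None
    ext = os.path.splitext(filename)[1].lower()
    # ".csv" defaults to a comma; anything else to runs of whitespace.
    return "," if ext == ".csv" else r"\s+"

print(resolve_sep("wel.dat"))               # whitespace default
print(resolve_sep("wel.csv"))               # comma default
print(resolve_sep("wel.csv", sep=r"\s+"))   # explicit override
```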
It appears to have been an issue with whitespace before the comment character. Disregard my meddling.
Ahhh! There may be some inefficiencies/deficiencies in how comments are handled. Raise an issue on that if you spot something of concern!
Reopening. Whitespace before comment was one issue (not pyemu related). But there still seems to be an issue with sep='\s+' when writing to mult2model_info file.
I suspect that there will be issues relating to the '\'; the string probably needs to be r'\s+'.
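To see how a regex separator survives being written to and read back from a CSV table, here is a minimal sketch. The two-column table is a made-up stand-in, not the real mult2model info layout, and the column names are assumptions.

```python
import io
import pandas as pd

# Raw string: exactly three characters (backslash, 's', '+').
sep_value = r"\s+"

# Made-up stand-in for an info table that records each file's separator.
info = pd.DataFrame({"model_file": ["wel.dat"], "sep": [sep_value]})

buf = io.StringIO()
info.to_csv(buf, index=False)
csv_text = buf.getvalue()

# Read the table back and recover the stored separator.
read_back = pd.read_csv(io.StringIO(csv_text))
roundtrip_sep = read_back.loc[0, "sep"]

print(repr(csv_text))
print(repr(roundtrip_sep))
```

With pandas' default settings the backslash is written and read literally, so the stored value can be passed straight back to `read_csv` as `sep`.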
More generally though, I think we need to consider what use case we are trying to cover off here:
If the extension is not ".csv" and the file is a list-like file, then the default is '\s+' internally -- no change necessary. (Actually ' ' is what gets written to our mult2model info file. We collapse the spaces in the resulting model input files--sorry, not sorry).
If the extension is ".csv" but we actually have a space-delimited list-like file, you might be able to get away with r'\s+' (in the case when the number of spaces as delimiter is > 1 and variable).
If the file is ".csv" and array-like and is actually space delimited with multiple spaces, we may have a bit of a challenge with the current code. This uses the numpy engine, which when sep=None treats multiple delims as one (I believe). The issue we might have is that if we pass sep=None for these files, internally we store sep=',' for csvs. So this is probs where we need a change. If the user explicitly says that the array-type file is space delimited (sep=' '), we might need to allow for multiple delims as one (internally change to sep=None).
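The array-like case might look something like the sketch below. `array_delimiter` is a hypothetical mapping invented for illustration, not pyemu's code; the only library behavior relied on is that `numpy.loadtxt` with `delimiter=None` splits on any run of whitespace.

```python
import io
import numpy as np

def array_delimiter(sep):
    """Hypothetical mapping (not pyemu's code) from a user-facing sep
    to the delimiter handed to numpy for array-like files."""
    if sep is None or sep.strip() == "" or sep == r"\s+":
        # Whitespace of any kind: let numpy treat runs of it as one delimiter.
        return None
    return sep

# Made-up array file with a variable number of spaces between values.
space_text = "1.0   2.0  3.0\n4.0 5.0    6.0\n"
space_arr = np.loadtxt(io.StringIO(space_text), delimiter=array_delimiter(" "))

# A genuinely comma-delimited array keeps its comma.
comma_text = "1.0,2.0\n3.0,4.0\n"
comma_arr = np.loadtxt(io.StringIO(comma_text), delimiter=array_delimiter(","))

print(space_arr.shape, comma_arr.shape)
```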
I'm just trying to catch up on this convo. I think generally (given that it is now 2022) we should treat "whitespace" as any combination of one or more spaces and/or tabs. And I think that is what sep="\s+" does, and we can assume that when we recreate the files for the model at runtime, we can use that same definition of "whitespace": a single space is sufficient (like B said - sorry, not sorry). So about the file extension tho... if sep is not passed, then I think we have to rely on extension, right? If sep is passed, then ignore extension?
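That definition of whitespace, and the single-space rewrite, can be illustrated with a small sketch. The function name is made up, and this is only one way to do the collapsing, not how pyemu writes its files.

```python
import re

def normalize_whitespace(line):
    """Illustrative only: collapse runs of spaces/tabs to a single space."""
    # Strip the ends first so leading/trailing whitespace adds no empty fields.
    return " ".join(re.split(r"[ \t]+", line.strip()))

print(normalize_whitespace("1 \t 2\t\t3   4.5\n"))
```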
It was an issue with a space before a comment_char, combined with my inability to navigate the complexity of pst_from. In a whitespace delimited file pandas thinks there is a column before the comment_char (as it would with something like: ",#comment", makes sense). Sorry my coding skills are still stuck in 1997 (a great year... if any of you were alive back then)!
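One possible workaround for indented comment lines, offered as a sketch and not something the thread settled on, is to drop them before the text reaches the parser. The helper name and the sample file contents are made up.

```python
import io
import pandas as pd

def drop_comment_lines(text, comment_char="#"):
    """Illustrative pre-filter: remove lines whose first non-blank
    character is the comment character."""
    kept = [ln for ln in text.splitlines()
            if not ln.lstrip().startswith(comment_char)]
    return "\n".join(kept) + "\n"

# Made-up file: one comment is indented, one starts at column zero.
raw_text = (
    "k i value\n"
    "  # indented comment\n"
    "1 1 0.5\n"
    "# flush-left comment\n"
    "1 2 1.25\n"
)

clean_text = drop_comment_lines(raw_text)
clean_df = pd.read_csv(io.StringIO(clean_text), sep=r"\s+")
print(clean_df)
```

Filtering up front means the parser never sees whitespace ahead of the comment character, so it has no chance to interpret it as a leading column.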