practical-sql icon indicating copy to clipboard operation
practical-sql copied to clipboard

Regex match to newline characters on windows10

Open ChandlerLutz opened this issue 6 years ago • 2 comments

Hi Anthony,

Thank you for the wonderful book.

I'm working through Ch. 13 on regular expressions on Windows10 using postgreSQL10 in pgAdmin4

I was having trouble getting the regular expressions to work for example in code listing 13-7.

I believe that the issue is related to the way new line characters are handled on windows.

This also may be related to the following issues (I am using a clean version of the imported csv file from the crime data):

https://github.com/anthonydb/practical-sql/issues/4 and https://github.com/anthonydb/practical-sql/issues/10

I was able to solve this issue with this SO answer: https://stackoverflow.com/a/20056634. Apparently windows may match newlines to \r\n

Here is my sql code for the crime time and the output, where crime_type_orig is the original from the book and the other crime_type2 and crime_type3 are based on the above SO answer:

select 
	regexp_match(original_text, '\n(?:\w+ \w+|\w+)\n(.*):') as crime_type_orig,
	-- See https://stackoverflow.com/a/20056634
	regexp_match(original_text, '\r\n(?:\w+ \w+|\w+)\r\n(.*):') as crime_type2,
	-- Based on https://stackoverflow.com/a/20056634
	regexp_match(original_text, '(?:\r\n|\r|\n)(?:\w+ \w+|\w+)(?:\r\n|\r|\n)(.*):') as crime_type3
from crime_reports;

Here is the output from pgAdmin

image


ChandlerLutz avatar Jan 20 '19 20:01 ChandlerLutz

Hi, @ChandlerLutz! Thanks for sharing this and glad you got it working.

Did you experience a similar issue with any of the other regexes, or just the one for the crime_type?

I tried just now with a new install of PostgreSQL 11 and pgAdmin 4.1 on a on a Windows 7 laptop and a fresh download of the data from this repo, and the book's code worked as expected. So, I'm a bit puzzled as to what's occurring on your side. It sounds to me like you did everything as expected, including a clean download of the file.

Please let me know your thoughts, and we'll leave this issue open in case others run into the same.

anthonydb avatar Jan 22 '19 02:01 anthonydb

Hi Anthony,

So, I essentially ran in to the problem every time that a regex pattern contain the newline character \n. It might be some setting on my Windows10 machine (If I figure it out, I'll let you know). In the end I just replaced every \n with the negated capture group (?:\r\n|\r|\n). This worked well and gave me a little more regex practice!

ChandlerLutz avatar Jan 22 '19 13:01 ChandlerLutz