Overpass-API

Invalid regular expression: "^[А-ЯЁ ]+$"

Open Zaczero opened this issue 2 years ago • 7 comments

[out:json][timeout:60][bbox:{{bbox}}];
nwr[name~"^[А-ЯЁ ]+$"];
out body qt;
>;
out skel qt;

Zaczero, Aug 25 '23 18:08

While your query works on the overpass-api.de instance, some other instances, such as kumi.systems, fail with the error message above. Some versions of the C library's POSIX regular expression implementation don't seem to handle ranges containing Cyrillic characters properly.

As a quick workaround, you might try some other Overpass instance, or maybe avoid the range altogether by explicitly specifying all characters (not properly tested):

[out:json][timeout:60][bbox:{{bbox}}];
nwr[name~"^[АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ ]+$"];
out body qt;
>;
out skel qt;

A minimal example (https://cpp.godbolt.org/z/qz6Tn56j9) fails on some systems.
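
For readers who cannot open the link, here is a rough sketch of how such a check could look. It is not the code behind the godbolt link: the helper name `compile_error` is made up for illustration, and the `REG_EXTENDED | REG_NOSUB` flags are an assumption rather than a confirmed detail of how Overpass calls `regcomp()`.

```cpp
#include <regex.h>
#include <string>

// Hypothetical helper: returns an empty string if the pattern compiles,
// otherwise the regerror() message for the failure.
std::string compile_error(const char* pattern) {
    regex_t re;
    // Assumption: extended syntax, no subexpression reporting.
    int rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);
    if (rc == 0) {
        regfree(&re);  // release the compiled pattern on success
        return "";
    }
    char buf[256];
    regerror(rc, &re, buf, sizeof buf);  // translate the error code to text
    return buf;
}
```

Calling `setlocale()` with the locale under test and then `compile_error("^[А-ЯЁ ]+$")` shows whether a given system accepts the range; the result depends on the C library and on which locale is active.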

mmd-osm, Aug 26 '23 08:08

Interesting! This problem affects my Overpass instance, which I set up on a Debian Docker image following the instructions at https://overpass-api.de/full_installation.html. I'm wondering if I've overlooked something.

FROM debian:bookworm-slim

# Install dependencies
RUN apt-get update && apt-get install -y \
    wget \
    g++ \
    make \
    expat \
    libexpat1-dev \
    zlib1g-dev \
    liblz4-dev \
    lighttpd \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Download, extract and compile Overpass
RUN wget https://dev.overpass-api.de/releases/osm-3s_latest.tar.gz -O osm-3s_latest.tar.gz && \
    mkdir ./src && \
    tar -xzf osm-3s_latest.tar.gz -C ./src --strip-components=1 && \
    rm osm-3s_latest.tar.gz && \
    cd src && \
    ./configure --prefix="/app" --enable-lz4 && \
    make dist install clean && \
    cp -r rules .. && \
    cd .. && \
    rm -r ./src
...

Zaczero, Aug 26 '23 10:08

By the way, I'm getting the same issue on Ubuntu 22.04, which is also based on Debian bookworm. For some reason, the previous Debian version bullseye seems to work ok.

You could try replacing the first line in your Dockerfile with FROM debian:bullseye-slim to see if it helps. We still need to figure out what exactly is causing this issue on the newer Debian version.

mmd-osm, Aug 26 '23 17:08

I think I found the cause of that. To check the currently applied locale:

std::cout << "Current Locale: " << setlocale(LC_ALL, NULL) << std::endl;

But maybe there is a better way to set the UTF-8 locale in the first place.
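
A slightly fuller check could also report the codeset, since the locale name alone does not always say whether UTF-8 is in effect. This is only a sketch: the function names are hypothetical, and it assumes a POSIX system where `nl_langinfo(CODESET)` is available.

```cpp
#include <clocale>
#include <cstring>
#include <langinfo.h>
#include <string>

// Hypothetical helper: true if the active LC_CTYPE locale reports UTF-8.
bool active_codeset_is_utf8() {
    const char* codeset = nl_langinfo(CODESET);
    return codeset != nullptr &&
           (std::strcmp(codeset, "UTF-8") == 0 ||
            std::strcmp(codeset, "utf8") == 0);
}

// Hypothetical helper: "name (codeset)" for the active locale,
// suitable for a diagnostic line at startup.
std::string describe_locale() {
    const char* name = std::setlocale(LC_ALL, nullptr);  // query only
    const char* codeset = nl_langinfo(CODESET);
    return std::string(name ? name : "?") + " (" +
           (codeset ? codeset : "?") + ")";
}
```

If `active_codeset_is_utf8()` returns false after the application has tried to set its locale, the requested locale was most likely not installed and the process is still running in the plain C locale.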

Zaczero, Aug 26 '23 18:08

I have read that Python officially supports systems that have at least one of the following locales installed:

  • C.UTF-8
  • C.utf8
  • UTF-8

Maybe the same could be done in the overpass-api case.
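
As a rough sketch of that idea (not how Overpass currently does it), the application could try each name in turn and keep the first one that `setlocale()` accepts. The function name `select_utf8_locale` is hypothetical, and using `LC_ALL` as the default category is an assumption taken from the snippet above.

```cpp
#include <clocale>
#include <initializer_list>

// Hypothetical helper: tries each candidate in order and returns the first
// name setlocale() accepted, or nullptr if none of them is installed
// (in which case the current locale is left unchanged).
const char* select_utf8_locale(int category = LC_ALL) {
    for (const char* name : {"C.UTF-8", "C.utf8", "UTF-8"}) {
        if (std::setlocale(category, name) != nullptr) {
            return name;
        }
    }
    return nullptr;
}
```

A caller would run this once at startup and log a warning when it returns `nullptr`, so that a missing locale shows up as a clear message instead of a regex error later on.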

...btw, I do confirm that switching to FROM debian:bullseye-slim fixed the issue.

Zaczero, Aug 26 '23 18:08

It looks like buggy regex engines in the base system are a real problem. The final solution, even if it is a workaround, should be to open an avenue for using the regex engine of choice. I don't know yet whether that choice will be made at install time or at runtime.

drolbr, Nov 23 '23 20:11

If the app falls back to the C locale (because the requested locale is not installed), I don't see this as much of a regex engine issue. The app should simply support a wider range of UTF-8 locales, as other apps do.

Zaczero, Nov 23 '23 20:11