String normalisation
Issue by legraphista
Sunday Oct 08, 2017 at 20:15 GMT
Originally opened as https://github.com/neo4j-contrib/neo4j-apoc-procedures/issues/628
Any plans to support string normalisation?
NFC — Normalization Form Canonical Composition.
NFD — Normalization Form Canonical Decomposition.
NFKC — Normalization Form Compatibility Composition.
NFKD — Normalization Form Compatibility Decomposition.
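As a rough illustration of how the four forms differ, here is a minimal sketch using Python's standard `unicodedata` module. It is not APOC or Cypher code. The sample string and the `forms` name are chosen purely for illustration.

```python
import unicodedata

# Sample: the compatibility ligature "fi" (U+FB01) followed by "anc" and a
# precomposed "e with acute" (U+00E9). Escapes are used so the exact code
# points are unambiguous.
sample = "\ufb01anc\u00e9"

# Apply each of the four normalization forms to the same input.
forms = {
    form: unicodedata.normalize(form, sample)
    for form in ("NFC", "NFD", "NFKC", "NFKD")
}

for form, value in forms.items():
    # Print code points so the differences between forms are visible.
    print(form, [f"U+{ord(ch):04X}" for ch in value])
```

The canonical forms (NFC, NFD) leave the ligature alone and only compose or decompose the accent, while the compatibility forms (NFKC, NFKD) also expand the ligature into separate "f" and "i" characters.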
Thank you for the great work that has been poured into this project!
Comment by legraphista
Sunday Oct 08, 2017 at 20:23 GMT
Hmm 🤔
I found the code here, but calling apoc.text.clean("test") returns an unregistered-procedure error.
When calling dbms.procedures(), the only text-related APOC procedures I get are apoc.text.phonetic and apoc.text.phoneticDelta.
I'm using Neo4j 3.1.7 and APOC 3.1.3.8-all.
Comment by jexp
Monday Oct 09, 2017 at 19:27 GMT
@legraphista is there a Java library that does these?
That one is a user-defined function now. You can use RETURN apoc.text.clean("test").
Comment by legraphista
Tuesday Oct 10, 2017 at 09:14 GMT
Yep @jexp, I had to re-read the documentation to notice it changed from a procedure to a function (my bad 😞).
It does NFD as expected, but after cleaning up the text it also strips out all non-alphanumeric characters at this line.
I think there should be an option to toggle the stripping of non-alphanumeric characters.
For example, if you want to clean a sentence, you need to split it, clean it word by word, and then join it back together.
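To make the request concrete, here is a minimal Python sketch of the behaviour described above. The `clean` helper and its `strip_non_alnum` flag are hypothetical and only mimic what this thread describes (NFD, accent removal, then stripping non-alphanumeric characters). It is not APOC's actual implementation.

```python
import unicodedata


def clean(text: str, strip_non_alnum: bool = True) -> str:
    """Hypothetical stand-in for the cleaning step discussed in this issue."""
    # Decompose so that accents become separate combining marks (NFD).
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (Unicode category Mn), keeping base letters.
    result = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    if strip_non_alnum:
        # The step the proposed option would toggle: remove everything that
        # is not a letter or digit, including spaces and punctuation.
        result = "".join(ch for ch in result if ch.isalnum())
    return result


# "Creme brulee, s'il vous plait!" written with accents, using escapes so the
# precomposed code points are explicit.
sentence = "Cr\u00e8me br\u00fbl\u00e9e, s'il vous pla\u00eet!"

# Current workaround: split, clean word by word, join back together.
workaround = " ".join(clean(word) for word in sentence.split())

# With the proposed toggle, a single call keeps spaces and punctuation.
direct = clean(sentence, strip_non_alnum=False)

print(workaround)
print(direct)
```

Note that the workaround still loses punctuation inside and around each word, which is one reason a toggle would be more convenient than splitting and rejoining.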
String normalization has been supported in regular Cypher for a while now:
- https://neo4j.com/docs/cypher-manual/current/functions/string/#functions-normalize
- https://medium.com/neo4j/cypher-gems-in-neo4j-5-fa270643f9b0#:~:text=Cypher%20Unicode%20Normalization