sqlite-net
sqlite-net copied to clipboard
Unicode collation
When specifying the SQLite.CollationAttribute on columns, we only have the options BINARY, NOCASE and RTRIM. There is no Unicode collation option. Is it possible to create a custom collation or any other way to make WHERE queries work on unicode values?
If what you need is to make text searches "accent-insensitive" (or any weird character-insensitive... like allow german users to search indifferently for "große" or "grosse"), I have done it in my custom version of sqlite-net (this feature is not available in my github repository.. it is currently in the private repository of my company).
what I did is to use Unidecode.NET (see my version published here https://github.com/csm101/Unidecode.NET/tree/array_instead_of_dictionary.. this one fixes the two open issues in the official repository and has a 3X performance improvement... I have issued a pull request but I haven't received any feedback yet)
I used this library to implement a custom sqlite Unidecode function, so to implement accent insensitive searches i do this:
conn.Query<MyRecord>("select * from mytable where Unidecode(germanfield) like '%'||?||'%'", "große".Unidecode())
(in my customized sqlite-net there is a mechanism for registering sqlite functions implemented in c#)
If there is some interest about this, I could try to extract the relevant code from my private repository and issue a pull request...
@csm101 Edit: I don't think that Unidecode would be a viable solution for my problem because I'm using Chinese text, which doesn't have a simple translation to ASCII. I have a database of chinese sentences, and searching by the mandarin field returns odd results if I allow the database to perform the query. So instead I have to pull the whole table into memory first and then perform the search in C# like the following:
public async Task<List<Sentence>> GetByMandarin(string mandarin)
{
return (await _database.Table<Sentence>().ToListAsync()).Where(s => s.Mandarin == mandarin).ToList();
}
This is obviously hugely inefficient so it would be great if the database could perform this operation.
I have no knowledge about chinese: my target languages were only european languages and cirillyc, but Unidecode has a huge translation table, and I see that it translates some symbols to the equivalent "phonetic" rapresentation... which surely aren't european...
have a look at this translation table here (from my version of unicode): https://github.com/csm101/Unidecode.NET/blob/array_instead_of_dictionary/assets/unidecoder-decodemap.txt
Does this conversion map address mandarin (I have no idea)?
Anyway you could still implement your own "MandarinLike()" sql function in c#... I made it pretty simple to do in my sqlite version:
this is how, for example I added a "GPSDistance(lat1,long2, lat2, long2)" function that calculates the distance in meters between two gps coordinates:
public class SQLiteGPSDistanceFunction : SQLiteFunction
{
public SQLiteGPSDistanceFunction(SQLiteConnection connection)
: base(connection, FunctionName: "GPSDistance", ArgumentCount: 4, Deterministic: true)
{
}
public override void Execute(SqliteFunctionRunContext ctx)
{
if ( ctx.IsNull(0) || ctx.IsNull(1) || ctx.IsNull(2)|| ctx.IsNull(3))
{
ctx.SetNullResult();
return;
}
var latit1 = ctx.DoubleParam(0);
var longit1 = ctx.DoubleParam(1);
var latit2 = ctx.DoubleParam(2);
var longit2 = ctx.DoubleParam(3);
double distance = CoordinatesFunctions.Distance(latit1, longit1, latit2, longit2);
ctx.SetResult(distance);
}
}
and I also added a static global event handler to register these functions whenever a sqlite connection connects to any database:
SQLiteConnection.AfterLogon +=
(connection, args) =>
{
_ = new SQLiteUnidecodeFunction(connection);
_ = new SQLiteGPSDistanceFunction(connection);
};
I addressed only simple functions, but sqlite apis would allow me to implement also custom aggregate functions (like count(), max) or even custom analytic functions (select count(*) over (partition by field1, field2, order by ...) )
and... since I managed to get the attention of an author of the project.. would you mind reviewing my pull request for cancellable queries? https://github.com/praeclarum/sqlite-net/pull/1176
Actually I needed to be able to cancel long running queries for the same implementation I needed this Unidecode function, that is: to implement this kind of search dialog in android:
thank you!
Sorry I'm not an author on the project, when I said I would be happy to review your pull request, I mean I can read it, but I don't think I'm able to approve it into the repo. Once I've made an implementation of SQLiteFunction for MandarinLike(), how would I use that in queries? for example the code that I posted in my previous comment?
My bad, I tought you had write access to the repository and you could approve pull requests, anyway... if you have added your custom function to sqlite, you can use it in plain SQL commands:
return await _database.Query<Sentence>("select * from sentences s where MandarinEquals(s.Mandarin, ?) = true"));