libpostal

Training on a subset of countries

Open · abh1nay opened this issue 3 years ago · 3 comments

Thanks for this terrific project! I am trying to slim down the model because of resource constraints. Is it possible to get the steps for training the model on English addresses only? (I am hoping this will help.)

I have tried the steps listed in the README, but I could not figure out how to filter the addresses and then train the model on the filtered subset.

abh1nay, Apr 08 '21 22:04

Have you figured it out? It would help me out too.

MohMagdi, Feb 10 '22 18:02

+1. This would be very helpful, since not all users need support for every country. The trained model is 1.8 GB, which is too much.

walkman-kuan, May 11 '23 13:05

See "Splitting data files by country and language" for how to train on a subset of countries.

walkman-kuan, May 27 '23 02:05
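For the filtering step asked about at the top of the thread, here is a minimal sketch. It assumes each line of the tagged training file is tab-separated, with the language code in the first column, the country code in the second, and the tagged address in the third. Check that layout against the "Splitting data files by country and language" instructions before relying on it. The script name `filter_rows.py`, the function `filter_rows`, and the file names in the usage comment are hypothetical and not part of libpostal.

```python
import sys
from typing import Iterable, Iterator, Set

# ASSUMED column layout of each training line (verify against the libpostal docs):
#   language <TAB> country <TAB> tagged address
LANGUAGE_COL = 0
COUNTRY_COL = 1
COLUMNS = {"language": LANGUAGE_COL, "country": COUNTRY_COL}


def filter_rows(lines: Iterable[str], column: int, wanted: Set[str]) -> Iterator[str]:
    """Yield lines whose value in `column` is in `wanted` (compared lowercase).

    Lines with fewer than three tab-separated fields are skipped as malformed.
    Each yielded line ends with exactly one newline.
    """
    for line in lines:
        stripped = line.rstrip("\n")
        # maxsplit=2 keeps any tabs inside the address field intact
        fields = stripped.split("\t", 2)
        if len(fields) < 3:
            continue
        if fields[column].lower() in wanted:
            yield stripped + "\n"


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python filter_rows.py country us,gb,ca < in.tsv > subset.tsv
    col = COLUMNS[sys.argv[1]]
    codes = {c.strip().lower() for c in sys.argv[2].split(",") if c.strip()}
    sys.stdout.writelines(filter_rows(sys.stdin, col, codes))
```

Filtering on the language column (`language en`) would approximate the "English addresses only" subset from the original question. The resulting file would still need to go through libpostal's usual training steps. The script only writes a smaller input file and does not change how training works.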