lingtrain-aligner-editor
Extracts parallel corpora from two raw texts in different languages.
Lingtrain Aligner. ML-powered application for extracting parallel corpora.
Introduction
Lingtrain Aligner is a tool for extracting parallel corpora from texts in different languages.
Models
The automated alignment process relies on sentence embedding models. Embeddings are multidimensional vectors that are used to calculate the distance between sentences. You can also plug in your own model using the interface described in the models directory. The list of supported languages depends on the selected backend model.
- distiluse-base-multilingual-cased-v2
  - more reliable and fast
  - moderate weights size: 500MB
  - supports 50+ languages
  - full list of supported languages can be found in this paper
- LaBSE (Language-agnostic BERT Sentence Embedding)
  - can be used for rare languages
  - pretty heavy weights: 1.8GB
  - supports 100+ languages
  - full list of supported languages can be found here
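As a rough illustration of how embeddings can be used to compare sentences, here is a minimal sketch that picks the closest target sentence for a source sentence. It assumes cosine similarity as the measure (the actual measure used by the aligner is not stated here), and the short vectors are toy stand-ins for real model output. The names `cosine_similarity` and `best_match` are hypothetical and not part of the project.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, treat it as unrelated
    return dot / (norm_a * norm_b)


def best_match(source_vec, target_vecs):
    """Return (index, score) of the target vector most similar to source_vec."""
    scores = [cosine_similarity(source_vec, t) for t in target_vecs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Toy vectors standing in for sentence embeddings (assumption, not real output).
source = [1.0, 0.0, 1.0]
targets = [[0.0, 1.0, 0.0], [2.0, 0.0, 2.0], [1.0, 1.0, 0.0]]

index, score = best_match(source, targets)
print(f"closest target: {index}, similarity: {score:.3f}")
```

Real embeddings have hundreds of dimensions, but the comparison step works the same way.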
Credits
The project was supported by the Center for Academic Development of Students within the framework of the Competition of initiative collective research projects of students of the National Research University "Higher School of Economics".
Demo
For a quick overview of the alignment process and the main functionality, you can watch the demo that was presented at the AINL Conference.
How-to
The alignment process is pretty straightforward. Once you have the app up and running, follow the steps below. To start the app locally, see the Running on local machine section.
1. Upload raw texts
2. Check the split documents
3. Align documents
4. Check the result and edit if needed
5. Set the quality threshold
6. Download the corpora
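To make step 5 more concrete, here is a minimal sketch of what applying a quality threshold could look like. It assumes each aligned pair carries a similarity score between 0 and 1. The `AlignedPair` type, the `filter_by_threshold` function, and the sample data are all hypothetical; the app's own data format and scoring are not described in this README.

```python
from typing import List, NamedTuple


class AlignedPair(NamedTuple):
    """Hypothetical record for one aligned sentence pair."""
    source: str
    target: str
    score: float  # assumed similarity in the range [0, 1]


def filter_by_threshold(pairs: List[AlignedPair], threshold: float) -> List[AlignedPair]:
    """Keep only the pairs whose score is at or above the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return [pair for pair in pairs if pair.score >= threshold]


# Made-up sample data for illustration only.
pairs = [
    AlignedPair("Hello.", "Bonjour.", 0.91),
    AlignedPair("How are you?", "Il pleut.", 0.22),
    AlignedPair("Thank you.", "Merci.", 0.87),
]

for pair in filter_by_threshold(pairs, 0.5):
    print(f"{pair.score:.2f}\t{pair.source}\t{pair.target}")
```

A higher threshold yields a smaller but cleaner corpus, a lower one keeps more pairs at the cost of more misalignments.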
Running on local machine
You can run the application on your computer using Docker.
- Make sure that Docker is installed by typing the docker version command in your console.
- Images configured to run locally are available on Docker Hub.
- Run the following commands in your console:
docker pull lingtrain/aligner:st
docker run -p 80:80 lingtrain/aligner:st
- The app will be available in your browser at the localhost address.
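If you want to check from a script that the container has started answering requests, a small probe like the following may help. This is only a sketch: the `is_app_up` helper is hypothetical, and it assumes that any HTTP response from the address, even an error status, means the server is running.

```python
import http.client
import urllib.error
import urllib.request


def is_app_up(url: str = "http://localhost", timeout: float = 3.0) -> bool:
    """Return True if an HTTP server answers at url, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status, so it is running.
        return True
    except (urllib.error.URLError, OSError, http.client.HTTPException):
        # Connection refused, timeout, DNS failure, or a malformed reply.
        return False


if __name__ == "__main__":
    print("app is up" if is_app_up() else "app is not reachable yet")
```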
Deployment
You can deploy and run the app on your server using Docker.
Prepare the image
On your local machine.
- Clone the repo:
git clone https://github.com/averkij/lingtrain-aligner-editor.git
- Edit the following line in the ./fe/src/common/config.js file:
export const API_URL = "http://[IP_ADRESS]:[PORT]";
For example:
export const API_URL = "http://89.23.34.12:5000";
- Build the app image. Run in the root folder of the repo:
docker build . -t aligner:v1
where aligner:v1 is a tag (a kind of image name).
- Now you have your image stored locally. You need to push it to Docker Hub.
- Create an account on Docker Hub. It's a free and publicly available Docker registry.
- Log in to your account:
docker login
- Tag the image that you've built:
docker tag aligner:v1 my_docker_hub_account/aligner:v1
- Push the image to the registry:
docker push my_docker_hub_account/aligner:v1
- After a while your image will be uploaded and can be used for deployment.
Deploy it
On your server.
- Make sure that Docker is installed by typing the docker version command in your console.
- Make directories for storing the app results:
mkdir /opt/data /opt/img
- Pull the prepared image:
docker pull my_docker_hub_account/aligner:v1
Wait for the download to finish. After that you will have the image stored locally.
- Start the app:
docker run -v /opt/data:/app/data -v /opt/img:/app/static/img -p [PORT]:80 my_docker_hub_account/aligner:v1
- where /opt/data and /opt/img are folders on your server
- and /app/data and /app/static/img are folders inside the container. Don't change them.
- [PORT] is the port that you configured in API_URL before building the image.
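If you script the deployment, the start command can be assembled programmatically so that the port and host folders are set in one place. The sketch below only builds the argument list, using the flags and container paths already given in this section. The `build_run_command` helper and its defaults are hypothetical.

```python
import shlex
from typing import List


def build_run_command(image: str, port: int,
                      data_dir: str = "/opt/data",
                      img_dir: str = "/opt/img") -> List[str]:
    """Build the docker run argument list for the aligner image."""
    if not 1 <= port <= 65535:
        raise ValueError("port must be between 1 and 65535")
    return [
        "docker", "run",
        "-v", f"{data_dir}:/app/data",        # container path must not change
        "-v", f"{img_dir}:/app/static/img",   # container path must not change
        "-p", f"{port}:80",                   # host port mapped to container port 80
        image,
    ]


command = build_run_command("my_docker_hub_account/aligner:v1", 5000)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a machine where Docker is installed.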
Running in development mode
Backend
- /be
Flask/uwsgi backend REST API service. It's pretty simple and contains all the alignment logic.
python main.py
Frontend
- /fe
SPA built with Vue, Vuex, and Vuetify. It is a UI for managing the alignment process through the backend and a tool for translators to edit the documents being processed.
Setup
npm install
Compile and run with hot-reloads for development
npm run serve
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.