Lingtrain Alignment Studio is an ML-based app for aligning texts in different languages. It can produce parallel corpora and parallel books.
Lingtrain Studio
💡 Intro
Lingtrain Studio is an ML-based app for accurate alignment of texts in different languages.
- Extracts parallel corpora from two texts.
- Builds a formatted parallel book from them, with sentence highlighting.
⚡ Articles
- 👅 Your language is your friend. Developing minority languages (in Russian)
- 🔥 Lingtrain Studio. Books for everyone, for free (in Russian)
- 🧩 How to create bilingual books. Part 2. Lingtrain Alignment Studio
- 📘 How to make a parallel texts for language learning. Part 1. Python and Colab version
- 🔮 Lingtrain Aligner. An app for creating parallel books that will surprise you (in Russian)
- 📌 Be your own Gutenberg. Making parallel books (in Russian)
🧬 Models
The automated alignment process relies on sentence embedding models. Embeddings are multidimensional vectors that are used to calculate the distance between sentences. You can also plug in your own model using the interface described in the models directory. The list of supported languages depends on the selected backend model.
- distiluse-base-multilingual-cased-v2
  - more reliable and fast
  - moderate weights size (500 MB)
  - supports 50+ languages
  - full list of supported languages can be found in this paper
- LaBSE (Language-agnostic BERT Sentence Embedding)
  - can be used for rare languages
  - pretty heavy weights (1.8 GB)
  - supports 100+ languages
  - full list of supported languages can be found here
- SONAR (Sentence-level multimOdal and laNguage-Agnostic Representations)
  - supports about 200 languages (approximately these)
  - a large model (3 GB of weights)
  - ideally, requires you to indicate the source language explicitly
  - was originally released at facebookresearch/SONAR based on fairseq2, but here uses a HuggingFace port
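To illustrate how embedding vectors can be compared, here is a minimal sketch in plain Python. It is not the alignment algorithm used by Lingtrain Studio: the vectors are made-up toy values standing in for real model output, and the helper names cosine_similarity and best_matches are hypothetical. It only shows the general idea of scoring sentence pairs by cosine similarity and picking the closest candidate.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def best_matches(source_vecs, target_vecs):
    """For each source vector, return the index of the most similar target vector."""
    matches = []
    for src in source_vecs:
        scores = [cosine_similarity(src, tgt) for tgt in target_vecs]
        matches.append(max(range(len(scores)), key=scores.__getitem__))
    return matches


# Toy 3-dimensional "embeddings" (assumed values for illustration only;
# real sentence embedding models produce vectors with hundreds of dimensions).
source_vecs = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2]]
target_vecs = [[0.1, 0.9, 0.3], [0.9, 0.2, 0.1]]

print(best_matches(source_vecs, target_vecs))  # prints [1, 0]
```

Here the first source sentence is closest to the second target sentence and vice versa. A real aligner also has to handle sentence order, merged or split sentences, and ambiguous matches, which this sketch ignores.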
💻 Running on local machine
You can run the application on your computer using Docker.
Make sure that Docker is installed by running the docker version command in your console.
docker-compose
- docker-compose build
- docker-compose up
Docker Hub
- Images configured to run locally are available on Docker Hub.
- Run the following commands in your console:
  - docker pull lingtrain/studio:v7.2
  - docker run -v C:\app\data:/app/data -v C:\app\img:/app/static/img -p 80:80 lingtrain/studio:v7.2
- The app will be available in your browser at the localhost address.
- If you need to run the container on another port (e.g. localhost:8081):
  - Change the API_URL parameter in config.js
  - Rebuild the docker container
  - Start it with a changed -p parameter (e.g. -p 8081:80)
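The port change can be sketched as a small Python helper that assembles the docker run arguments shown above for a chosen host port. The function name docker_run_command and its parameters are hypothetical and exist only for illustration. The sketch covers just the -p mapping: the API_URL change in config.js and the rebuild still have to be done separately.

```python
def docker_run_command(host_port=80,
                       data_dir=r"C:\app\data",
                       img_dir=r"C:\app\img",
                       image="lingtrain/studio:v7.2"):
    """Build the argument list for `docker run` with a custom host port.

    The container always listens on port 80; only the host side changes.
    """
    if not 1 <= host_port <= 65535:
        raise ValueError(f"invalid port: {host_port}")
    return [
        "docker", "run",
        "-v", f"{data_dir}:/app/data",       # alignment data
        "-v", f"{img_dir}:/app/static/img",  # generated images
        "-p", f"{host_port}:80",             # host_port -> container port 80
        image,
    ]


# Show the command for localhost:8081 (joined only for display).
print(" ".join(docker_run_command(8081)))
```

The returned list can be passed to subprocess.run as-is, which avoids shell quoting problems with Windows paths.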
🔨 Running in development mode
Clone this repo on your machine.
Backend
Flask/uwsgi backend REST API service. It contains all the alignment logic.
- Go to the backend directory (from the repository root)
  - cd backend
- Install the requirements
  - pip install -r requirements.txt
- Run the backend application
  - python main.py
Frontend
Single-page application built with Vue, Vuex, and Vuetify. It provides a UI for managing the alignment process through the backend and a tool for translators to edit the documents being processed.
- Go to the frontend directory (from the repository root)
  - cd frontend
- Install the requirements
  - npm install -f
- Compile and run with hot-reloads for development
  - npm run serve
The application will be available at localhost:8080
✉️ Feedback
You can create an issue or send me a message on Telegram: @averkij
🔑 License
This work is licensed under an Attribution-NonCommercial-NoDerivatives 4.0 International license. See LICENSE.