Slowly deprecate LaMachine in favour of more lightweight and distributed application-containers

Open proycon opened this issue 3 years ago • 4 comments

If you're a user wondering where to migrate from LaMachine, read this comment: https://github.com/proycon/LaMachine/issues/214#issuecomment-1192726286


CLARIAH, the project in which LaMachine is embedded, has (finally) been moving towards clearer software & infrastructure requirements in the past year. These requirements focus on industry-standard solutions like application containers (OCI containers; e.g. usable with Docker) and orchestration of these containers.

This path has some implications for LaMachine. As far as containerization is concerned, LaMachine makes use of 'fat' containers alongside other deployment options (VM, local installation, etc.), whereas aspects such as orchestration are deliberately left out of scope. LaMachine is essentially an infrastructure solution that I provided because there was no common solution to speak of at the project (CLARIAH) level.

As that changes and the new direction becomes clearer, LaMachine's purpose also needs to be revisited. The current solution is likely going to be obsolete. This is not something that will happen overnight and users needn't worry, but I want to embrace the new direction and slowly facilitate this. Eventually LaMachine will be deprecated (or possibly continue in a very different form).

A major factor contributing to the decision to deprecate LaMachine is the fact that LaMachine currently tries to cater for a wide variety of environments (OS variants, Linux distros, Python versions, etc.), which comes with a significant maintenance cost and makes it more prone to breakage. This is no longer sustainable in the long run.

Practically, deprecating LaMachine entails the following:

  • Provide lightweight application containers for all software participating in LaMachine:
    • provide a build file (Dockerfile/Containerfile) for each, in the upstream software repository. No need for Ansible in most cases.
    • publish the resulting images (on Docker Hub and in the (yet to be established) CLARIAH docker registry)
      • automate building and publishing using the continuous deployment infrastructure being set up for CLARIAH
    • prefer a single light-weight distribution like Alpine Linux and actively contribute packages where possible
  • Drop the explicit VM support; container platforms like Docker automatically virtualize on non-Linux systems anyway. The added value is too small to warrant the maintenance cost.
  • Python bindings that bind with native code (e.g. python-frog, python-ucto, python-timbl, colibri-core, analiticcl) should be provided as wheels and distributed via PyPI
  • Drop the 'native compilation' / virtualenv support. People using virtualenvs can use wheels directly with pip (see above point). The use of virtualenvs remains encouraged but it is no longer needed to 'hold the user's hand' here and set everything up.
  • Preserve various useful aspects of LaMachine. A lot of expertise has been gathered over the years, most of it should be perfectly reusable.
    • development vs stable distinction (stable draws from repositories like Alpine/pypi etc, development builds/installs latest git head source)
    • port and reuse existing configuration templates
  • Configure orchestration solutions (infrastructure-as-code) consisting of multiple components; here is a role for a possible LaMachine v3 if we want to reuse the name.
  • Limit macOS support; macOS users can use containers (virtualised Linux). Certain software will continue to be provided natively for macOS via homebrew (as it currently is), without the overarching LaMachine layer.
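
As a rough illustration of the per-tool build file described in the list above, the sketch below writes a minimal Alpine-based Containerfile for a single package. It is not the build file used in any upstream repository: `write_containerfile` is a hypothetical helper name, and the sketch assumes both that the package is installable with `apk add` on the chosen Alpine release and that the package provides a binary of the same name.

```shell
#!/bin/sh
# Sketch only: write a minimal Alpine-based Containerfile for one tool.
# write_containerfile is a hypothetical helper, not part of LaMachine.
# Assumptions: the package is available through apk on the chosen Alpine
# release, and it installs a binary with the same name as the package.
write_containerfile() {
    package="$1"      # e.g. frog
    context_dir="$2"  # directory that will hold the Containerfile
    mkdir -p "$context_dir" || return 1
    # Unquoted heredoc delimiter, so $package is expanded into the file.
    cat > "$context_dir/Containerfile" <<EOF
FROM alpine:latest
RUN apk add --no-cache $package
ENTRYPOINT ["$package"]
EOF
}
```

A possible use would be `write_containerfile frog ./frog-container` followed by `docker build -t example/frog -f ./frog-container/Containerfile ./frog-container`, where `example/frog` is a placeholder image name. Keeping one small build file per tool is what makes the images lightweight compared to a single 'fat' container.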

This issue is meant to track this progress and provide a place for discussion. End-users need not be alarmed yet at this stage.

proycon avatar Feb 16 '22 18:02 proycon

Status update: this has been ongoing for a while. Various things that were in LaMachine are now containerized independently. I will slowly continue this as time progresses.

proycon avatar Jul 18 '22 13:07 proycon

Where to migrate to from LaMachine?

LaMachine is a meta-distribution that provided an all-in-one solution for the installation of a variety of software. Now that LaMachine is being deprecated, you will need another solution to install the software you want. What the best solution is depends a lot on the specific software you want to use, the system you are on, and your use-case.

As there will be a little less hand-holding without LaMachine, we expect users who want to install and locally use software to at least be familiar with common technologies such as Python Virtual Environments and Docker containers.

This post intends to guide you to new solutions. It will point to where you can find information on how to install specific software that was previously handled by LaMachine. I will attempt to keep this comment up to date for a while:

  • Frog
    • (command-line interface) -> Frog
      • Alpine Linux: apk add frog
      • Docker: docker pull proycon/frog
      • macOS + homebrew: brew install frog
    • (from python) -> Frog for Python
      • pip install python-frog (use a virtual environment!)
    • (webservice) -> Frog Webservice (CLAM)
      • Docker: docker pull proycon/frog-webservice
  • Ucto
    • (command-line interface) -> ucto
      • Alpine Linux: apk add ucto
      • Docker: docker pull proycon/ucto
      • macOS + homebrew: brew install ucto
    • (from python) -> Ucto for Python
      • pip install python-ucto (use a virtual environment!)
    • (webservice) -> Ucto Webservice (CLAM)
      • Docker: docker pull proycon/ucto
  • Timbl
    • (command-line interface) -> timbl
      • Alpine Linux: apk add timbl
      • Docker: docker pull proycon/timbl
      • macOS + homebrew: brew install timbl
    • (from python) -> Python timbl
      • pip install python-timbl (use a virtual environment!)
  • Colibri Core
    • (command-line interface & python binding) -> colibri-core
      • Python: pip install colibricore
      • Docker: docker pull proycon/colibri-core (no python binding)
  • FoLiA tools/utilities
    • (command-line interface) -> FoLiA utils & FoLiA tools
      • Docker: docker pull proycon/foliautils
      • Python: pip install folia-tools
  • FLAT
    • Docker: docker pull proycon/flat
    • Python: pip install FoLiA-Linguistic-Annotation-Tool (but demands a lot of configuration, docker container recommended for a more out-of-the-box experience like LaMachine provided!)
  • DeepFrog
    • (command-line interface & rust library) -> deepfrog
      • Cargo: cargo install deepfrog (may have some issues currently)
  • Analiticcl
    • (command-line interface & rust library) -> analiticcl
      • Cargo: cargo install analiticcl
    • (python binding) -> analiticcl
      • pip install analiticcl
  • Oersetter
  • Dutch Speech Recognition; Kaldi-NL, asr-nl (formerly oral history), and forcedalignment2
    • (command line interface): Kaldi-NL
      • Docker: docker pull proycon/kaldi_nl
    • (webservice): asr-nl
      • Docker: docker pull proycon/asr_nl
    • (webservice): forcedalignment2
      • Docker: docker pull proycon/forcedaligment2
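
For the pip-based entries in the list above, the "use a virtual environment!" advice could look roughly like the sketch below. `venv_install` is a hypothetical helper name introduced here for illustration; the sketch assumes `python3` with the standard `venv` module is available and that the environment follows the usual Unix layout with `bin/pip`.

```shell
#!/bin/sh
# Sketch only: install one package into a dedicated virtual environment.
# venv_install is a hypothetical helper, not part of LaMachine or pip.
# Assumption: python3 with the standard venv module is on the PATH.
venv_install() {
    venv_dir="$1"  # e.g. ./env
    package="$2"   # e.g. python-ucto
    # Create the environment only if it does not provide a pip yet.
    if [ ! -x "$venv_dir/bin/pip" ]; then
        python3 -m venv "$venv_dir" || return 1
    fi
    # Use the environment's own pip so nothing is installed system-wide.
    "$venv_dir/bin/pip" install "$package"
}
```

For example, `venv_install ./env python-ucto` would set up `./env` on first use and install the binding there; running `. ./env/bin/activate` afterwards makes that environment active in the current shell.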

For certain software, there are no convenient alternatives to LaMachine yet; solutions will hopefully emerge as needed.

  • PICCL & Ticcltools
  • Wopr
  • Valkuil/Gecco
  • T-scan
    • Ownership and maintainership of T-scan is being transferred to Utrecht University (see https://github.com/UUDigitalHumanitieslab/tscan/issues/58)

LaMachine provided some software by CLARIAH/CLARIN partners; for these we now refer first and foremost to the partners:

  • Alpino -> Use the solutions provided by Groningen, or alternatively use my docker container (docker pull proycon/alpino) but without guarantees that it's up to date.
    • (webservice) -> Alpino Webservice (CLAM, with FoLiA support)
      • Docker: docker pull proycon/alpino_webservice

LaMachine also bundled a lot of third-party software like Jupyter Lab/Notebook, PyTorch, Moses, TensorFlow, spaCy, FreeLing, CoreNLP, fastText, Nextflow. You will need to check your distribution's or language's package manager, or the upstream provider, for solutions.

Integrated environments that offer and interconnect multiple tools for researchers over the web, as LaMachine already offered, will instead be provided by the larger CLARIAH infrastructure. The Language and Speech Tools portal at CLST is a notable part of this and will be kept up to date with services for much of the aforementioned software.

The deprecation of LaMachine does not mean it will suddenly become completely unavailable; it can still be used as-is. Things will keep working for a certain time until they break at some point due to divergences in the ecosystem. Such breakage will no longer be fixed, and users will be directed to the alternative solutions in this post instead.

proycon avatar Jul 22 '22 16:07 proycon

Hey @proycon , I was just notified of this message by a co-worker. So a very belated "thank you very much" for your development and maintenance of LaMachine over the years! 🙏

mhkuu avatar Aug 24 '23 07:08 mhkuu

Just in case anybody comes here looking for replacements: there has been a conda-forge package for ticcltools (and its dependency ticcutils) for some years. I am no longer involved in TICCL development, but the conda package is quite low maintenance, so I do keep the package synced with the ticcltools repo (i.e.: to my knowledge, it is up to date, but I haven't checked recently).

To install ticcltools this way with conda (or mamba): conda install ticcltools -c conda-forge.

If somebody wants to help maintain the conda-forge packages, wants to take over fully or is simply interested, here they are, feel free to contribute in any way:

https://github.com/conda-forge/ticcltools-feedstock
https://github.com/conda-forge/ticcutils-feedstock

egpbos avatar May 17 '24 08:05 egpbos