aeneas

Keep only numpy as dependency

Open readbeyond opened this issue 8 years ago • 4 comments

Only numpy is actually needed by everybody.

Suggest that packagers still include lxml and BeautifulSoup4, and mention that they may also include the full set of deps. Maybe create a requirements_full.txt file.

readbeyond avatar Dec 13 '16 09:12 readbeyond

So what are lxml and BeautifulSoup4 needed/used for?

danielbair avatar Dec 13 '16 12:12 danielbair

On 12/13/2016 01:59 PM, Daniel Bair wrote:

So what are lxml and BeautifulSoup4 needed/used for?

lxml and bs4 are used to read and write sync maps in XML-like formats (xml, smil, ttml, etc.).

lxml is used to format and output XML-like files; bs4 is used for lenient parsing of input text files in "unparsed" format, since it allows parsing both HTML and X(HT)ML.

In theory, if the aeneas user only feeds (say) "plain" text and wants (say) "json" or "aud" or other "csv"-like output, then lxml and bs4 are never imported.
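To illustrate why lxml is never imported on the "plain text in, json out" path, here is a minimal sketch of a deferred import. The function name `write_sync_map`, the fragment tuple layout, and the XML element names are hypothetical and are not the actual aeneas code:

```python
import json


def write_sync_map(fragments, fmt):
    """Serialize (begin, end, text) tuples; hypothetical helper, not aeneas API."""
    if fmt == "json":
        # Pure standard library: lxml is never imported on this path.
        return json.dumps(
            [{"begin": b, "end": e, "text": t} for b, e, t in fragments]
        )
    if fmt == "xml":
        # Deferred import: lxml is loaded only when XML output is requested.
        from lxml import etree

        root = etree.Element("map")
        for b, e, t in fragments:
            node = etree.SubElement(root, "fragment", begin=str(b), end=str(e))
            node.text = t
        return etree.tostring(root, pretty_print=True, encoding="unicode")
    raise ValueError("unsupported format: %s" % fmt)
```

With this layout, a user who only ever asks for "json" can run without lxml installed; the ImportError would surface only on the first request for XML output.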

There is a trade-off here between:

a. keeping the dependencies graph small and the packaging/installation process fast; and

b. providing users with everything working out-of-the-box.

I admit the current approach is mixed and not very elegant. For example:

  • XML formats => lxml, bs4 required deps
  • TextGrid => tgt needed, but not a required dep

Furthermore, the same observation applies to other functionalities in aeneas:

  • download audio from YouTube => youtube-dl, pafy
  • use plot_waveform => Pillow
  • use a cloud TTS => requests (or boto3 in case of AWS Polly)

So, here I see two main paths:

a. make all these packages required dependencies (PRO: everything would work out-of-the-box; CON: size and installation complexity, packagers have more work); or

b. remove lxml and bs4 from the required dependencies (PRO: smaller installation, less work for packagers; CON: the user might need to install certain libraries later).

I am still not sure which path is the best one.

In any case, the code base would benefit from consolidating the code that currently checks at import time whether certain (optional) packages are available and, if not, raises an error telling the user to install the relevant Python package.
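One possible shape for such a consolidated check is a single helper that every optional feature calls. This is only a sketch under assumptions: `require_module` and its parameters are hypothetical names, not existing aeneas functions, and the `raise ... from` syntax assumes Python 3:

```python
import importlib


def require_module(module_name, feature, pip_name=None):
    """Import an optional module, or fail with an actionable message.

    Hypothetical helper: 'feature' is a human-readable description of
    what the caller was trying to do, 'pip_name' is the package name on
    PyPI when it differs from the module name (e.g. bs4 vs beautifulsoup4).
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        package = pip_name or module_name
        raise ImportError(
            "%s requires the optional package '%s'; "
            "install it with: pip install %s" % (feature, package, package)
        ) from exc
```

A caller would then write something like `bs4 = require_module("bs4", "parsing 'unparsed' text files", pip_name="beautifulsoup4")` at the point where the feature is used, so the error message is the same everywhere.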

readbeyond avatar Dec 13 '16 14:12 readbeyond

I would favor minimal dependencies for basic functionality, and then if the user attempts to use a function that requires a missing python package/module, prompt them to install it before they can proceed. I could also see an installer (like the SIL one) that asks the user what functionality they require/want and then installs the necessary packages/modules.

danielbair avatar Dec 13 '16 15:12 danielbair

On 12/13/2016 04:16 PM, Daniel Bair wrote:

I would favor minimal dependencies for basic functionality, and then if the user attempts to use a function that requires a missing python package/module, prompt them to install it before they can proceed. I could also see an installer (like the SIL one) that asks the user what functionality they require/want and then installs the necessary packages/modules.

Currently I am inclined to adopt this minimal approach, but it is something I would not change without due consideration.

Also, it should be noted that lxml additionally requires the libxml2 C library and a compilation step (if a pre-compiled wheel is not available). All the other optional deps are Python-only.

Giving users the possibility of installing a subset of deps of their choice would be great, provided it is not too much trouble for packagers to implement.

People installing through pip can already get all the dependencies (including optional ones) with:

$ pip install aeneas[full]

instead of

$ pip install aeneas

(the latter grabs just lxml and bs4, if not already installed)
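If the minimal approach proposed in this issue were adopted, the dependency declaration could be split into per-feature extras along these lines. This is a sketch, not the current aeneas setup.py: only the "full" extra is mentioned above, the other group names are made up for illustration, and the package lists simply follow the features listed earlier in this thread:

```python
# Required by everybody (the proposal in this issue).
INSTALL_REQUIRES = ["numpy"]

# Hypothetical per-feature groups; only "full" exists per the discussion above.
EXTRAS_REQUIRE = {
    "xml": ["lxml", "beautifulsoup4"],
    "textgrid": ["tgt"],
    "youtube": ["youtube-dl", "pafy"],
    "plot": ["Pillow"],
    "cloudtts": ["requests", "boto3"],
}

# "full" is the union of all the groups above, without duplicates.
EXTRAS_REQUIRE["full"] = sorted(
    {pkg for pkgs in EXTRAS_REQUIRE.values() for pkg in pkgs}
)

# These would then be passed to setuptools, e.g.:
# setup(..., install_requires=INSTALL_REQUIRES, extras_require=EXTRAS_REQUIRE)
```

Under this layout a user needing only XML sync maps could run `pip install aeneas[xml]`, while `pip install aeneas[full]` would keep its current meaning.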

The biggest problem so far is with Debian/Ubuntu packaging, since the "Debian-blessed way" consists of having each Python package encapsulated in its own .deb package.

readbeyond avatar Dec 13 '16 21:12 readbeyond