lute-v3

Add "export unknown terms" (or "export all terms and statuses") action to Book actions

Open jzohrab opened this issue 1 year ago • 3 comments

~~Blocked by #316~~ - this is done now

The parent mapping export used to include an option to export all unknown terms. That could be useful for loading up vocab lists for books.

The code has some TODO issue_336_export_unknown_book_terms markers for things that should be used for this.

  • add action
  • add unit test (or restore from existing) -- note that books now add status 0 terms while reading, have to handle those.
  • add integration test (or restore from existing)

UPDATE: Lute has a CLI job to export book terms -- see the comment below for notes about what's needed to make this a book action callable from the UI.

As part of this work, any code with TODO issue_336_export_unknown_book_terms should be removed, as I don't think it's used anymore.

jzohrab, Mar 14 '24 11:03

No longer blocked.

jzohrab, Mar 22 '24 22:03

This is slightly more complicated than the hacky code marked with the TODO, or the language_term_export.py thing.

The current hacky code doesn't include multiword terms. For languages like Classical Chinese, that's important.

I think that what needs to happen is an in-memory "render" of each page, something like read.service.start_reading -- but without saving all of the status 0 terms. The resulting paragraphs will contain all of the text tokens, including net new ones (not saved) and saved status 0 ones, and all the rest, of course.

The test cases for this are pretty easy, even if the code isn't:

  • new book = all words
  • new book with some known words
  • new book with some multi-word terms
  • new book with some status 0 terms
  • extraneous status 0 words not included
  • between the start and end of each test run, the number of terms saved in the db should not increase, and the book's current text id shouldn't change
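
A rough sketch of the idea and the test cases above, in plain Python. This is not Lute's code: the function name unknown_terms, the dict-based saved_terms store, and the greedy longest-match rule are all assumptions made for illustration, and the real page render handles overlapping terms and parsing in ways this does not.

```python
from collections import Counter


def unknown_terms(pages, saved_terms, max_words=3):
    """Count unknown terms in a book without touching the term store.

    Hypothetical stand-in for an in-memory page render:
    - pages: list of pages, each a list of word tokens
    - saved_terms: dict of lowercased term text -> status
      (multi-word terms are space-joined; status 0 means unknown)
    Tokens that are not saved at all count as unknown too.
    """
    counts = Counter()
    for tokens in pages:
        words = [t.lower() for t in tokens]
        i = 0
        while i < len(words):
            match = None
            # Assumption: the longest saved term starting here wins.
            for n in range(min(max_words, len(words) - i), 0, -1):
                candidate = " ".join(words[i:i + n])
                if candidate in saved_terms:
                    match = (candidate, n)
                    break
            if match is None:
                counts[words[i]] += 1  # net new token, never saved
                i += 1
            else:
                term, n = match
                if saved_terms[term] == 0:
                    counts[term] += 1  # saved, but still status 0
                i += n
    return counts


pages = [["Tengo", "un", "gato"], ["un", "gato", "negro"]]
saved = {"tengo": 3, "un gato": 0, "perro": 0}  # "perro" is not in the book
before = dict(saved)
result = unknown_terms(pages, saved)
```

Here result contains "un gato" (a status 0 multi-word term) and "negro" (net new), skips the known word "tengo" and the extraneous status 0 word "perro", and saved is left exactly as it was in before.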

Since this is long-running, may need to have some kind of WebSockets to report back to the client.

jzohrab, Jun 04 '24 05:06

Some good interim progress. Hacked at the language term export job quite a lot, and added a new book_term_export <bookid> <filename> CLI job, e.g.:

flask --app lute.app_factory cli book_term_export 432 sp_terms.csv

This is a bit slower than the old job, b/c it essentially does the calculations for a full page render for each page. It feels like it should be faster, but whatever.

This can't be added to the "actions" dropdown yet, b/c the job doesn't communicate back to the client: it just prints to the command line. When clicked from the web UI, the job should really report progress back via a web socket, and then download the file at the end. Since the job is slow-ish, the user should be notified what's happening.
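
One possible shape for that, as a sketch only: have the export run as a generator that yields a small JSON message per page, so whatever transport ends up being used (web socket or otherwise) just forwards each message to the client. The name export_with_progress, the message fields, and the count_terms callback are all made up for illustration, and nothing here is wired to Flask or to the existing CLI job.

```python
import csv
import io
import json
from collections import Counter


def export_with_progress(pages, out, count_terms):
    """Process pages one at a time, yielding JSON progress messages.

    Hypothetical helper, not part of Lute:
    - pages: list of pages (whatever count_terms accepts)
    - out: writable text file-like object that receives the CSV
    - count_terms: callable returning a {term: count} mapping for a page
    """
    totals = {}
    total_pages = len(pages)
    for n, page in enumerate(pages, start=1):
        for term, count in count_terms(page).items():
            totals[term] = totals.get(term, 0) + count
        # One message per page; a socket handler would send this on.
        yield json.dumps({"event": "progress", "page": n, "pages": total_pages})

    # Write the CSV only after every page has been processed.
    writer = csv.writer(out)
    writer.writerow(["term", "count"])
    for term in sorted(totals):
        writer.writerow([term, totals[term]])
    yield json.dumps({"event": "done", "terms": len(totals)})


out = io.StringIO()
# Counter stands in for the real per-page term calculation.
messages = list(export_with_progress([["a", "b"], ["b", "c"]], out, Counter))
```

The client would show the progress messages as they arrive and start the file download when it sees the final "done" message.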

jzohrab, Jun 04 '24 05:06