livebook icon indicating copy to clipboard operation
livebook copied to clipboard

Support file type of `Kino.Input`

Open nallwhy opened this issue 2 years ago • 6 comments

This PR supports file type of Kino.Input.

This PR is related to https://github.com/livebook-dev/kino/pull/153.

image

nallwhy avatar May 29 '22 13:05 nallwhy

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar May 29 '22 13:05 CLAassistant

Hey @nallwhy, thanks for the PR :) I think we shouldn't keep the file contents in memory, because the files may be arbitrarily large (especially that Livebook operates in data-related domain). We can use path as input value instead. The file would initially be uploaded using LV uploads, this gives us a path within the local file system. However, ideally we don't want to rely on that path within the runtime, because in the future we may support runtimes on a different machine and that wouldn't work. On the other hand, in such case we would need to copy the file to the runtime whenever it's started. We haven't settled as for how we should address file inputs yet. @josevalim feel free to add your thoughts :)

jonatanklosko avatar May 31 '22 22:05 jonatanklosko

@jonatanklosko I think file input is not that different from text input. Test input also can take large input and cause memory problems. It is just another interface of input.

nallwhy avatar Jun 02 '22 07:06 nallwhy

@nallwhy we want to support uploading potentially 100MBs, which is not very common for text area. So we need to indeed discuss how to upload directly to disk. We also want to make sure it returns a path to a file because we may want to show the user how working with some file-related APIs work, and that would be hard to do without a file input that mimics a file.

I will discuss with @jonatanklosko what are the potential approaches here and follow up. :)

josevalim avatar Jun 02 '22 07:06 josevalim

I talked to @chrismccord and we want to allow the LiveView to stream directly to the runtime, but we need to add the proper APIs to LiveView first. :)

josevalim avatar Jun 02 '22 22:06 josevalim

I talked to @jonatanklosko and we came up with an approach that should work:

  1. We will use LiveView uploads for the upload
  2. When consuming, we will transfer the file to the session process
  3. The session process should remove the file once the input is removed
  4. On Input.read, we will call a function in the runtime to transfer the file (if necessary). This should be a no-op for some runtimes as we can assume it runs on the same machine
  5. file_input will then receive the filename from step 4

josevalim avatar Jun 03 '22 12:06 josevalim

This may work for small uploads of a few hundred MB but for larger data I would think LiveBook would want to allow for uploads to go to S3 and/or pull from S3 or other cloud providers. For example, the liveview app I'm working on (https://github.com/mgwidmann/kart_vids) will process potentially GB+ files (these are high resolution 360 videos about 5 minutes long and around 3-4 GB each). I may not end up using Nx and LiveBook to develop the models because I'm still learning and need the greater resources available but I do want to come back and refactor to a pure Elixir approach later.

mgwidmann avatar Nov 17 '22 20:11 mgwidmann

@mgwidmann the idea is to upload files into disk, then Kino.Input.read(input) will return just the path, not file content, so you can read or stream the content yourself. With this approach there should be no issue with regard to file size, since we are not storying it in the memory.

For cloud storage, such as S3, you just can upload the files directly and then use some S3 client to pull the file into Livebook.

jonatanklosko avatar Nov 18 '22 12:11 jonatanklosko

@nallwhy thanks for the effort! I've just implemented the approach described above in #1622 :)

jonatanklosko avatar Jan 04 '23 20:01 jonatanklosko