
data transfer during the execution of Web-LLM

Open · HappyPony opened this issue 1 year ago · 6 comments

I am not 100% sure, but 97% sure :-) that running Web-LLM with 3-5 questions caused data transfer on the order of 5-6 GB. Here is the runtime environment:

Device model: Fujitsu Esprimo Q556
Processor:    Intel(R) Pentium(R) CPU G4400T @ 2.90 GHz
RAM:          8.00 GB
System type:  Windows 10 Pro 64-bit OS, x64-based processor
GPU:          Intel HD Graphics 510
Browser:      Chrome version 113.0.5672.53 (Official Build) beta (64-bit)

If my assumption that Web-LLM caused 5-6 GB of data transfer is plausible, I would be interested in references explaining why running Web-LLM causes data transfer (a download from the Internet?) of this order of magnitude.

Here is an excerpt from my session with WebLLM, which caused the 5-6 GB consumption:

HappyPony · Apr 23 '23, 06:04

How do you measure data transfer? Besides downloading the initial weights, which are cached, WebLLM won't download or upload any data. Maybe you are observing GPU-CPU communication as part of the generation process.

tqchen · Apr 23 '23, 17:04
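For readers who want to verify the caching themselves: the browser exposes enough to check whether the weights ended up in local storage. Below is a minimal sketch using only the standard Cache Storage and StorageManager APIs, run from the page's developer console; it makes no assumptions about WebLLM's internal cache names.

```ts
// Minimal sketch: list this origin's cache buckets and report how much
// storage the origin uses, so you can confirm the weights live locally
// after the first download. Uses only standard browser APIs.
async function inspectLocalCaches(): Promise<void> {
  // Enumerate all Cache Storage buckets the page has created.
  const names = await caches.keys();
  console.log("cache buckets:", names);

  // Ask the browser how much storage this origin is using overall.
  const { usage, quota } = await navigator.storage.estimate();
  console.log(
    `storage used: ${((usage ?? 0) / 2 ** 30).toFixed(2)} GiB ` +
      `of ${((quota ?? 0) / 2 ** 30).toFixed(2)} GiB quota`
  );
}

inspectLocalCaches();
```

If the reported usage stays flat across repeated prompts, the traffic you are billing against the hotspot is not being caused by repeated weight downloads.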

Thanks for the feedback. I use my smartphone as a mobile hotspot. I did not check after each prompt, but after running all my Web-LLM tests I saw this 5-6 GB of data transfer. I don't have a flat rate with my internet provider, which is why I pay attention to data usage.

> Besides downloading the initial weights, which are cached, WebLLM won't download or upload any data.

Do I understand correctly that the initial weights are downloaded and cached once, when the first prompt is entered on https://mlc.ai/web-llm/? Can you tell me the order of magnitude (MB or GB?) of the data transferred when the initial weights are cached?

> Maybe you are observing GPU-CPU communication as part of the generation process.

Can you please explain in other words what you mean by this?

HappyPony · Apr 24 '23, 07:04

Right, the initial weights are downloaded and cached, and in your case I think that download is the cause. They are only cached once, however, and unless there is a new version update (which comes with new weights), no further downloads will happen. What you are monitoring is unlikely to be related to GPU-CPU data transfer.

tqchen · Apr 24 '23, 12:04
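The behavior described here, download once and then serve every later run from the cache, is the standard cache-first pattern. A minimal sketch of that pattern follows; the URL and cache name are hypothetical, not WebLLM's actual ones.

```ts
// Cache-first fetch sketch: only a cache miss touches the network,
// so the multi-GB transfer happens exactly once per weight version.
// URL and cache name below are illustrative placeholders.
const WEIGHT_URL =
  "https://example.com/models/vicuna-v1-7b/params_shard_0.bin";
const CACHE_NAME = "webllm-weights";

async function fetchWeightsCached(url: string): Promise<ArrayBuffer> {
  const cache = await caches.open(CACHE_NAME);

  // Cache hit: no network traffic at all.
  const hit = await cache.match(url);
  if (hit) return hit.arrayBuffer();

  // Cache miss: this is the one-time multi-GB download.
  const response = await fetch(url);
  if (!response.ok) throw new Error(`download failed: ${response.status}`);
  await cache.put(url, response.clone());
  return response.arrayBuffer();
}

fetchWeightsCached(WEIGHT_URL).then((buf) =>
  console.log(`weights in memory: ${(buf.byteLength / 2 ** 20).toFixed(1)} MiB`)
);
```

A new model version would use new URLs, which miss the cache, which is why a version update triggers a fresh download.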

> In your case I think that download is the cause.

To give me an idea, can you tell me how many GB the initial weights occupy? As I said, I don't have a flat rate, so I have to watch the amount of data available to me.

Another question in this context: would it be very costly to give the user the option, in the menu navigation, to decline an update before a new version's weights are downloaded?

HappyPony · Apr 24 '23, 15:04

I think for the case of Vicuna it would be 4 GB. As for model updates, unfortunately the execution logic can be tied to specific model versions, so that likely cannot be done. There is still a local deployment option that one can use to rebuild the weights locally.

tqchen · Apr 24 '23, 18:04
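The ~4 GB figure is consistent with a back-of-envelope estimate, assuming Vicuna-7B with 4-bit quantized weights (the exact quantization setting is an assumption, not stated in this thread):

```ts
// Rough size estimate for a 7B-parameter model at 4-bit quantization.
// The parameter count and quantization level are assumptions.
const params = 7e9; // ~7 billion parameters
const bytesPerParam = 0.5; // 4 bits = half a byte per weight
const weightBytes = params * bytesPerParam;

console.log(`~${(weightBytes / 2 ** 30).toFixed(1)} GiB of raw weights`);
// -> ~3.3 GiB; tokenizer files, metadata, and shard overhead would
//    bring the total download close to the 4 GB mentioned above.
```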

@tqchen <joke beginning>I wonder what the requirement would be for not prefixing your statement with "I think".<joke end>

  1. Can you agree with the statement "For the case of Vicuna it would be 4 GB"? If not, is there any reason why a binding statement is not possible?
  2. What would be the prerequisite for the development team to inform potential users in the release notes about the volume of data that must be downloaded to initialize the model? For example, with an announcement like: "There is an update to the Web-LLM initial weights. To apply this update, you will need to download NN GB." (A sketch of one way to surface this appears after this comment.)

HappyPony · Apr 25 '23, 05:04
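On question 2, one way a page or release note could surface the download volume up front is to read the Content-Length header with a HEAD request before fetching anything. A hedged sketch (the URL is hypothetical, and this only works if the server reports the header and permits CORS):

```ts
// Report a download's size before transferring it, via a HEAD request.
// The weight URL is a placeholder, not WebLLM's actual hosting path.
async function reportDownloadSize(url: string): Promise<void> {
  const res = await fetch(url, { method: "HEAD" });
  const len = res.headers.get("Content-Length");
  if (len) {
    const gib = (Number(len) / 2 ** 30).toFixed(2);
    console.log(`This update will download ${gib} GiB.`);
  } else {
    console.log("Server did not report a size for", url);
  }
}

reportDownloadSize(
  "https://example.com/models/vicuna-v1-7b/params_shard_0.bin"
);
```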