Double CSV Load

Open GantMan opened this issue 4 years ago • 13 comments

The default CSV loader loads each CSV twice.

Example:

    let data = [];
    const csvDataset = tf.data.csv("https://s3.amazonaws.com/ir_public/temp/chess_labels.csv");
    const column_names = await csvDataset.columnNames();
    const sample = csvDataset.take(10);
    await sample.forEachAsync((row) => data.push(Object.values(row)));
    console.log(data);

image

This is particularly problematic with large CSV files.

GantMan avatar Feb 15 '21 06:02 GantMan

Hi @GantMan can the second one be from the browser cache? Also, can you check whether both have response body?

lina128 avatar Feb 23 '21 22:02 lina128

@lina128 - sorry for the delay. Been busy. I do not think it is the cache, because the network is hit twice. If I load a 700 MB CSV, I wait for the file twice. Would you like a demo with a larger file?

GantMan avatar Mar 04 '21 19:03 GantMan

Yeah, that'll be great. I tried the above example in a codepen and couldn't reproduce it. Please share a larger file.

lina128 avatar Mar 05 '21 07:03 lina128

@lina128 - I believe I have a line on the issue being caused by a secondary library. I'll report back when I am 100% sure.

GantMan avatar Mar 08 '21 04:03 GantMan

Hey Lina! OK, so the issue is definitely with TFJS. And it's affecting Danfo.js

Here's a reproduction so you can see the one line of code that is causing the issue.

EXAMPLE 1: Danfo.js is double loading but TFJS is fine
Example Codepen: https://codepen.io/gantman/pen/abBRObO

As you can see here, TFJS loads the chess_labels dataset once, but Danfo.js is having an issue of loading the apple dataset twice. At first glance it appears TFJS is fine, and Danfo must have the bug. This is not correct, apparently.

EXAMPLE 2: Accessing the sample in TFJS (one line of code) causes a double load.
Example Codepen: https://codepen.io/gantman/pen/dyOgmGe

By adding await sample.forEachAsync((row) => data.push(Object.values(row))); the 'chess_labels.csv' is loaded twice.

GantMan avatar Mar 08 '21 20:03 GantMan

@lina128 - were you able to reproduce with the above examples?

GantMan avatar Mar 15 '21 21:03 GantMan

replying to un-stale. @lina128 was the demo good enough?

GantMan avatar Mar 22 '21 22:03 GantMan

@pyu10055 @lina128

Just checking if you can look into this soon.

GantMan avatar Apr 07 '21 00:04 GantMan

Hi @GantMan , sorry I haven't had a chance to look into it yet. @pyu10055 Do you have any idea what could cause this double loading?

lina128 avatar Apr 07 '21 00:04 lina128

@GantMan @lina128 I took a quick look at the implementation of URLDataSource

It currently does not support data caching, which means every data iteration would cause a separate data download. The main concern is the memory usage if we retain the data for iterators.
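To make the behaviour described above concrete, here is a minimal, library-free sketch. It is not the TFJS implementation: `makeCountingDownload`, `UncachedSource`, `CachedSource`, and `readHeaderThenRows` are hypothetical names invented for illustration. It assumes, as described above, that every new iteration over the source starts its own download, and it further assumes (unconfirmed here) that reading the header and reading the rows each start a separate iteration.

```javascript
// Hypothetical stand-in for a network download that counts its own calls.
function makeCountingDownload() {
  let calls = 0;
  return {
    download: async () => {
      calls += 1;
      return "a,b\n1,2\n3,4\n";
    },
    count: () => calls,
  };
}

// Source without caching: every iterator() call downloads the file again.
class UncachedSource {
  constructor(download) {
    this.download = download;
  }
  async iterator() {
    const text = await this.download();
    return text.trim().split("\n")[Symbol.iterator]();
  }
}

// Source that keeps the first download and reuses it for later iterators.
// The cost is that the whole file stays in memory.
class CachedSource {
  constructor(download) {
    this.download = download;
    this.pending = null;
  }
  async iterator() {
    if (this.pending === null) {
      this.pending = this.download();
    }
    const text = await this.pending;
    return text.trim().split("\n")[Symbol.iterator]();
  }
}

// Assumed access pattern: one iteration for the header, a second for the rows.
async function readHeaderThenRows(source) {
  const headerIter = await source.iterator();
  const header = headerIter.next().value.split(",");

  const rowIter = await source.iterator();
  rowIter.next(); // skip the header line
  const rows = [];
  for (const line of rowIter) {
    rows.push(line.split(","));
  }
  return { header, rows };
}

async function demo() {
  const plain = makeCountingDownload();
  await readHeaderThenRows(new UncachedSource(plain.download));

  const shared = makeCountingDownload();
  await readHeaderThenRows(new CachedSource(shared.download));

  console.log(plain.count(), shared.count()); // logs: 2 1
}

demo();
```

In this sketch the caching variant trades one extra download for holding the full file in memory, which is the memory concern raised above.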

pyu10055 avatar Apr 14 '21 23:04 pyu10055

Thanks for catching this! Yeah, with large CSV files the performance hit is very noticeable.

GantMan avatar Apr 15 '21 01:04 GantMan

@GantMan can we close this ?

rthadur avatar Jun 29 '21 01:06 rthadur

I think this is a pretty significant bug.

Using TFJS to grab a CSV file means pulling the file TWICE across the network. I don't know where the standard is but I assume it's less than O(2n).

GantMan avatar Jun 29 '21 14:06 GantMan

Hi, @GantMan

Apologies for the delayed response. We're revisiting our older issues to check whether they have been resolved. Are you still looking for a solution, or has your issue been resolved?

If the issue still persists after trying the latest version of TFJS, please let us know and include the error log and a code snippet so that we can reproduce the issue on our end.

Could you please confirm whether this issue is resolved for you? Please feel free to close the issue if it is resolved. Thank you!

gaikwadrahul8 avatar May 15 '23 15:05 gaikwadrahul8

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you.

google-ml-butler[bot] avatar May 22 '23 15:05 google-ml-butler[bot]

Closing as stale. Please @mention us if this needs more attention.

google-ml-butler[bot] avatar May 29 '23 16:05 google-ml-butler[bot]
