tfjs
Double CSV Load
The default CSV loader loads each CSV twice.
Example:
let data = [];
const csvDataset = tf.data.csv("https://s3.amazonaws.com/ir_public/temp/chess_labels.csv");
const column_names = await csvDataset.columnNames();
const sample = csvDataset.take(10);
await sample.forEachAsync((row) => data.push(Object.values(row)));
console.log(data);
This is particularly problematic with large CSV files.
Hi @GantMan, could the second request be served from the browser cache? Also, can you check whether both requests have a response body?
@lina128 - sorry for the delay. Been busy. I do not think it is the cache, because the network is hit twice. If I load a 700 MB CSV, I wait for the file twice. Would you like a demo with a larger file?
Yeah, that'll be great. I tried the above example in a CodePen and couldn't reproduce it. Please share a larger file.
@lina128 - I believe I have a line on the issue being caused by a secondary library. I'll report back when I am 100% sure.
Hey Lina! OK, so the issue is definitely with TFJS, and it's affecting Danfo.js.
Here's a reproduction so you can see the one line of code that is causing the issue.
EXAMPLE 1: Danfo.js is double loading but TFJS is fine Example Codepen: https://codepen.io/gantman/pen/abBRObO
As you can see here, TFJS loads the chess_labels dataset once, but Danfo.js is having an issue of loading the apple dataset twice. At first glance it appears TFJS is fine, and Danfo must have the bug. This is not correct, apparently.
EXAMPLE 2: Accessing the sample in TFJS (one line of code) causes a double load. Example Codepen: https://codepen.io/gantman/pen/dyOgmGe
By adding await sample.forEachAsync((row) => data.push(Object.values(row)));
the 'chess_labels.csv' is loaded twice.
@lina128 - were you able to reproduce with the above examples?
replying to un-stale. @lina128 was the demo good enough?
@pyu10055 @lina128
Just checking if you can look into this soon.
Hi @GantMan, sorry I haven't had a chance to look into it yet. @pyu10055 Do you have any idea what could cause this double loading?
@GantMan @lina128 I took a quick look at the implementation of URLDataSource.
It currently does not support data caching, which means every iteration over the data triggers a separate download. The main concern with caching is the memory usage if we retain the data for iterators.
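To make the trade-off concrete, here is a minimal sketch of the behavior described above. It does not use TFJS and is not the URLDataSource implementation. `LineSource`, `CountingSource`, `CachedSource`, `readHeader`, and `takeRows` are all hypothetical names made up for illustration. The assumption is simply that each new iterator re-reads the source from the start, so one pass over the file and a second pass over it count as two downloads unless something buffers the data.

```typescript
// Hypothetical interface: anything that can hand out a fresh line iterator.
interface LineSource {
  iterator(): Iterator<string>;
}

// Hypothetical stand-in for a URL-backed source. Each iterator() call
// simulates one full download and is counted.
class CountingSource implements LineSource {
  downloads = 0;
  private readonly text: string;
  constructor(text: string) {
    this.text = text;
  }
  iterator(): Iterator<string> {
    this.downloads += 1;
    return this.text.split("\n")[Symbol.iterator]();
  }
}

// Caching wrapper: pulls from the inner source once, then replays from memory.
// The whole file stays in memory, which is the cost mentioned above.
class CachedSource implements LineSource {
  private lines: string[] | null = null;
  private readonly inner: LineSource;
  constructor(inner: LineSource) {
    this.inner = inner;
  }
  iterator(): Iterator<string> {
    if (this.lines === null) {
      const buffered: string[] = [];
      const it = this.inner.iterator();
      for (let r = it.next(); !r.done; r = it.next()) buffered.push(r.value);
      this.lines = buffered;
    }
    return this.lines[Symbol.iterator]();
  }
}

// First pass: read only the header line.
function readHeader(source: LineSource): string[] {
  const first = source.iterator().next();
  return first.done ? [] : first.value.split(",");
}

// Second pass: skip the header and take up to n data rows.
function takeRows(source: LineSource, n: number): string[][] {
  const it = source.iterator();
  it.next(); // skip the header line
  const rows: string[][] = [];
  for (let r = it.next(); !r.done && rows.length < n; r = it.next()) {
    rows.push(r.value.split(","));
  }
  return rows;
}

// Without caching: two passes, two simulated downloads.
const raw = new CountingSource("a,b\n1,2\n3,4");
readHeader(raw);
takeRows(raw, 1);
console.log(raw.downloads); // 2

// With caching: two passes, one simulated download.
const counted = new CountingSource("a,b\n1,2\n3,4");
const cached = new CachedSource(counted);
readHeader(cached);
takeRows(cached, 1);
console.log(counted.downloads); // 1
```

The cached variant trades network traffic for memory: for a 700 MB file, the buffer holds the entire file for as long as the wrapper is alive.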
Thanks for catching this! Yeah, with large CSV files the performance hit is very noticeable.
@GantMan can we close this?
I think this is a pretty significant bug.
Using TFJS to grab a CSV file means pulling the file TWICE across the network. I don't know where the standard is but I assume it's less than O(2n).
Hi @GantMan,
Apologies for the delayed response. We're revisiting our older issues to check whether they have been resolved. Are you still looking for a solution, or has your issue been resolved?
If the issue persists with the latest version of TFJS, please share the error log and a code snippet so we can reproduce it on our end.
Could you please confirm whether this issue is resolved for you? Feel free to close the issue if it is. Thank you!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you.
Closing as stale. Please @mention us if this needs more attention.