Find accurate progress reports for table import

Open craxal opened this issue 3 years ago • 2 comments

Initially, progress was reported based on the number of bytes read from the import file, but this was inaccurate, because the number of bytes read doesn't correlate well with the number of entities read or uploaded.

A solution will likely require a refactor of the CSV parser to read less greedily and/or report the number of lines or records that have been read.
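As a rough illustration of what "report the number of records that have been read" could look like, here is a minimal sketch of an incremental record counter. It is not the Storage Explorer parser: the `RecordCounter` class and its methods are hypothetical names. It assumes standard CSV quoting, where a line break inside a double-quoted field does not end the record.

```typescript
// Hypothetical sketch: count CSV records incrementally, chunk by chunk,
// so a caller can report "records read" without parsing the whole file first.
class RecordCounter {
  private inQuotes = false;    // true while inside a double-quoted field
  private pendingData = false; // true if the current record has any content
  public records = 0;

  // Feed the next chunk of text; returns the number of complete records so far.
  push(chunk: string): number {
    for (const ch of chunk) {
      if (ch === '"') {
        // An escaped quote ("") toggles twice, leaving the state unchanged.
        this.inQuotes = !this.inQuotes;
        this.pendingData = true;
      } else if (ch === "\n" && !this.inQuotes) {
        // An unquoted line break ends a record; blank lines are skipped.
        if (this.pendingData) {
          this.records++;
        }
        this.pendingData = false;
      } else if (ch !== "\r") {
        this.pendingData = true;
      }
    }
    return this.records;
  }

  // Call once the input is exhausted to count a final record with no trailing newline.
  end(): number {
    if (this.pendingData) {
      this.records++;
      this.pendingData = false;
    }
    return this.records;
  }
}
```

Because the counter keeps its quote state between calls, a record that straddles two chunks is counted only once, when its terminating line break arrives.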

craxal, Jan 19 '22 22:01

If 1.21 and older did not have progress % for import, then feel free to move this to a later milestone.

MRayermannMSFT, Jan 20 '22 21:01

Progress percentage was not reported initially, so moving to 1.24.0.

craxal, Jan 21 '22 02:01

This was actually resolved when working on performance improvements for table import. Progress is now calculated as follows:

$$\frac{\text{bytes read}}{\text{file size}} \times \frac{\text{entities uploaded}}{\text{total entities}}$$

Ideally, progress would be determined by the number of entities. However, the total number of entities is not known when the import starts; it grows as the file is read. The file size provides a better basis, but the import is not complete until everything read from the file has been uploaded.

The answer is to combine the two. The file size is the dominant factor, adjusted slightly by the number of entities. When the entire file has been read, the dominant factor becomes the number of entities, which is good, because by then we do know how many entities need to be uploaded.
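The formula above can be sketched as a small function. This is only an illustration, not the actual Storage Explorer code: the name `importProgress` and its parameters are hypothetical, "total entities" is assumed to mean the number of entities read from the file so far, and the zero-input and clamping behavior are assumptions added to keep the sketch well-defined.

```typescript
// Hypothetical sketch of the combined progress formula:
//   (bytes read / file size) * (entities uploaded / total entities)
function importProgress(
  bytesRead: number,
  fileSize: number,
  entitiesUploaded: number,
  entitiesRead: number // assumed meaning of "total entities": entities read so far
): number {
  // Assumption: report 0 until there is something to measure against.
  if (fileSize <= 0 || entitiesRead <= 0) {
    return 0;
  }
  // Clamp each factor to 1 in case a counter overshoots.
  const readFraction = Math.min(bytesRead / fileSize, 1);
  const uploadFraction = Math.min(entitiesUploaded / entitiesRead, 1);
  return readFraction * uploadFraction;
}

// Halfway through the file, with half of the entities read so far uploaded.
console.log(importProgress(50, 100, 10, 20)); // 0.25
// File fully read: progress now depends only on the upload fraction.
console.log(importProgress(100, 100, 30, 40)); // 0.75
```

Once `bytesRead` equals `fileSize`, the first factor stays at 1 and the entity count is final, so the result reaches 1 exactly when the last entity is uploaded.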

craxal, Aug 04 '23 23:08