JSON load times: Brim v0.31.0 vs. Zui v1.6.0

Open SoftTools59654 opened this issue 1 year ago • 5 comments

Brim Vs Zui

While testing file imports, I imported the same file into both Brim and Zui: a 2 GB JSON file with about 250k records. The import took about one minute in Brim, but more than 2 minutes in Zui. I ran both imports on the same system, but I don't know how representative this is. Maybe the problem is specific to that file.

I hope you can run this test on a standard file to see whether the problem exists.

SoftTools59654 avatar Feb 08 '24 19:02 SoftTools59654

@SoftTools59654: When you say "Brim" are you talking specifically about Brim v0.31.0 which was the last release of the app when it was still called "Brim"? And for the comparison with Zui are you using the latest GA release Zui v1.6.0? Also, are you on Windows, macOS, or Linux?

philrz avatar Feb 08 '24 20:02 philrz

Yes, that's what I mean. I tested with the same versions you mentioned.

Of course, maybe my test is wrong, but I tested it several times

The system is Windows

Maybe my assessment is incorrect, but if you also test, we will get a better result. A large JSON file with many records shows the difference better.

SoftTools59654 avatar Feb 09 '24 06:02 SoftTools59654

@SoftTools59654: Since I don't have your specific test data, I attempted to reproduce the symptom using some publicly-available data that has high-level characteristics similar to what you described for your data. However, in my tests the Zui v1.6.0 release performed slightly faster than Brim v0.31.0. Below are the details.

As test data I used some of the hourly GitHub archives. Specifically the ones shown below, which come out to ~1.9 GB and 482k records. So, the size is roughly the same as yours, though this would indicate your records are on average 2x the size of these since you said you had only 250k records adding up to 2 GB.

$ curl -O https://data.gharchive.org/2023-02-08-0.json.gz
$ curl -O https://data.gharchive.org/2023-02-08-1.json.gz
$ curl -O https://data.gharchive.org/2023-02-08-2.json.gz

$ gunzip *.gz

$ du -sh .
1.9G	.

$ zq 'count()' *.json
482272(uint64)
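
As a rough check on the 2x estimate above, the average record size is just total bytes divided by record count. The sketch below treats 1 GB as 10^9 bytes (an approximation, since `du -h` rounds) and uses the approximate figures quoted in this thread. `avg_record_bytes` is a hypothetical helper name, not part of zq.

```shell
# Average bytes per record = total bytes / record count.
# Inputs are the approximate sizes quoted in this thread, assuming 1 GB = 10^9 bytes.
avg_record_bytes() {
  # $1 = total size in bytes, $2 = number of records
  awk -v bytes="$1" -v records="$2" 'BEGIN { printf "%d\n", bytes / records }'
}

avg_record_bytes 2000000000 250000   # reported file: prints 8000
avg_record_bytes 1900000000 482272   # GH Archive sample: prints 3939
```

That comes out to roughly 8 KB per record for the reported file versus roughly 4 KB per record for the GH Archive sample.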

The tests below are on an AWS EC2 t2.xlarge instance, so 4 vCPU and 16 GB of RAM.

Dragging these into Brim v0.31.0 took ~48 seconds.

https://github.com/brimdata/zui/assets/5934157/04043582-dfb7-4445-a1aa-d6fc1388e24d

Dragging these into Zui v1.6.0 took ~44 seconds.

https://github.com/brimdata/zui/assets/5934157/1ad445e4-9dcf-4b7a-a42b-cc3fe18c6824

Of course, the effect you reported may be unique to your data and/or environment. Some questions to help narrow this down:

  1. Would it be possible to share your test data? If not, could you examine the publicly-available data here and see if you can spot key differences between that and your data? Or, if your records are all fairly similar in structure, perhaps you could "anonymize" a sample record and paste it here so I could create 250k random variations based on it?

  2. Could you describe the CPU/memory resources of your test environment for comparison?
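
To illustrate the "random variations" idea from question 1, here is a minimal sketch of how synthetic test data could be generated once a sample record is available. The field names (`id`, `type`, `actor`, `size`) and the helper name `gen_records` are hypothetical placeholders; the real generator would mirror the structure of the anonymized sample.

```shell
# Emit N newline-delimited JSON records with randomized values.
# The field names below are hypothetical; replace them with the
# structure of the anonymized sample record.
gen_records() {
  # $1 = number of records, $2 = random seed (optional, defaults to 1)
  awk -v n="$1" -v seed="${2:-1}" 'BEGIN {
    srand(seed)
    split("push pull issue comment", types, " ")
    for (i = 1; i <= n; i++) {
      printf "{\"id\":%d,\"type\":\"%s\",\"actor\":\"user%d\",\"size\":%d}\n",
        i, types[int(rand() * 4) + 1], int(rand() * 100000), int(rand() * 8000)
    }
  }'
}

gen_records 3   # print a few sample records
```

Running `gen_records 250000 > synthetic.json` would produce a file with the same record count as the one reported, though matching its 2 GB size would require records about as large as the real ones.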

philrz avatar Feb 12 '24 22:02 philrz

Thanks for the complete article you wrote

It's true. I looked into the problem in more detail. It turns out Brim didn't import all the data: it displayed the data it had loaded, but a few million lines were not imported.

The interesting thing is that when Brim imports data, even if only half of it is imported, it displays that half, but Zui does not.

Maybe I was wrong in the initial review that there was a difference in speed, but there were differences in speed

I will check it a few more times and let you know the result

SoftTools59654 avatar Feb 20 '24 16:02 SoftTools59654

@SoftTools59654: Yes, that makes sense. Not only did Brim v0.31.0 allow for "partial loads" when there were input errors, as captured in https://github.com/brimdata/zui/issues/2660, but Zui also behaved that way until fairly recent releases. I just updated and closed that issue to reflect the current state of Zui. As that issue also covers, even when "partial loads" were possible, the error messaging was poor, so it was basically a bug. That said, as captured in https://github.com/brimdata/zed/issues/4546, it may be desirable under some circumstances to allow this behavior (with better error messaging, of course), so we hope to revisit the topic at some point.

I'll hold this issue open in the event you're able to provide any additional detail on your initial repro, as you indicated in your last comment. Thanks.

philrz avatar Feb 21 '24 01:02 philrz