gitea
gitea copied to clipboard
Add `MAX_ROWS` option for CSV rendering
This solution implements a new config variable MAX_ROWS, which corresponds to the “Maximum allowed rows to render CSV files. (0 for no limit)” and rewrites the Render function for CSV files in markup module. Now the render function only reads the file once, having MAX_FILE_SIZE+1 as a reader limit and MAX_ROWS as a row limit. When the file is larger than MAX_FILE_SIZE or has more rows than MAX_ROWS, it only renders until the limit, and displays a user-friendly warning informing that the rendered data is not complete, in the user's language.
Previously, when a CSV file was larger than the limit, the render function lost its function to render the code. There were also multiple reads to the file, in order to determine its size and render or pre-render.
The warning: 
I think fall back render is still useful? Or we can view it as non-render mode?
I think fall back render is still useful? Or we can view it as non-render mode?
The file will always render as a table until one of the limits is reached. If the entire table cannot be displayed, users will receive a warning with a link to the non-rendered file.
Maybe a bit higher default will be good, I'm thinking maybe 5k or 10k lines. Is the UI still reasonable fast with 10k lines and let's say 10 columns?
Updated PR title. The linked issue https://github.com/go-gitea/gitea/pull/29663 seems incorrect. Is there a actually relevant issue for this?
It seems it's only 199 lines if you set MAX_ROWS = 200. One line was missed.
And the warning background color was also missed.
It seems it's only 199 lines if you set MAX_ROWS = 200. One line was missed. And the warning background color was also missed.
Background removal is intentional.
Updated PR title. The linked issue #29663 seems incorrect. Is there a actually relevant issue for this?
Initially, I started looking for the code intending to fix that specific problem. However, since it was resolved with another solution, I suppose we could define it as an enhancement. Should I open a new issue?
Maybe a bit higher default will be good, I'm thinking maybe 5k or 10k lines. Is the UI still reasonable fast with 10k lines and let's say 10 columns?
When the limit is 5k, seems to still have good performance!
Is it intentional that it renders both the message and the CSV?
If so, I would put the message on bottom where the user would expect more items.
Also I think we can increase to 10k or 20k easily. My browser renders 5k items with three columns in 600ms and hardware tends to become faster over time.
Checking GitHub I see that 60k lines still renders fine but 600k gives the error. So I suggest putting the default limit at either 100k lines or maybe 1MB file size.
Also, please link to raw URL instead of source URL, because a source with so many lines will be slow as well, so raw link is better.
Checking GitHub I see that 60k lines still renders fine but 600k gives the error. So I suggest putting the default limit at either 100k lines or maybe 1MB file size.
Also, please link to raw URL instead of source URL, because a source with so many lines will be slow as well, so raw link is better.
Sounds good! I applied the changes you suggest.
And please update the screenshot which is very useful when we write release blog when next release will be published.
I will do some more fine tunes on the UI, but I think we should closely follow what GitHub does and not render partial rows when the limit is exceeded, just render the message only in that case.
I will do some more fine tunes on the UI, but I think we should closely follow what GitHub does and not render partial rows when the limit is exceeded, just render the message only in that case.
I think it is a matter of preference. For example, GitLab always render and shows the message on the bottom of the code.
I will do some more fine tunes on the UI, but I think we should closely follow what GitHub does and not render partial rows when the limit is exceeded, just render the message only in that case.
I think it is a matter of preference. For example, GitLab always render and shows the message on the bottom of the code.
I find such behaviour inconsistent with code view where we also do no partial render when file is too large. And I think a partial render might potentially be misleading to the user thinking they see the whole file. So I say no to partial render :)
Thank you for the new changes, it looks much better.
In my mind, there are some nits, for example:
- variable naming (camelCase, but not
raw_link, the lint also fails due to it)- path escaping (maybe it should use
PathEscapeSegments)- it's better to add a test to ensure MaxRows really works (mock it to
MaxRows=2)(if you don't mind, I could also help to edit the PR to address these nits)
And since in the RC period, 1.22 only receives necessary backports, so I think this PR could be merged when 1.22 gets a stable release (just wait for some time)
(if you don't mind, I could also help to edit the PR to address these nits)
Yes please :)
