Brief overview

AS A data explorer

I WANT to be able to filter the rows by a value in any column

SO THAT I can quickly find the data I'm looking for based on value in a specific column

Additional details

Advanced use case: Filter on multiple columns

Aug 22 '23 15:08 syedmoizurrehman

Many thanks - this is definitely planned. Big tables might need an index to be added though.

I wonder what the UI for this should look like though. If you have any ideas, let me know!

Aug 23 '23 08:08 calpaterson

@calpaterson You're welcome! For the UI, I think we can look at mainstream applications such as Google Sheets, Microsoft Excel for inspiration.

Examples from Google Sheet:

Right-click on a header to apply a filter:
A filter icon appears with the header once filter is applied, click the icon to configure the filter:
Filter applied:

If we want to avoid the right-click context menu, we can use a filter icon that appears on hover. Clicking on it will show the configuration:

If user selects a value, filter is applied.
If user erases the value, filter is removed.

We can probably work around the UX to make it better but that's the general idea.

Some more ideas:

Aug 24 '23 07:08 syedmoizurrehman

Thanks! These are useful ideas!

One thing that I think I'd like is to make sure that the filtered data comes with an unique url so that it's easier to share. Something like

https://csvbase.com/meripaterson/stock-exchanges/filtered?Country=United%20States%20Of%20America

But obviously I also need to work in the ability to choose an operator into that because not every filter is =. And also make sure the url doesn't get too long.

MUI's filtering UI looks like it might be suitable and I might also combine that with a general purpose "filter/search" box at the top of every table so you can see what filters are currently applied. That would also allow for popping it out to show it in more detail, with more form controls and so on so you can change the operator or the value, etc.

Aug 25 '23 15:08 calpaterson

I agree, MUI's filtering UI seems to be a good choice among others.

The unique filter URL is indeed a very convenient feature! To avoid very long filter URLs, we can chose some encoding to represent the header(s) and operator(s), however, there's nothing stopping a user from entering a ridiculously large filter value, and there might be actual use cases where the value needs to be hundreds/thousands of characters long.

Instead of using an encoding, it might be a better idea to have a database storage for filter links, and using an opaque ID in the URL to refer to the filter. E.g. https://csvbase.com/meripaterson/stock-exchanges/filtered?Country=United%20States%20Of%20America would become https://csvbase.com/meripaterson/stock-exchanges/filter=<filter_id>, and the program can fetch the filter details using the filter_id which will contain the filter details: header="Country", operator="EQUALS", value="United States Of America". This can scale easily for multiple filter/search conditions as well while avoiding the large URL problem.

One potential UX issue I see with this approach is that a user won't be able to "guess" the filter URL. Example use case: A developer who is writing a program to consume a table and they already have its filtered URL. If they want to modify the filter from United%20States%20Of%20America to USA, they won't be able to just update the URL in their program's source code. They will need to visit the website in the browser, update the filter, copy the new URL and replace the previous URL with the new URL in the source code. Although, I think this is a minor inconvenience, not a significant tradeoff.

Aug 26 '23 02:08 syedmoizurrehman

Ack, yes, that is a good point about using the URL for the filter. It looks like Chrome limits URLs to about 2k characters. Besides making them difficult to guess as you say, the other trade-off with using database storage is that I have to write a row for each click someone makes. That would end up quite quickly being a big table, especially if a crawler goes through and clicks on loads of filter links.

My temptation is to allow filters under 2k characters than then start to encode them after that...

The other thought that comes to mind: how many cases are there where the sum total of the filters comes to >2k characters? Should I really accept filters that large or only allow a prefix of the whole string to be used?

Aug 28 '23 16:08 calpaterson

That's a good point, the data can end up becoming huge quickly. Another idea is to allow only one filter at a time on one table. Considering Google Sheets as an example, if a user applies a filter on a sheet, the filter is persisted and any other user who opens the sheet finds the filter applied automatically. Similarly, when the filter is removed by one user, any other user opening the link no longer sees the filter applied. This seems to be the way Airtable works as well. I think this is a simple approach, at least as a starting point. We can always allow multiple filters at the same time on one table in future should the users demand it.

This also means that the table URL won't change when applying the filter, the same URL will now point to the filtered table. This has the advantage that a user won't have to re-share the link after applying a filter if they had already shared the unfiltered table link prior to applying the filter. The con here is that any program consuming the unfiltered table will be impacted by the filter, though this can be mitigated by allowing the user to create a new table with filtered data only.

The other thought that comes to mind: how many cases are there where the sum total of the filters comes to >2k characters? Should I really accept filters that large or only allow a prefix of the whole string to be used?

Bioinformatics is one use case in my mind where a single column can have values consisting of large number of characters (e.g. biomolecules sequences), although I'm not sure if this would amount to thousands of characters. Even if it doesn't, a combination of filters on such data is likely to exceed the 2K characters limit.

Aug 30 '23 03:08 syedmoizurrehman

I've been thinking about this a bit more because it's coming up on the horizon...

Here's the basic principles I want to follow:

Filtering should be "stateless", ie that the filter is encoded in the url query params.

I think I can understand why airtable allows for users to set filters on the whole table that everyone sees but csvbase is aimed at public data and it would be weird if one user could change the view of a table for another. There is also the matter that for csvbase the HTML is just one view of the table - there is also the API - and it would be too surprising for the API output to be altered by filters a user had put on via the web. That's not to say that I am ruling out saved filters, just that that's a different feature that might be done later.

Filters should be discoverable, so that the query params will be de facto human editable, eg ?Country=France

This will help people get started.

The current filtering needs to be migrated. eg https://csvbase.com/meripaterson/stock-exchanges?op=gt&n=10 needs to keep working. I'm going to keep supporting that url pattern even as I change to a new mechanism
(Bit of a repetition) The same urls need to work in the API. If someone wants a filtered csv file, they will be able to use the same url as they get from the website.
Complicated and unusual cases where the filter is going to be very long will, for now, not be possible via the HTML but of course you can do what you want with an export. I'm open to revisiting this in the future if it's a common usecase.

It's a bit weird to think about, but csvbase already has a (very basic) implementation of filtering. When you click "Next" on a table you get ?op=gt&n=10 which basically means csvbase_row_id>10. But it only exists for row id.

What I'm going to move to is the following system:

?Country=France for equality filters

?csvbase_row_id=gt:10 for </>

The first step is an internal rejig to the KeySet object, which is what I use for filtering internally. There's actually a few things to do there. Once I have that done, I will start to expose that via the API, keeping compatibility with the existing url pattern, which shouldn't be so hard.

Sep 06 '23 08:09 calpaterson

Point 1 is interesting. Indeed, for public data where anyone with link to a table can apply filters, it doesn't feel right to allow the filter to impact the link everybody else has.

Regarding point 3: Thanks for pointing that out, I didn't notice that there's already a URL filtering format for equality. It can definitely be extended for other key-value pairs.

I agree with the rest of the points. We can put a hard limit on the number of characters for a filter for now and see whether there is demand for longer filters.

Sep 07 '23 03:09 syedmoizurrehman

Regarding point 3: Thanks for pointing that out, I didn't notice that there's already a URL filtering format for equality. It can definitely be extended for other key-value pairs.

Er....I didn't either until I came to look closely at how this would be implemented :)

Sep 07 '23 19:09 calpaterson

Allow users to filter the rows by value in any column

Brief overview

Additional details