alaveteli
Enable searching by attachment file size
Specific issue spun out from https://github.com/mysociety/alaveteli/issues/2663
Number one would be the ability to search by file size.
+1, this might help to identify ‘problem’ cases much more simply.
+1, for the reasons above. This would also make it easier to find interesting releases, e.g. large datasets.
This becomes more feasible after https://github.com/mysociety/alaveteli/issues/6532.
We can replace our custom FoiAttachment#display_size calculation by delegating to the byte size Active Storage already records, and use number_to_human_size to render it with the correct unit.
```ruby
helper.number_to_human_size(FoiAttachment.last.file.blob.byte_size)
# => "15 Bytes"
```
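A minimal sketch of the delegation, runnable without Rails. Blob, StoredFile and human_size are hypothetical stand-ins: the first two mimic FoiAttachment#file and its ActiveStorage::Blob, and human_size only approximates ActionView's number_to_human_size (1024-based units, rounded to two decimals) rather than reproducing its precision rules.

```ruby
# Hypothetical stand-ins for the Active Storage objects; in the app these
# would be FoiAttachment#file and its ActiveStorage::Blob.
Blob = Struct.new(:byte_size)
StoredFile = Struct.new(:blob)

UNITS = %w[Bytes KB MB GB TB].freeze

# Plain-Ruby approximation of number_to_human_size so the sketch runs
# without Rails. It steps through 1024-based units and rounds to two
# decimal places; it does not copy the Rails helper's precision options.
def human_size(bytes)
  return "1 Byte" if bytes == 1

  value = bytes.to_f
  exponent = 0
  while value >= 1024 && exponent < UNITS.length - 1
    value /= 1024
    exponent += 1
  end
  return "#{bytes} Bytes" if exponent.zero?

  # Drop a trailing ".0" so 2048 renders as "2 KB" rather than "2.0 KB".
  "#{value.round(2).to_s.delete_suffix('.0')} #{UNITS[exponent]}"
end

class FoiAttachment
  attr_reader :file

  def initialize(file)
    @file = file
  end

  # Delegate to the byte size Active Storage already records instead of
  # maintaining a separate size calculation.
  def display_size
    human_size(file.blob.byte_size)
  end
end

attachment = FoiAttachment.new(StoredFile.new(Blob.new(15)))
puts attachment.display_size # prints "15 Bytes"
```

In the app itself the body of display_size would simply call the real helper on file.blob.byte_size, as in the console example; the stand-in formatter exists only to keep the sketch self-contained.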
Annoyingly, we only index InfoRequestEvent, not FoiAttachment directly.
For response events we concatenate a space-separated list of attachment file extensions to give to Xapian. This allows us to do a simple presence search.
https://github.com/mysociety/alaveteli/blob/fa6f8d63913f09c5b5ccc31654cf66718ac65abf/app/models/info_request_event.rb#L270
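A rough sketch of that presence-search approach in plain Ruby. Attachment, indexed_filetypes and filetype_present? are hypothetical names; taking the extension from the filename is an assumption, and the linked code may derive it differently.

```ruby
# Hypothetical stand-in for an attachment on a response event.
Attachment = Struct.new(:filename)

# Build the space-separated extension list handed to Xapian.
# Deriving the extension from the filename is an assumption.
def indexed_filetypes(attachments)
  attachments
    .map { |a| File.extname(a.filename).delete_prefix(".").downcase }
    .reject(&:empty?)
    .join(" ")
end

# A presence search only asks whether a term occurs, so a flat list is enough.
def filetype_present?(indexed, wanted)
  indexed.split(" ").include?(wanted)
end

indexed = indexed_filetypes([Attachment.new("reply.PDF"), Attachment.new("data.xls")])
puts indexed                           # prints "pdf xls"
puts filetype_present?(indexed, "xls") # prints "true"
```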
We can't easily do this for file sizes, because we'd want a range search rather than a presence search. If a response had three attachments, the indexed value might look like "15 78127 341". A range search for 100..500 wouldn't match the third attachment (341 bytes), because Xapian sees a single value for the event and has no way of knowing that it represents three separate records.
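The mismatch can be simulated in plain Ruby (no Xapian calls). A Xapian document holds one value per value slot, so the event contributes one opaque string rather than three numbers. The sizes below are the example values from the text; the variable names are illustrative only.

```ruby
# Example sizes (in bytes) of three attachments on one response event.
sizes = [15, 78_127, 341]
wanted = 100..500

# What we actually want to ask: does *any* attachment fall in the range?
any_attachment_in_range = sizes.any? { |size| wanted.cover?(size) }

# What a single per-event value gives us: one string for the whole event.
# It can't be read as one number, so a numeric range test has nothing
# meaningful to compare and the individual sizes are lost.
indexed_value = sizes.join(" ")
as_number = Float(indexed_value, exception: false)
single_value_in_range = !as_number.nil? && wanted.cover?(as_number)

puts any_attachment_in_range # prints "true"
puts single_value_in_range   # prints "false"
```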
Changing our indexing strategy is a huge task.
I think we could solve our immediate need by indexing largest_attachment on InfoRequestEvent. That way we could do a search like filetype:xls AND largest_attachment:1000000.., i.e. roughly ".xls greater than 1MB" (strictly, an event with an .xls attachment whose largest attachment is at least 1MB).
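A minimal in-memory sketch of the proposed largest_attachment value and how the example query would behave. Attachment, filetypes and matches? are hypothetical helpers, and returning 0 for an event with no attachments is an assumption; the open-ended range is treated as an inclusive lower bound.

```ruby
# Hypothetical attachment record: filename plus size in bytes.
Attachment = Struct.new(:filename, :byte_size)

# The single number we'd index as largest_attachment on the event.
# Returning 0 for an event without attachments is an assumption.
def largest_attachment(attachments)
  attachments.map(&:byte_size).max || 0
end

def filetypes(attachments)
  attachments.map { |a| File.extname(a.filename).delete_prefix(".").downcase }
end

# In-memory approximation of `filetype:xls AND largest_attachment:1000000..`.
def matches?(attachments, filetype:, min_size:)
  filetypes(attachments).include?(filetype) &&
    largest_attachment(attachments) >= min_size
end

big_xls = [Attachment.new("data.xls", 2_500_000)]
small_xls_big_pdf = [
  Attachment.new("data.xls", 40_000),
  Attachment.new("scan.pdf", 3_000_000)
]

puts matches?(big_xls, filetype: "xls", min_size: 1_000_000)           # prints "true"
puts matches?(small_xls_big_pdf, filetype: "xls", min_size: 1_000_000) # prints "true"
```

The second call also matches: the index only records the largest size for the event, not which file it belongs to, so the query finds events with an .xls attachment and some attachment of at least 1MB. That makes it an approximation of ".xls greater than 1MB" rather than an exact filter.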
So to summarise:
- Tidy up our custom display_size calculation to use what Active Storage gives us
- Index largest_attachment on InfoRequestEvent
- Update the /advanced_search help page with an example of the new attribute
- Consider whether we want a rake task to initiate a background reindex of related events