Enable searching by attachment file size

Open RichardTaylor opened this issue 2 years ago • 4 comments

Specific issue spun out from https://github.com/mysociety/alaveteli/issues/2663

Number one would be the ability to search via filesize.

RichardTaylor avatar Mar 22 '22 12:03 RichardTaylor

+1 this might help to identify ‘problem’ cases much more simply.

mdeuk avatar Mar 22 '22 14:03 mdeuk

+1 For the reasons above. Also this would make it easier to find interesting releases eg large datasets.

FOIMonkey avatar Mar 23 '22 21:03 FOIMonkey

More possible after https://github.com/mysociety/alaveteli/issues/6532.

garethrees avatar Mar 24 '22 09:03 garethrees

We can replace our custom FoiAttachment#display_size calculation by delegating to the byte size that Active Storage already records, and use number_to_human_size to render it with the correct unit.

helper.number_to_human_size(FoiAttachment.last.file.blob.byte_size)
# => "15 Bytes"

Annoyingly, we only index InfoRequestEvent; not FoiAttachment directly.

For response events we concatenate a space-separated list of attachment file extensions to give to Xapian. This allows us to do a simple presence search.

https://github.com/mysociety/alaveteli/blob/fa6f8d63913f09c5b5ccc31654cf66718ac65abf/app/models/info_request_event.rb#L270

We can't easily do this for file sizes, as we'd want to use a range search. If a response had three attachments, the indexed value might look like "15 78127 341". A range search for 100..500 wouldn't match the third attachment (341 bytes), because Xapian would see a single value and wouldn't know that it represents three records.
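To illustrate the problem outside of Xapian, here is a rough plain-Ruby sketch. The sizes are the made-up ones from the example above, and the variable names are just for illustration; it doesn't touch the real indexing code.

```ruby
# Hypothetical attachment sizes (in bytes) for one response event.
sizes = [15, 78_127, 341]

# What a space-separated indexed value would look like.
indexed_value = sizes.join(" ")

range = 100..500

# Treated as one value, the string isn't a number at all,
# so there is nothing sensible to compare against the range.
as_single_number = Integer(indexed_value, exception: false)

# Treated as three separate records, the third attachment matches.
matches = sizes.select { |size| range.cover?(size) }

puts indexed_value.inspect
puts as_single_number.inspect
puts matches.inspect
```

The per-attachment check only works because each size is its own record, which is exactly what the event-level index doesn't give us.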

Changing our indexing strategy is a huge task.

I think we could solve our immediate need by indexing largest_attachment on InfoRequestEvent. That way we could do a search like filetype:xls AND largest_attachment:1000000.. – i.e. ".xls files larger than 1MB".
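A minimal sketch of how that could behave, in plain Ruby. The Attachment and ResponseEvent structs, parse_size_range and search are all hypothetical stand-ins rather than Alaveteli or Xapian code; they only model one largest_attachment value per event and an open-ended range like 1000000..

```ruby
# Stand-ins for the real models; names and fields are assumptions.
Attachment = Struct.new(:filename, :byte_size)

ResponseEvent = Struct.new(:attachments) do
  # Size in bytes of the largest attachment, or nil if there are none.
  def largest_attachment
    attachments.map(&:byte_size).max
  end

  # Unique file extensions, similar in spirit to the existing filetype index.
  def filetypes
    attachments.map { |a| File.extname(a.filename).delete_prefix(".") }.uniq
  end
end

# Parse "MIN..MAX", "MIN.." or "..MAX" into a Range (hypothetical helper).
def parse_size_range(text)
  min, max = text.split("..", 2)
  lower = min.to_s.empty? ? nil : Integer(min)
  upper = max.to_s.empty? ? nil : Integer(max)
  Range.new(lower, upper)
end

# Events that have the given file type and whose largest attachment is in range.
def search(events, filetype:, size_range:)
  events.select do |event|
    largest = event.largest_attachment
    !largest.nil? &&
      event.filetypes.include?(filetype) &&
      size_range.cover?(largest)
  end
end

events = [
  ResponseEvent.new([Attachment.new("data.xls", 2_500_000),
                     Attachment.new("letter.pdf", 40_000)]),
  ResponseEvent.new([Attachment.new("small.xls", 20_000)]),
  ResponseEvent.new([])
]

results = search(events, filetype: "xls",
                         size_range: parse_size_range("1000000.."))
puts results.length
```

One thing to be aware of: both terms are matched at the event level, so an event with a small .xls and a large .pdf would also match. That seems acceptable for the immediate need, but it isn't a true per-attachment search.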

So to summarise:

  1. Tidy up our custom display_size calculation to use what Active Storage gives us
  2. Index largest_attachment on InfoRequestEvent
  3. Update the /advanced_search help page with an example of the new attribute
  4. Consider whether we want a rake task to initiate a background reindex of related events

garethrees avatar Sep 21 '22 13:09 garethrees