rails
rails copied to clipboard
enable custom Blob#key configuration (to ease Paperclip migration to ActiveStorage)
Hello Rails Team!
First of all I want to thank you for all job that you make here!!! Rails is a wonderfull framework to work with.
@georgeclaghorn Thank you for your answers in this issue that helped me to fine-tune this feature.
Summary
ActiveStorage custom storage path configuration
thoughtbot has made a wonderfull paperclip gem, but since it is depracated we are in the process of switching to ActiveStorage all our attachments management.
In Paperclip there is a fine feature where you can choose a specific storage path for each of the model attachable attribute and interpolate values in it as follows:
# app/models/user.rb
# in Paperclip
has_attached_file :avatar,
path: ':tenant/users/:user_id/:hash/:filename'
The idea of this PR is to make this possible in ActiveStorage too and have something like that:
# app/models/user.rb
# now in ActiveStorage
has_one_attached :avatar,
key: ':tenant/users/:id/:unique_secure_token'
NB: I have also added a test that makes sure the following change from the Rails 6.1 changelog is still working :
You can optionally provide a custom blob key when attaching a new file:
user.avatar.attach key: "avatars/#{user.id}.jpg",
io: io, content_type: "image/jpeg", filename: "avatar.jpg"
Active Storage will store the blob's data on the configured service at the provided key.
George Claghorn
Thanks for your consideration, I would be glad to contribute even more!
Other Information
Just for record, here is our business issue other tech companies might also have: We use the apartment gem to split our datas on a client basis and we want to make the same with uploaded files and store them in separate folders.
Thanks @danmcge for your review. I have made the requested changes.
- RuboCop should be good
- Tests should be good too
Great work @frantisekrokusekpa, I look forward to seeing this merged and released so I can finally complete my migration away from paperclip.
TODO: check compatibility with direct_uploads!
@frantisekrokusekpa I'm going to backport this to rails 5 and was wondering why have the global procs if you could just refer to instance methods? For instance:
This:
# app/models/user.rb
has_one_attached :avatar,
key: ':tenant/users/:record_id/avatar'
# with the following configuration
config.active_storage.key_interpolation_procs = {
tenant: ->(record, attachable) { Apartment::Tenant.current.parameterize },
record_id: ->(record, attachable) { record.id }
}
Could be:
# app/models/user.rb
has_one_attached :avatar,
key: ':tenant/users/:id/avatar'
def tenant
Apartment::Tenant.current.parameterize
end
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This is going to be awesome, can't wait. Question. Will it also work with direct uploads? Those are handled entirely in JS, right? Will it be affected by the new attribute on the has_attached directive?
This is going to be awesome, can't wait. Question. Will it also work with direct uploads? Those are handled entirely in JS, right? Will it be affected by the new attribute on the has_attached directive?
Yes that is one of the things I have to tackle too.
Hi, I think that this feature is handy, not only for easing paperclip migrations. In my company, we need to store files under a specific key to expose them on our custom CDN (you know, enterprise stuff).
I tested the changes against "main", and it is working fine, including the file uploads on the Trix Editor, so I guess that the direct uploads work fine too.
Because of some changes since you opened this PR, the example interpolations will not work because the "Blob" entity is created before the "record", but the tenant
example or any other attribute or relation available before creating the record will work.
@frantisekrokusekpa, are you considering continue with this PR, or can I take over?
Hi, I think that this feature is handy, not only for easing paperclip migrations. In my company, we need to store files under a specific key to expose them on our custom CDN (you know, enterprise stuff).
I tested the changes against "main", and it is working fine, including the file uploads on the Trix Editor, so I guess that the direct uploads work fine too.
Because of some changes since you opened this PR, the example interpolations will not work because the "Blob" entity is created before the "record", but the
tenant
example or any other attribute or relation available before creating the record will work.@frantisekrokusekpa, are you considering continue with this PR, or can I take over?
Thanks @ceritium for you motivation! I will take time this week to handle the direct uploads and let you help me with reviews and improvements if you still want to help ;-)
Because of some changes since you opened this PR, the example interpolations will not work because the "Blob" entity is created before the "record", but the tenant example or any other attribute or relation available before creating the record will work.
@ceritium We handled this by uploading the blob to a tmp location on the storage and then on Attachable creation/update, we move the file back to the right key/path. It at least works pretty well with S3 like storages!
Hi @ceritium I have tried
- to figure out where is the best place to put all the callbacks
- to set all the available service
#move
method to enable the custom Blob#key
I have managed to make this work with direct uploads and standard uploads, with S3 storage. Feel free to comment, propose some enhancements!
Hi @frantisekrokusekpa, sorry for the delay.
Looks like there is a syntax error on attached/many_test.rb
, a missing end
.
I don't have enough experience with the storage providers and I don't know how the "copy/move" operations work under the hood. If the client has to download and re-upload the file it could be a problem, it will hit the performance and the wallet. Do you know how it works?
I don't have enough experience with the storage providers and I don't know how the "copy/move" operations work under the hood. If the client has to download and re-upload the file it could be a problem, it will hit the performance and the wallet. Do you know how it works?
Actually it will not hit download and upload, it's internal to S3. It should not have huge amount on the wallet neither.
it's internal to S3
Not only for S3, the same for the other providers.
Actually it will not hit download and upload
if you are sure, then it's ok.
Let's see if someone from the core team is interested in the PR. I can't do anything else than support your PR 👍 (once you fix the syntax error)
Hi @frantisekrokusekpa, I was taking another look at the code; because the tests were failing, I opened another PR to your branch (you can merge or copy and paste if you like, as you prefer).
Other tests like: "override default custom storage key" would be a bit more complex to fix; maybe we do not even need them. What do you think?
Hi @frantisekrokusekpa, I was taking another look at the code; because the tests were failing, I opened another PR to your branch (you can merge or copy and paste if you like, as you prefer).
Thanks, merged ;)
Great idea, I'm looking forward to having this merged.
Will it take the same custom path for the variants images? This would be great and it would be the same behaviour as paperclip gem.
Great idea, I'm looking forward to having this merged.
Will it take the same custom path for the variants images? This would be great and it would be the same behaviour as paperclip gem.
Thanks for your support. I haven't actually checked the variants... I add this to my to-do list!
@georgeclaghorn what can I do more to make this move forward?
when it will be merged?
Just checking on when this will be merged? Thanks! (and to prevent stale activity)
Hi, is there a documentation for this feature? i could find nothing in the docs
my app: rails 7.0.3
user.rb
has_one_attached :avatar, key: 'avatars/:id/avatar'
=> returns: ArgumentError (unknown keyword: :key):
Hi everyone,
We had a similar issue here and we solved it differently, let me explain why and how, and how it could impact this PR.
We're not using the Appartment gem, but awesome work led by @eileencodes on Rails 7 regarding multiple databases (so many thanks for this work, every contributor :heart:). Specifically, we're using a ShardSelector
to configure ActiveRecord::Base.current_shard
per request based on the hostname, and via ApplicationRecord#connects_to
). But it looks like the same use case that motivated this PR.
Regarding ActiveStorage
, we're using ActiveStorage::Service::DiskService
to persist assets, and we want to partition stored assets in subdirectories based on the current tenant (which is a feature that was ruled out by @georgeclaghorn in https://github.com/rails/rails/issues/32790), allowing us to easily backup/export a given tenant assets (in turn allowing our customers to easily opt-out of our multi-tenant product).
Unfortunately, ActiveStorage::Service::DiskService
has a sharding feature implemented by ActiveStorage::Service::DiskService#folder_for
(introduced in https://github.com/rails/rails/commit/d30231e983c019fbe3fcd6a2a06a461fefa58dab). It pretty much partitions stored assets in 2 levels of sub-directories based on the 4 first characters of the asset key. Though the commit doesn't explain why it was done, I guess it was a way to circumvent the maximum number of files that a directory can contain on some filesystems.
As an example, this sharding feature would store assets like that (my root
is public/uploads/
):
$ find public/uploads/
public/uploads/
public/uploads/2g
public/uploads/2g/wq
public/uploads/2g/wq/2gwqp36t5tvo91uiquylx1uku9qo
public/uploads/2g/wq/2gwqau4p7mvuj2nduc8462hnrux2
public/uploads/be
public/uploads/be/1h
public/uploads/be/1h/be1hg07e8irpxyd725q7jdlj4x2r
public/uploads/vi
public/uploads/vi/5j
public/uploads/vi/5j/vi5juy9fnt0kb7z6ogyga9ua2ims
…
In this case, prepending the tenant to the asset key, would completely deceive this sharding feature, because the assets described above would then be stored like this (assuming tenants named tenant
and another_tenant
):
$ find public/uploads/
public/uploads/
public/uploads/te
public/uploads/te/na
public/uploads/te/na/tenant-2gwqp36t5tvo91uiquylx1uku9qo
public/uploads/te/na/tenant-2gwqau4p7mvuj2nduc8462hnrux2
public/uploads/te/na/tenant-be1hg07e8irpxyd725q7jdlj4x2r
public/uploads/
public/uploads/an/ot
public/uploads/an/ot/another_tenant-vi5juy9fnt0kb7z6ogyga9ua2ims
…
As you can see, the sharding partition is now determined by the tenant name, which provides a far less efficient distribution than when determined by the key (which is randomly generated by ActiveStorage::Blob.generate_unique_secure_token
). By "far less", I actually mean "pretty much broken".
By the way, an approach to workaround this issue would be to append the tenant rather than prepending it, but it would fail at solving our initial goal: partitioning stored assets by tenant.
Sorry for this long introduction, but that's necessary to understand why we did not monkeypatch the key generation, but rather ended up monkeypatching ActiveStorage::Service::DiskService#path_for
, like that (and we're aware that it works only while we keep persisting assets on disk :cry:):
# config/initializers/active_storage_multitenancy.rb
Rails.application.config.to_prepare do
require "active_storage/service/disk_service"
ActiveStorage::Service::DiskService.class_eval do
def path_for(key)
File.join root, ActiveRecord::Base.current_shard.to_s, folder_for(key), key
end
end
end
Regarding this PR (I'm trying to be constructive here, not just to brag about my solution :sweat_smile:), allowing users to define custom key schemes like proposed may break ActiveStorage::Service::DiskService
sharding feature, thus we should either:
- warn users when they define a custom key
- document this behaviour
- forbid them to define custom keys (sorry, that would mean not merging this PR, which is probably the reason why @georgeclaghorn ruled in out in https://github.com/rails/rails/issues/32790)
- remove
ActiveStorage::Service::DiskService
sharding feature - introduce a
sharding_prefix
option that could be used/ignored by different persistence services
What do you all think?
Warning users || introducing a sharding_prefix
option sounds good!
Hey @frantisekrokusekpa ! any update on this PR?
Great work @frantisekrokusekpa, It will be great to see this merged and released so I can finally complete my migration away from paperclips. When this will be merged? Thanks!
I'm starting to hold my breath to see this merged!
Any news on this PR being merged? It's one of the reasons I'm still using Carrierwave for storage
I think there is some unresolved feedback here: https://github.com/rails/rails/pull/41004#pullrequestreview-990279074
Also this PR needs to be rebased and commits squashed when they are ready.
Given the age though, we might want to consider open a new PR and give credit to @frantisekrokusekpa as a co-author.