rails icon indicating copy to clipboard operation
rails copied to clipboard

enable custom Blob#key configuration (to ease Paperclip migration to ActiveStorage)

Open frantisekrokusekpa opened this issue 3 years ago • 30 comments

Hello Rails Team!

First of all I want to thank you for all job that you make here!!! Rails is a wonderfull framework to work with.

@georgeclaghorn Thank you for your answers in this issue that helped me to fine-tune this feature.

Summary

ActiveStorage custom storage path configuration

thoughtbot has made a wonderfull paperclip gem, but since it is depracated we are in the process of switching to ActiveStorage all our attachments management.

In Paperclip there is a fine feature where you can choose a specific storage path for each of the model attachable attribute and interpolate values in it as follows:

# app/models/user.rb

# in Paperclip
has_attached_file :avatar,
                  path: ':tenant/users/:user_id/:hash/:filename'

The idea of this PR is to make this possible in ActiveStorage too and have something like that:

# app/models/user.rb

# now in ActiveStorage
has_one_attached :avatar,
                 key: ':tenant/users/:id/:unique_secure_token'

NB: I have also added a test that makes sure the following change from the Rails 6.1 changelog is still working :

You can optionally provide a custom blob key when attaching a new file:

user.avatar.attach key: "avatars/#{user.id}.jpg",
  io: io, content_type: "image/jpeg", filename: "avatar.jpg"
Active Storage will store the blob's data on the configured service at the provided key.

George Claghorn

Thanks for your consideration, I would be glad to contribute even more!

Other Information

Just for record, here is our business issue other tech companies might also have: We use the apartment gem to split our datas on a client basis and we want to make the same with uploaded files and store them in separate folders.

frantisekrokusekpa avatar Jan 03 '21 17:01 frantisekrokusekpa

Thanks @danmcge for your review. I have made the requested changes.

  • RuboCop should be good
  • Tests should be good too

frantisekrokusekpa avatar Jan 04 '21 08:01 frantisekrokusekpa

Great work @frantisekrokusekpa, I look forward to seeing this merged and released so I can finally complete my migration away from paperclip.

zavan avatar Jan 06 '21 14:01 zavan

TODO: check compatibility with direct_uploads!

frantisekrokusekpa avatar Jan 22 '21 10:01 frantisekrokusekpa

@frantisekrokusekpa I'm going to backport this to rails 5 and was wondering why have the global procs if you could just refer to instance methods? For instance:

This:

    # app/models/user.rb
    has_one_attached :avatar,
                     key: ':tenant/users/:record_id/avatar'
    # with the following configuration
    config.active_storage.key_interpolation_procs = {
      tenant:    ->(record, attachable) { Apartment::Tenant.current.parameterize },
      record_id: ->(record, attachable) { record.id }
    }

Could be:

    # app/models/user.rb
    has_one_attached :avatar,
                     key: ':tenant/users/:id/avatar'

    def tenant
      Apartment::Tenant.current.parameterize
    end

xjunior avatar Feb 02 '21 03:02 xjunior

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

rails-bot[bot] avatar May 03 '21 03:05 rails-bot[bot]

This is going to be awesome, can't wait. Question. Will it also work with direct uploads? Those are handled entirely in JS, right? Will it be affected by the new attribute on the has_attached directive?

jasonperrone avatar Jul 16 '21 19:07 jasonperrone

This is going to be awesome, can't wait. Question. Will it also work with direct uploads? Those are handled entirely in JS, right? Will it be affected by the new attribute on the has_attached directive?

Yes that is one of the things I have to tackle too.

frantisekrokusekpa avatar Jul 19 '21 07:07 frantisekrokusekpa

Hi, I think that this feature is handy, not only for easing paperclip migrations. In my company, we need to store files under a specific key to expose them on our custom CDN (you know, enterprise stuff).

I tested the changes against "main", and it is working fine, including the file uploads on the Trix Editor, so I guess that the direct uploads work fine too.

Because of some changes since you opened this PR, the example interpolations will not work because the "Blob" entity is created before the "record", but the tenant example or any other attribute or relation available before creating the record will work.

@frantisekrokusekpa, are you considering continue with this PR, or can I take over?

ceritium avatar Sep 17 '21 15:09 ceritium

Hi, I think that this feature is handy, not only for easing paperclip migrations. In my company, we need to store files under a specific key to expose them on our custom CDN (you know, enterprise stuff).

I tested the changes against "main", and it is working fine, including the file uploads on the Trix Editor, so I guess that the direct uploads work fine too.

Because of some changes since you opened this PR, the example interpolations will not work because the "Blob" entity is created before the "record", but the tenant example or any other attribute or relation available before creating the record will work.

@frantisekrokusekpa, are you considering continue with this PR, or can I take over?

Thanks @ceritium for you motivation! I will take time this week to handle the direct uploads and let you help me with reviews and improvements if you still want to help ;-)

frantisekrokusekpa avatar Sep 19 '21 19:09 frantisekrokusekpa

Because of some changes since you opened this PR, the example interpolations will not work because the "Blob" entity is created before the "record", but the tenant example or any other attribute or relation available before creating the record will work.

@ceritium We handled this by uploading the blob to a tmp location on the storage and then on Attachable creation/update, we move the file back to the right key/path. It at least works pretty well with S3 like storages!

frantisekrokusekpa avatar Sep 19 '21 19:09 frantisekrokusekpa

Hi @ceritium I have tried

  • to figure out where is the best place to put all the callbacks
  • to set all the available service #movemethod to enable the custom Blob#key

I have managed to make this work with direct uploads and standard uploads, with S3 storage. Feel free to comment, propose some enhancements!

frantisekrokusekpa avatar Sep 26 '21 21:09 frantisekrokusekpa

Hi @frantisekrokusekpa, sorry for the delay.

Looks like there is a syntax error on attached/many_test.rb, a missing end.

I don't have enough experience with the storage providers and I don't know how the "copy/move" operations work under the hood. If the client has to download and re-upload the file it could be a problem, it will hit the performance and the wallet. Do you know how it works?

ceritium avatar Oct 04 '21 13:10 ceritium

I don't have enough experience with the storage providers and I don't know how the "copy/move" operations work under the hood. If the client has to download and re-upload the file it could be a problem, it will hit the performance and the wallet. Do you know how it works?

Actually it will not hit download and upload, it's internal to S3. It should not have huge amount on the wallet neither.

frantisekrokusekpa avatar Oct 04 '21 13:10 frantisekrokusekpa

it's internal to S3

Not only for S3, the same for the other providers.

Actually it will not hit download and upload

if you are sure, then it's ok.

Let's see if someone from the core team is interested in the PR. I can't do anything else than support your PR 👍 (once you fix the syntax error)

ceritium avatar Oct 04 '21 13:10 ceritium

Hi @frantisekrokusekpa, I was taking another look at the code; because the tests were failing, I opened another PR to your branch (you can merge or copy and paste if you like, as you prefer).

Other tests like: "override default custom storage key" would be a bit more complex to fix; maybe we do not even need them. What do you think?

ceritium avatar Oct 15 '21 12:10 ceritium

Hi @frantisekrokusekpa, I was taking another look at the code; because the tests were failing, I opened another PR to your branch (you can merge or copy and paste if you like, as you prefer).

Thanks, merged ;)

frantisekrokusekpa avatar Oct 15 '21 13:10 frantisekrokusekpa

Great idea, I'm looking forward to having this merged.

Will it take the same custom path for the variants images? This would be great and it would be the same behaviour as paperclip gem.

antprt avatar Oct 18 '21 06:10 antprt

Great idea, I'm looking forward to having this merged.

Will it take the same custom path for the variants images? This would be great and it would be the same behaviour as paperclip gem.

Thanks for your support. I haven't actually checked the variants... I add this to my to-do list!

frantisekrokusekpa avatar Oct 18 '21 07:10 frantisekrokusekpa

@georgeclaghorn what can I do more to make this move forward?

frantisekrokusekpa avatar Nov 26 '21 16:11 frantisekrokusekpa

when it will be merged?

coins-black avatar Jan 21 '22 07:01 coins-black

Just checking on when this will be merged? Thanks! (and to prevent stale activity)

jeffktan avatar Feb 25 '22 21:02 jeffktan

Hi, is there a documentation for this feature? i could find nothing in the docs

my app: rails 7.0.3

user.rb

has_one_attached :avatar, key: 'avatars/:id/avatar'

=> returns: ArgumentError (unknown keyword: :key):

chmich avatar Jun 04 '22 07:06 chmich

Hi everyone,

We had a similar issue here and we solved it differently, let me explain why and how, and how it could impact this PR.

We're not using the Appartment gem, but awesome work led by @eileencodes on Rails 7 regarding multiple databases (so many thanks for this work, every contributor :heart:). Specifically, we're using a ShardSelector to configure ActiveRecord::Base.current_shard per request based on the hostname, and via ApplicationRecord#connects_to). But it looks like the same use case that motivated this PR.

Regarding ActiveStorage, we're using ActiveStorage::Service::DiskService to persist assets, and we want to partition stored assets in subdirectories based on the current tenant (which is a feature that was ruled out by @georgeclaghorn in https://github.com/rails/rails/issues/32790), allowing us to easily backup/export a given tenant assets (in turn allowing our customers to easily opt-out of our multi-tenant product).

Unfortunately, ActiveStorage::Service::DiskService has a sharding feature implemented by ActiveStorage::Service::DiskService#folder_for (introduced in https://github.com/rails/rails/commit/d30231e983c019fbe3fcd6a2a06a461fefa58dab). It pretty much partitions stored assets in 2 levels of sub-directories based on the 4 first characters of the asset key. Though the commit doesn't explain why it was done, I guess it was a way to circumvent the maximum number of files that a directory can contain on some filesystems.

As an example, this sharding feature would store assets like that (my root is public/uploads/):

$ find public/uploads/
public/uploads/
public/uploads/2g
public/uploads/2g/wq
public/uploads/2g/wq/2gwqp36t5tvo91uiquylx1uku9qo
public/uploads/2g/wq/2gwqau4p7mvuj2nduc8462hnrux2
public/uploads/be
public/uploads/be/1h
public/uploads/be/1h/be1hg07e8irpxyd725q7jdlj4x2r
public/uploads/vi
public/uploads/vi/5j
public/uploads/vi/5j/vi5juy9fnt0kb7z6ogyga9ua2ims
…

In this case, prepending the tenant to the asset key, would completely deceive this sharding feature, because the assets described above would then be stored like this (assuming tenants named tenant and another_tenant):

$ find public/uploads/
public/uploads/
public/uploads/te
public/uploads/te/na
public/uploads/te/na/tenant-2gwqp36t5tvo91uiquylx1uku9qo
public/uploads/te/na/tenant-2gwqau4p7mvuj2nduc8462hnrux2
public/uploads/te/na/tenant-be1hg07e8irpxyd725q7jdlj4x2r
public/uploads/
public/uploads/an/ot
public/uploads/an/ot/another_tenant-vi5juy9fnt0kb7z6ogyga9ua2ims
…

As you can see, the sharding partition is now determined by the tenant name, which provides a far less efficient distribution than when determined by the key (which is randomly generated by ActiveStorage::Blob.generate_unique_secure_token). By "far less", I actually mean "pretty much broken".

By the way, an approach to workaround this issue would be to append the tenant rather than prepending it, but it would fail at solving our initial goal: partitioning stored assets by tenant.

Sorry for this long introduction, but that's necessary to understand why we did not monkeypatch the key generation, but rather ended up monkeypatching ActiveStorage::Service::DiskService#path_for, like that (and we're aware that it works only while we keep persisting assets on disk :cry:):

# config/initializers/active_storage_multitenancy.rb

Rails.application.config.to_prepare do
  require "active_storage/service/disk_service"

  ActiveStorage::Service::DiskService.class_eval do
    def path_for(key)
      File.join root, ActiveRecord::Base.current_shard.to_s, folder_for(key), key
    end
  end
end

Regarding this PR (I'm trying to be constructive here, not just to brag about my solution :sweat_smile:), allowing users to define custom key schemes like proposed may break ActiveStorage::Service::DiskService sharding feature, thus we should either:

  • warn users when they define a custom key
  • document this behaviour
  • forbid them to define custom keys (sorry, that would mean not merging this PR, which is probably the reason why @georgeclaghorn ruled in out in https://github.com/rails/rails/issues/32790)
  • remove ActiveStorage::Service::DiskService sharding feature
  • introduce a sharding_prefix option that could be used/ignored by different persistence services

What do you all think?

michaelbaudino avatar Jul 28 '22 09:07 michaelbaudino

Warning users || introducing a sharding_prefix option sounds good!

joedicator avatar Jul 30 '22 01:07 joedicator

Hey @frantisekrokusekpa ! any update on this PR?

imanpalsingh avatar Aug 10 '22 07:08 imanpalsingh

Great work @frantisekrokusekpa, It will be great to see this merged and released so I can finally complete my migration away from paperclips. When this will be merged? Thanks!

krishnasingh001 avatar Dec 19 '22 10:12 krishnasingh001

I'm starting to hold my breath to see this merged!

pre avatar Feb 28 '23 08:02 pre

Any news on this PR being merged? It's one of the reasons I'm still using Carrierwave for storage

deanpcmad avatar Mar 06 '23 23:03 deanpcmad

I think there is some unresolved feedback here: https://github.com/rails/rails/pull/41004#pullrequestreview-990279074

Also this PR needs to be rebased and commits squashed when they are ready.

Given the age though, we might want to consider open a new PR and give credit to @frantisekrokusekpa as a co-author.

zzak avatar Mar 27 '23 06:03 zzak