
Integrate with heroku

MingBit opened this issue 4 years ago • 7 comments

Hey! Me again! :-) I am trying to publish my dashboard to Heroku, but unfortunately I couldn't upload the object on the web (https://dash-pyscnet.herokuapp.com/)... It was working perfectly locally.

So I am wondering: have you also tested with Heroku? ;-)

You can find the code here.

Thank you very much, and I look forward to your response. :+1:

MingBit avatar Oct 29 '20 12:10 MingBit

Hi,

I have not myself ever tried Heroku, but it would be nice to know if Heroku has some limitations with regard to dash-uploader. Someone else also just asked about Heroku here.

The dash-uploader works by

  1. Configuring the server to handle HTTP POST requests to a certain URL. Usually something like http://myapplication.com/API/resumable
  2. On the JavaScript side, sending the file to be uploaded piece by piece to this address.
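For illustration, a single chunk upload could look roughly like this with the requests library. This is a hypothetical sketch; the parameter names follow resumable.js conventions (the same ones visible in the request URL quoted further below), and all values are made up.

import requests

# Hypothetical metadata for chunk 1 of a 1-chunk upload
params = {
    "resumableChunkNumber": 1,
    "resumableChunkSize": 1048576,
    "resumableTotalSize": 4,
    "resumableIdentifier": "4-testpk",
    "resumableFilename": "test.pk",
    "resumableTotalChunks": 1,
}

with open("test.pk", "rb") as f:
    # The chunk payload is sent as multipart form data
    r = requests.post("http://myapplication.com/API/resumable",
                      params=params, files={"file": f})
print(r.status_code)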

When the server receives the chunks, it saves them one by one to temporary files, and then merges them into one file (the uploaded file). Therefore, my first two guesses for where something could go wrong are

  1. The JavaScript is sending to an address the server is not listening to.
  2. Your app does not have permission to write to disk.

You can test them both. To test (1), you can use the Chrome developer tools (press F12 in the Chrome browser) and check how the HTTP requests are handled. In the Network tab, search for a line with http://myapplication.com/API/resumable?lots-of-text-here. What is the status of this HTTP POST request?

To test (2), write a simple script that writes to disk, for example right when the app loads. Log the process, and also check that the written file is really there. Does it work?
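Something like this could serve as the write test. A minimal sketch; the uploads folder name is just an example, so point it to wherever your app tries to save files.

import logging
import os

logging.basicConfig(level=logging.INFO)

# Hypothetical target folder; use your app's actual upload folder
path = os.path.join("uploads", "write_test.txt")
os.makedirs(os.path.dirname(path), exist_ok=True)

with open(path, "w") as f:
    f.write("hello")

# Verify the file really ended up on disk
logging.info("wrote %s, exists=%s, size=%s",
             path, os.path.exists(path), os.path.getsize(path))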


EDIT:

Just tested (1) on your app. It seems that, somehow, the server does not know how to handle the POST request pointed at

https://dash-pyscnet.herokuapp.com/API/resumable?resumableChunkNumber=1&resumableChunkSize=1048576&resumableCurrentChunkSize=4&resumableTotalSize=4&resumableType=&resumableIdentifier=4-testpk&resumableFilename=test.pk&resumableRelativePath=test.pk&resumableTotalChunks=1&upload_id=178b7768-19f4-11eb-b84b-3e51d0412b25

It gives 500 INTERNAL SERVER ERROR as the status code.

I think the next thing to do would be to make a minimal application (Dash, Flask, or anything else) that can handle HTTP POST requests. If you can make that kind of app work on Heroku, you would be one step closer to making dash-uploader work there, too.

fohrloop avatar Oct 29 '20 15:10 fohrloop

The starting point for debugging could be a simple Flask app like this:

from flask import Flask
from flask import request
import os

app = Flask(__name__)

@app.route("/", methods=['GET', 'POST'])
def mytest():
    # Echo back which HTTP method reached the app
    if request.method == 'POST':
        return "POST success"
    return "GET success"

if __name__ == "__main__":
    # Heroku sets the port in the PORT environment variable
    port = int(os.environ.get('PORT', 33507))
    app.run(host='0.0.0.0', port=port)

Then, try to hit the app with GET and POST requests:

import json, requests

# Simple GET request
r = requests.get('https://dash-pyscnet.herokuapp.com:33507')

# Simple POST request with a JSON payload
payload = {'somekey': 'somevalue'}
r = requests.post('https://dash-pyscnet.herokuapp.com:33507', data=json.dumps(payload))

If it works, try taking the port out of the request. Does it work then? After that, try configuring the route to /API/resumable. Does it work then?
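Configuring that route could look something like this. A sketch along the lines of the app above, with the route moved to /API/resumable:

from flask import Flask, request
import os

app = Flask(__name__)

@app.route("/API/resumable", methods=['GET', 'POST'])
def resumable_test():
    # Report which method got through, so both can be verified
    return f"{request.method} to /API/resumable success"

if __name__ == "__main__":
    port = int(os.environ.get('PORT', 33507))
    app.run(host='0.0.0.0', port=port)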

fohrloop avatar Oct 29 '20 15:10 fohrloop

Hey!

I just tested the simple Flask app, which shows that the request method is 'GET'... but there is no response from r = requests.get('https://dash-pyscnet.herokuapp.com:33507').

I run the simple Flask app first, and then open a Python terminal to run the following:

import json, requests
r = requests.get('https://dash-pyscnet.herokuapp.com:33507')

Correct me if I am doing anything wrong. :-)

MingBit avatar Oct 30 '20 09:10 MingBit

Yeah, that's a good starting point: trying to get a response to a simple HTTP GET request. On which port are you running your Flask app? Did you check that this port is open on Heroku?

fohrloop avatar Oct 30 '20 09:10 fohrloop

Cool... So I just set the Flask app port to 8080, which is the same as for dash-pyscnet on Heroku. I need to have a look at how to check/modify the port on Heroku. dash-pyscnet app:

if __name__ == '__main__':
    app.run_server(host='127.0.0.1', port='8080', debug=True,
                   dev_tools_ui=False, dev_tools_props_check=False)

Flask app:

if __name__ == "__main__":
    port=int(os.environ.get('PORT', 8080))
    app.run(host='0.0.0.0', port=port)

Test via Python terminal:

import json, requests
r = requests.get('https://dash-pyscnet.herokuapp.com:8080')

MingBit avatar Oct 30 '20 09:10 MingBit

For the sake of discussion, remember Heroku's ephemeral filesystem, which may not play well with dash-uploader's mechanics. Let me quote some of the docs here:

Heroku has an “ephemeral” hard drive, this means that you can write files to disk, but those files will not persist after the application is restarted. By default Active Storage uses a :local storage option, which uses the local file system to store any uploaded files. While file uploads that are stored with the :local option will appear to work at first, the attachments will exhibit seemingly strange behavior and eventually disappear.

The files will go away when the app is deployed, or when it is automatically restarted (once every 24 hours). If the app has multiple dynos, not all files will be present on every dyno. This means that the dyno that serves a web request, might be different than a dyno that contains a specific uploaded file. For example if you have two dynos, and upload a file, it will only be present on one dyno. When you refresh the webpage, there will be a 50% chance that the web request will be routed to the dyno with the file, and a 50% chance it will appear to be broken.

In addition, any files stored on disk will not be visible from one-off dynos such as a heroku run bash instance or a scheduler task because these commands use new dynos.

Instead of storing uploaded files to disk, the best practice is to leverage a cloud file storage service such as Amazon’s S3.

https://devcenter.heroku.com/articles/active-storage-on-heroku

maulberto3 avatar Oct 31 '20 05:10 maulberto3

Thank you @maulberto3 for the information about the Heroku filesystem. I will try to explain a little more how dash-uploader works and what it means if one wants to upload files to Heroku or other similar services.

dash-uploader (0.4.1) is very simple. It defines one Flask route in configure_upload.py for the upload component. By default this is something like http://myapplication.com/API/resumable, and both GET and POST requests are handled. The data is sent in POST requests, and the GET request is sometimes(?) used for some sort of checking.[1]

The upload logic

  • The browser sends an HTTP POST to /API/resumable, one for each chunk of data.
  • The Flask route takes in the HTTP POST request and handles it: it saves the chunk as a temporary file on the filesystem.
  • When all the chunks are uploaded,[2] a new file is created and all the chunk file contents are read and merged into this one file[3] (see the sketch below).
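The merge step works roughly like this. This is a sketch, not dash-uploader's actual code, and the chunk file naming is made up.

import os

def merge_chunks(chunk_paths, target_path):
    # Concatenate the chunk files, in order, into the final uploaded file
    with open(target_path, "wb") as target:
        for chunk_path in chunk_paths:
            with open(chunk_path, "rb") as chunk:
                target.write(chunk.read())
    # The temporary chunks are removed after the merge
    for chunk_path in chunk_paths:
        os.remove(chunk_path)

# e.g. merge_chunks(["test.pk.part1", "test.pk.part2"], "test.pk")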

Problems with Heroku

There are some reasons why Heroku is not considered a good option for apps that rely on the file system

1) The files are not persistent

Heroku restarts its servers ("dynos") at least every 24 hours. I don't know if restarts might happen in other special cases, too. A restart will "hard reset" the server to its initial state, meaning that everything you uploaded there will be gone.

2) Servers do not have a shared hard disk

If you have multiple dynos,[4] each of them will have its own hard disk space. This would break the upload process: the load balancer in front of the dynos gives each chunk to one of the servers at random, which means the files would never be fully uploaded. Some chunks would end up on Server A and some on Server B, and since the servers do not know anything about each other, the chunks would never get merged into the uploaded "large" file.

Heroku & AWS workaround?

The Heroku docs suggest that you place the uploaded files in S3, meaning something like this: browser → Heroku dyno → AWS S3.

Let's think about this for a moment. A user uploads a 1 Gb file, using a 1 Mb chunk size. This would mean 1000 (or 1024) POST requests to Heroku, and then 1000 POST requests from Heroku to AWS S3. This would work. Then there should be logic to combine all of these 1000 chunk files into one large uploaded file. This would mean logic that queries S3 from Heroku and asks whether these 1000 files have been uploaded or not; this query should (?) happen after each POST request made to S3. I don't know how well that would work. And then there should be logic to combine all these chunks somehow into a larger file. Maybe there is this kind of function available in S3, maybe not(?).

An additional challenge comes when a user of the app wants to use the file "uploaded" to the server. It would then have to be transferred back from S3 to Heroku, which can take a lot of time and can even need some special considerations if the internet connection between Heroku and S3 is flaky. I am starting to think that for large files, using Heroku with S3 is not an option at all.

For smaller files (max. a few Mb) this could work, but it would need quite special arrangements: implementing the communication between Heroku and the AWS S3 bucket, merging the uploaded chunks, and accessing the files from the app after upload.
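As a side note, S3 does have a server-side mechanism for combining chunks: the multipart upload API. Below is a boto3 sketch, with a hypothetical bucket name and assuming AWS credentials are configured; note that S3 requires every part except the last to be at least 5 Mb, so the 1 Mb chunks above would need to be buffered into larger parts first.

import boto3

s3 = boto3.client("s3")  # assumes AWS credentials are configured
bucket, key = "my-bucket", "uploads/test.pk"  # hypothetical names
chunks = [b"..."]  # the received chunk payloads, in order

# Start a multipart upload, send the parts, then ask S3 to merge them
mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)
parts = []
for number, chunk in enumerate(chunks, start=1):
    part = s3.upload_part(Bucket=bucket, Key=key, PartNumber=number,
                          UploadId=mpu["UploadId"], Body=chunk)
    parts.append({"PartNumber": number, "ETag": part["ETag"]})

s3.complete_multipart_upload(Bucket=bucket, Key=key,
                             UploadId=mpu["UploadId"],
                             MultipartUpload={"Parts": parts})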

Summary

Persistent files / apps running on multiple dynos
Many people use Heroku because it (or at least its free tier) is free. If Heroku were used for an app requiring file uploads with persistent files, one would need to use S3 (or similar) and pay for it, and to create quite complex logic for uploading and using the files, which might not work well, especially with larger files.

Maybe it would be better to use AWS Lightsail or a DigitalOcean droplet from the beginning instead, since they come with a hard drive with persistent[5] storage.

Temporary uploads
Using Heroku alone would be OK when it is acceptable that all uploaded files are gone when the app/dyno restarts, which happens every 24 hours. This applies only to cases where the application runs on a single dyno.


[1] I have not tested removing it, since it was there when I forked the package.
[2] After the upload of any chunk finishes, dash-uploader checks whether all the files belonging to this upload have been written to the hard disk.
[3] The reason there are temporary files could be that the order in which the chunks are received is not guaranteed. So, to upload a 1 Gb file, you'll need 2 Gb of free disk space. The temporary chunks are removed after the files are merged.
[4] I don't know if you have to configure multiple dynos yourself, or if extra dynos can be created automatically when your server needs them (high load).
[5] AWS Lightsail and DigitalOcean droplets probably also reset everything (including uploaded files) on a server reset, but those resets should be very rare, and if one is creating a real web application, backups should be taken periodically.

fohrloop avatar Nov 01 '20 15:11 fohrloop