
document process for independent deployment

Open rahulbot opened this issue 4 years ago • 22 comments

We don't have good docs for how someone should deploy their own instance of our front-end apps. I started a deploying.md document here to capture some of what sprung to mind: https://github.com/mitmedialab/MediaCloud-Web-Tools/blob/main/doc/deploying.md

@esirK - this will be relevant for you and your team!

rahulbot avatar Aug 03 '20 16:08 rahulbot

Thanks for this @rahulbot. I will follow the docs and keep you updated in case I face any trouble.

esirK avatar Aug 03 '20 16:08 esirK

This helped me set up the Explorer and Source Manager. However, the functionality isn't working, and after some debugging I think it's due to some missing tags. I also found this https://github.com/mitmedialab/MediaCloud-Web-Tools/blob/main/server/views/sources/init.py#L13, where we have some predefined tags_id values which aren't in the database. My question, therefore, is: where do these tag IDs come from, and am I supposed to create them manually?
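
For anyone else hitting this, here is roughly how I checked for them (a sketch only: the database name and IDs below are placeholders, and I'm assuming the standard backend Postgres schema with a tags table):

```sh
# list which of the hard-coded tag IDs actually exist in the backend database
# ("mediacloud" and the IDs are placeholders; use your DB name and the IDs from init.py)
psql -d mediacloud -c "SELECT tags_id, tag FROM tags WHERE tags_id IN (123, 456, 789);"
```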

esirK avatar Aug 05 '20 04:08 esirK

Ugh - yeah, those hard-coded tags are all over our system :-( I'll open an issue to think about how to generalize that.

rahulbot avatar Aug 05 '20 13:08 rahulbot

I was facing an issue when I tried deploying the app in dev mode on EC2: the UI apps were not being loaded, and I could only see a bare page [screenshot]. I traced the issue to https://github.com/mitmedialab/MediaCloud-Web-Tools/blob/main/server/init.py#L120: the manifest.json file had hardcoded values `{"assets":{"app_css.css":"app_css.16dcf6def49cd99c8346.css","app_css.js":"app_css.16dcf6def49cd99c8346.js","app_js.css":"app_js.16dcf6def49cd99c8346.css","app_js.js":"app_js.16dcf6def49cd99c8346.js","common_css.css":"common_css.16dcf6def49cd99c8346.css","common_css.js":"common_css.16dcf6def49cd99c8346.js"},"publicPath":"http://localhost:2992/"}`, so I had to update the publicPath to the hostname of my EC2 instance. Is this something we should also add to the setup documentation?
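
For reference, after the change my manifest.json looked roughly like this (the hostname below is a placeholder for your EC2 instance's public hostname):

```json
{
  "assets": {
    "app_css.css": "app_css.16dcf6def49cd99c8346.css",
    "app_css.js": "app_css.16dcf6def49cd99c8346.js",
    "app_js.css": "app_js.16dcf6def49cd99c8346.css",
    "app_js.js": "app_js.16dcf6def49cd99c8346.js",
    "common_css.css": "common_css.16dcf6def49cd99c8346.css",
    "common_css.js": "common_css.16dcf6def49cd99c8346.js"
  },
  "publicPath": "http://ec2-xx-xx-xx-xx.eu-west-1.compute.amazonaws.com:2992/"
}
```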

esirK avatar Aug 11 '20 09:08 esirK

@esirK -- wow, you are powering through this, and we'll weave this feedback into making Media Cloud easier to deploy for others.

I'm a little surprised that you needed to change publicPath, though. When running in production, Flask should serve up the compiled static assets. These are created by running something like `npm run topics-release` (the script changes based on which app you're building). Are you running the Flask server in dev mode on your EC2 instance?
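
For example (a sketch; the exact release script depends on which app you're deploying):

```sh
# compile the static assets that Flask serves in production mode
# ("topics-release" is the script named above; other apps have analogous release scripts)
npm run topics-release
```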

dsjen avatar Aug 11 '20 18:08 dsjen

Hey there @esirK, what is your app.config SERVER_MODE?

cindyloo avatar Aug 11 '20 18:08 cindyloo

Yes @dsjen, @cindyloo, right now I'm running in dev mode. I'll try running in production mode and give feedback. Also, I was able to get the Explorer app to work by making only_queries_reddit at https://github.com/mitmedialab/MediaCloud-Web-Tools/blob/main/server/views/explorer/__init__.py#L77 always return True, which I would guess makes queries go only to Reddit as a media source. So my question is: what services should I have running in order to run the Explorer against other media sources?

esirK avatar Aug 11 '20 18:08 esirK

I believe Explorer is mostly powered by the backend data sources via their API, and the Reddit functionality is pretty limited. I don't believe it's integrated with other sources (e.g. Twitter). Do you have the Media Cloud backend up and running?

dsjen avatar Aug 11 '20 18:08 dsjen

For running in dev mode, the way it works best (for the moment, until you get an update) is to create a minimal manifest.json file in your MC/build directory that contains `{"assets": {}, "publicPath": "./build"}`. This is because we use npm to generate the JSON file that flask_webpack uses to know where to live-compile the files.
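
In other words, a build/manifest.json containing just:

```json
{
  "assets": {},
  "publicPath": "./build"
}
```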

If you do that, all of these related issues running in dev should go away

cindyloo avatar Aug 11 '20 18:08 cindyloo

@dsjen yes I do, and I also added some collections manually, but I'm not sure what I should expect. For example, I have the Solr server running at http://ec2-52-18-167-8.eu-west-1.compute.amazonaws.com:8983/ but executing any query returns no data.

esirK avatar Aug 11 '20 19:08 esirK

That's great progress! I'm working on that tag issue I mentioned earlier right now.

That reddit thing is a hack. You want it to return False so that it always queries against your Media Cloud install.

Can you see if Solr has any data in it? Overall, the system runs jobs that check for sources with associated RSS feeds, ingests and processes those, and stores them in Postgres. Then it grabs stories that aren't already in Solr and imports them from Postgres into Solr. Perhaps one of the links in that back-end chain isn't working?
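
One quick way to check (a sketch; adjust the host, port, and collection name to your setup, I'm assuming the collection is called `mediacloud` here):

```sh
# rows=0 returns just the document count (numFound) without fetching any stories
curl 'http://localhost:8983/solr/mediacloud/select?q=*:*&rows=0'
```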

rahulbot avatar Aug 11 '20 19:08 rahulbot

@rahulbot Sure, I'll revert the hack in order to use the Media Cloud install. Right now Solr doesn't have any data. The import-stories Feedly service shows that it's finding some stories, e.g. `MediaWords.ImportStories.Feedly: _get_new_stories_from_feedly chunk: 34999 total stories found`, but the apps_import-solr-data_1 service shows 0 stories:

MediaWords.Solr.Dump: added 0 topic stories to the import

MediaWords.Solr.Dump: too few stories (0/1000). sleeping 60 seconds ..

esirK avatar Aug 12 '20 07:08 esirK

If I remember correctly, we use feedly to back-fill older content where possible. The main app that regularly fetches RSS feeds from media sources is a different one. Perhaps take that question back to the back-end repo so they can help dig into why you're not getting stories into Solr yet? I'd imagine the process would be to add some sources, add some feeds, make sure the feed scraper is running, and then stories show up in the DB.
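
If it helps while debugging that chain, some rough sanity checks against the backend Postgres might look like this (a sketch; the database name is a placeholder and I'm assuming the standard media/feeds/stories tables):

```sh
# do sources, feeds, and fetched stories actually exist in the backend DB?
psql -d mediacloud -c "SELECT count(*) FROM media;"
psql -d mediacloud -c "SELECT count(*) FROM feeds;"
psql -d mediacloud -c "SELECT count(*) FROM stories;"
```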

rahulbot avatar Aug 12 '20 15:08 rahulbot

Sure will do that.

esirK avatar Aug 12 '20 15:08 esirK

Hello. I was able to deploy the front-end apps successfully, but it seems like I need to log in to each app independently. I was thinking that setting the COOKIE_DOMAIN config to the domain I'm using would resolve this, but it didn't. So my question is: how do I fix this so that I only need to log in to one app?

esirK avatar Sep 03 '20 06:09 esirK

I'm so glad that you got the apps deployed! I think COOKIE_DOMAIN is the correct config to set, so I'm not exactly sure what might be amiss. Our domain is .mediacloud.org -- I was a little surprised to see the . at the beginning, so maybe that's the key?

This might be a little bit of trial and error--sorry!

dsjen avatar Sep 03 '20 15:09 dsjen

The first piece is that the session is stored in the external Redis cache so it can be used across all the domains (via the SESSION_REDIS_URL environment variable). The second, as you point out, is that the cookie domain needs to be a valid one so that the cookie that gets set works across any subdomain (via the COOKIE_DOMAIN env var mentioned). I found that prepending a period effectively wildcarded it for all subdomains (but I'm not a cookie expert).
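
So, roughly, each app needs something like this in its environment (values below are placeholders for your own Redis host and domain):

```sh
# shared Redis-backed session store used by all of the front-end apps
export SESSION_REDIS_URL="redis://your-redis-host:6379/0"
# leading dot so the session cookie is valid across every subdomain
export COOKIE_DOMAIN=".example.org"
```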

rahulbot avatar Sep 03 '20 15:09 rahulbot

Hello again. I have one more query concerning the WORD_EMBEDDINGS_SERVER_URL defined here: https://github.com/mediacloud/web-tools/blob/main/config/app.config.template#L35. Which word embedding server are you using, or do we have a document concerning this?

esirK avatar Nov 09 '20 07:11 esirK

@esirK -- you'll need to spin up an instance of https://github.com/mediacloud/word-embeddings-server. We can help if you have trouble with the set up. Good luck!
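
Something along these lines (then follow that repo's README for installing and running it):

```sh
git clone https://github.com/mediacloud/word-embeddings-server.git
cd word-embeddings-server
# follow the README to install dependencies and start the server, then point
# WORD_EMBEDDINGS_SERVER_URL in your app.config at wherever it ends up running
```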

dsjen avatar Nov 09 '20 19:11 dsjen

Thank you for this @dsjen. I was able to set this up, but I had to change the versions of numpy and scipy to the following:

numpy==1.17.0
scipy==0.18.1

Without that, I was getting an sklearn import error, `ImportError: cannot import name 'comb'`, which originated from `from sklearn.decomposition import PCA`.
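
Concretely, the pins that worked for me (applied before installing the rest of the requirements):

```sh
# downgrade to the versions noted above to avoid the 'comb' import error
pip install numpy==1.17.0 scipy==0.18.1
```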

esirK avatar Nov 10 '20 03:11 esirK

Oh, good to know! Please consider making a PR to update that for us! 🙏

dsjen avatar Nov 10 '20 13:11 dsjen

Sure I'll do this.

esirK avatar Nov 11 '20 05:11 esirK