Persistently store scraped tweets

Open laurin opened this issue 2 years ago • 14 comments

As discussed in #16, the current storage of scraped tweets is not optimal: newly scraped tweets are simply appended to the existing tweets.txt file, which creates a lot of duplicates. Integrating a database is probably not necessary at this point. We could store the scraped tweets with their IDs in a JSON file and only add new ones on each run of the application.

laurin avatar Feb 27 '22 17:02 laurin
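
A minimal sketch of the JSON-file idea above, assuming each scraped tweet is a dict with an `id` field. The function names and the `tweets.json` filename are hypothetical and not taken from the repo.

```python
import json
from pathlib import Path


def load_tweets(path):
    """Return the stored tweets as a dict keyed by tweet ID (empty if no file yet)."""
    path = Path(path)
    if not path.exists():
        return {}
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def merge_tweets(stored, scraped):
    """Add tweets whose ID is not stored yet; return how many were added."""
    added = 0
    for tweet in scraped:
        key = str(tweet["id"])  # JSON object keys are always strings
        if key not in stored:
            stored[key] = tweet
            added += 1
    return added


def save_tweets(path, stored):
    """Write the whole store back to disk."""
    with Path(path).open("w", encoding="utf-8") as f:
        json.dump(stored, f, ensure_ascii=False, indent=2)
```

On each run the application would call `load_tweets("tweets.json")`, pass the fresh scrape to `merge_tweets`, and then call `save_tweets`. Keying by ID makes the duplicate check a dictionary lookup instead of a scan through the file.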

We should also store the time each tweet was created, and either discard tweets after a certain time or allow the user to select a time range. The latter would probably require the map to be generated client-side.

laurin avatar Feb 27 '22 17:02 laurin

I agree a JSON file is probably the best option. I don't think we should generate things client-side, because that might add unnecessary lag, especially in places where the internet might be very slow because of the current circumstances. I want to serve static HTML to keep the load times as low as possible. Let's just keep the tweet discard time as a server-side parameter.

kinshukdua avatar Feb 27 '22 18:02 kinshukdua
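
A sketch of the server-side discard parameter discussed above. It assumes each stored tweet carries a `created_at` field holding a timezone-aware ISO 8601 string such as `2022-02-27T17:00:00+00:00`. The field name, the function name, and `max_age_hours` are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def discard_old_tweets(stored, max_age_hours, now=None):
    """Return a new dict containing only tweets created within the last max_age_hours."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=max_age_hours)
    return {
        tweet_id: tweet
        for tweet_id, tweet in stored.items()
        # Assumes created_at includes a UTC offset, so the comparison is between aware datetimes.
        if datetime.fromisoformat(tweet["created_at"]) >= cutoff
    }
```

Running this before the static HTML is generated keeps the time filtering on the server, so the client still receives a single prebuilt page.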

We can consider SQLite here too, since it's simple and file-based. It sounds like we're performing some conditional manipulation, and this will help us cut down on time complexity.

Krishna-Sivakumar avatar Feb 27 '22 19:02 Krishna-Sivakumar
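
For comparison, a sketch of what the SQLite option could look like with Python's built-in `sqlite3` module. The table layout and function names are assumptions for illustration. The primary key on `id` lets the database skip duplicates, and the age filter becomes a single `DELETE`.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the tweet store; the default is an in-memory database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        "id INTEGER PRIMARY KEY, text TEXT NOT NULL, created_at TEXT NOT NULL)"
    )
    return conn


def add_tweets(conn, tweets):
    """Insert tweets, skipping IDs already present; return the number added."""
    before = conn.total_changes
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO tweets (id, text, created_at) "
            "VALUES (:id, :text, :created_at)",
            tweets,
        )
    return conn.total_changes - before


def discard_before(conn, cutoff_iso):
    """Delete tweets older than the cutoff.

    Assumes every created_at uses the same ISO 8601 format and offset,
    so string order matches chronological order.
    """
    with conn:
        conn.execute("DELETE FROM tweets WHERE created_at < ?", (cutoff_iso,))
```

The trade-off against the JSON file is mostly about readability of the stored data versus letting the database handle lookups; both stay file-based and need no extra service.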

@DomiiBunn mentioned Firebase, which would work here.

Krishna-Sivakumar avatar Feb 27 '22 19:02 Krishna-Sivakumar

@DomiiBunn mentioned firebase, which would work here.

It depends on the complexity you're looking for. Firebase is a nice balance between file storage (JSON files, SQLite, etc.) and standalone databases: it's almost as flexible as a standalone database, and it handles security, hosting, and high availability. At the usage we'd be expecting, it should be completely free, as long as DB reads are cached.

DomiiBunn avatar Feb 27 '22 20:02 DomiiBunn

The reason I'm a little hesitant about Firebase is that it adds another step for developers looking to reproduce the repo and contribute. The simpler the project, the easier it is to contribute (as long as it doesn't impact performance or features).

kinshukdua avatar Feb 28 '22 05:02 kinshukdua

Use a config file and specify

useDatabaseCache: false

That way, a larger deployment can turn caching on where it's worth it, and a personal deployment still works fine without the added complexity.

DomiiBunn avatar Feb 28 '22 10:02 DomiiBunn
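
A sketch of how such a flag might be read, assuming the config is a JSON document. Only the `useDatabaseCache` key comes from the comment above; the file format, the function names, and the backend labels are hypothetical.

```python
import json

# Default keeps a personal deployment on the simple file-based path.
DEFAULT_CONFIG = {"useDatabaseCache": False}


def load_config(text):
    """Parse a JSON config string, falling back to defaults for missing keys."""
    config = dict(DEFAULT_CONFIG)
    config.update(json.loads(text))
    return config


def choose_backend(config):
    """Return a label for the storage backend the flag selects."""
    return "database" if config["useDatabaseCache"] else "json-file"
```

With the default left at `false`, contributors who clone the repo would not need any database setup unless they opt in.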

Or use Redis, but idk how painful it is to implement with Python.

And I think it would be a bit of overkill.

DomiiBunn avatar Feb 28 '22 11:02 DomiiBunn

I am working on a fix for duplicate tweets.

sahal-mulki avatar Feb 28 '22 15:02 sahal-mulki

Let's just go with a json file.

Krishna-Sivakumar avatar Mar 01 '22 08:03 Krishna-Sivakumar

Sounds good to me

DomiiBunn avatar Mar 01 '22 09:03 DomiiBunn

Nvm, I failed miserably at it.

sahal-mulki avatar Mar 01 '22 13:03 sahal-mulki

I'd love to help, but Python ain't my cup of tea.

DomiiBunn avatar Mar 01 '22 14:03 DomiiBunn

Sure-a-mundo

sahal-mulki avatar Mar 02 '22 14:03 sahal-mulki