AI automation for newsletter
Hello!
After some conversation with Ozkriff and looking at how much work goes into editing the newsletter each month, I was wondering if it'd be a good idea to start using some automation tools.
For editorial roles, something like ChatGPT/GPT-4 could assist a lot. What I had in mind is a bot that runs on each pull request, feeds the added content to the AI for editing and auditing, and returns either a fixed version or a list of things to fix or update.
For example, one of my PRs had too much repetition and extra information, and the title wasn't good. In such a case the bot could fix all of that given the newsletter guidelines, or notify me to make the required changes.
Cost-wise, since the newsletter is a monthly release, I don't think we'd exceed a dollar even in the busiest month given how cheap the API is, and since each section is small, the context window of GPT-3.5 won't be a problem either.
One other thing I remembered: it could also assist in writing about projects that were announced but that no one wrote about, for example Rusty Jam #3. It could write a section for them without taking time away from the editors, and the result would just need to be reviewed and added.
These are just my suggestions, so I'm not sure how appealing they are. I have some experience with OpenAI, so I can help implement this.
I would love to see this experimented with.
The AI-assistance track is well worth exploring, but it's worth noting that this type of automation could also work with a much more basic feed aggregator that simply asks projects to list their update feeds (blog RSS, Mastodon, GitHub releases, etc.) and creates summaries for projects by simply linking out to their updates for the past month.
@erlend-sh that is actually a really really awesome idea!!!
This Week in Rust has an interesting bot that might be worth investigating: https://github.com/extrawurst/twir-bot
@ElhamAryanpur if you're still up for implementing something like this, I'd be very up for reviewing it and getting it merged :) I also see some great time-saving potential on the editing side and would gladly pay for the API access, since it would be really cheap.
The feature I'd like to see the most would be a short automated summary for content no one has written anything for yet. Maybe it's already enough to feed the raw HTML to GPT and ask it for a summary? I also know that there are services that do this kind of thing for you using GPT like https://notegpt.io/web-summary, idk if they're better than just entering our own prompt though.
@janhohenheim absolutely, since then I've invested a lot of time in my own LLM-based software, and I can say it's a better time than ever to do something like this.
We can take four approaches:

- Use a custom model hosted on a VPS or similar. This provides full privacy, and it could be reused by other Rust-based newsletters and publications, even for social media moderation. Periodically or on a trigger, the hosted model fetches update changelogs and the like, and produces all the summaries and reports itself through a RAG architecture.
- Use a custom or stock model hosted locally by a maintainer. Same as above, except this is very cost-effective and pretty much free, with no need for API access anywhere. Models like Mistral 7B v0.2 have over 32k context length and a ~4.5GB model file (GGUF), and can run on any modern computer, so pretty much anyone can use one. We can even use a fine-tuned version such as Hermes/Dolphin Mistral for better results.
- Fine-tune a model with a cloud provider such as OpenAI, Anthropic (Claude), Google, etc. A bit expensive and at the mercy of the cloud provider, but it can have the benefits of the first option.
- Use a stock model from a cloud provider such as OpenAI GPT-4, Claude, Gemini, etc. Cheaper than the third option, but with the same risks. One issue with these last two options is degradation of the models over time as more guardrails are introduced; a sudden price change can also put a dent in your wallet, and the provider can block your access on a whim.
Personally, I think the second option would be the best to start with. RAG helps with automatically searching changelogs and writing summaries. Pull requests too, though that's perhaps a bit harder to automate locally than with GitHub Actions 😅.
Let me know which option you think is nicer and I can begin.
I also recommend checking out https://spiderwebai.xyz/ by @j-mendez
@ElhamAryanpur great to hear! Since the newsletter has historically struggled with maintainer burden, I am more inclined to option 4. You know this stuff better than me though, so if you think that option 2 would be really really good for us, I'm ready to rent a cheap server on DigitalOcean and give you access. Also, what do you think about the service @erlend-sh mentioned?
Yeah, they're using RAG too; most likely LangChain.
That is very true, I wrote a section about my work there in the past, and it shocked me how much work the maintainers did every month...
Yeah, we can start locally for development and get some early testing on the newsletter; if the results are great, we can then move to hosting or keep it local. I just don't want to burden you with paying for servers or an API 😅 I'm trying to get a solution that anyone can use and contribute to instead of hurting your wallet, especially at this stage.
@ElhamAryanpur alright then! Do you need anything from me to start? How do you want to organize yourself? If you create a repo with a readme on how to run the model, I can ensure it runs on my machine in the background (or on a machine I rented anyway to host a Minecraft server, hehe)
For sure, it'll probably be a repo. I'm not sure about much else yet; I'll keep you updated here. Thank you!
Hi! If the bandwidth is minimal and simply a page or two (it would take a lot of requests to get to $1), we also do not pad the cost for GPT from OpenAI. The dashboard is at a very early stage and being actively improved; the service is more fleshed out from an API perspective at the moment. I recommend testing a basic prompt in the GPT playground and seeing how it works off a small set of HTML, using the GPT configuration to extract what is needed, etc. Let me know if you have any questions. Thanks @erlend-sh!
Hm? We did talk about it in the options listed
If you create an account, I can add a dollar to it to experiment. The service's goal is pretty much putting this project on a server to scale: https://github.com/spider-rs/spider. We are in the middle of making a dashboard, similar to the Supabase dashboard, to view all of the data from the crawls; it should be out by next week.
Oh, the issue isn't that it can't be done through OpenAI; we're just exploring different options. I'm looking into making it run locally so as to stop the charges from ever occurring, because it won't just be a page or two of review: it'll also be crawling the changelogs and releases of different projects and compiling them, so we're looking at a lot of tokens being used. The code should be usable with any service, including OpenAI, in the future, but for now I'm keeping things simple during development.
Hey folks 👋
I noticed last weekend that we have not been publishing any newsletters recently, stumbled upon this and the other discussion about maintenance burden, and wanted to try an experiment to see if we can improve this. I reached very similar conclusions to the ideas in this thread: more automation is needed to scan sources, some AI is needed to summarize them (or a human, in the meantime), and in general we need something that can ease the maintenance burden, for example a basic script that prepares a draft that only needs to be edited, rather than fully written.
Take a look at my experiment here - https://github.com/iolivia/newsletter-bot
Current things it can do:
- Filter the updates by a given time range
- Fetch GitHub releases for engine and library updates - these sections are half automated with this approach; the release notes are there, but they need to be summarised somehow, and sometimes you have to follow links to blog posts with release notes
- Fetch GitHub issues for generating requests for contributions - this section is 💯 automated with this approach
- Fetch reddit threads for open discussions - this section is 💯 automated with this approach
- Generate basic markdown
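For illustration, the time-range filtering can be as simple as lexicographic comparison of ISO 8601 timestamps, since their string order matches chronological order. This is a hypothetical sketch with a made-up `Release` struct, not the bot's actual types:

```rust
// Hypothetical sketch: filter releases by a date range using plain
// ISO 8601 string comparison (lexicographic order == chronological
// order for this format), so no date-parsing crate is needed.

#[derive(Debug, Clone)]
struct Release {
    tag: String,
    published_at: String, // e.g. "2024-04-04T21:01:55Z"
}

fn in_range<'a>(releases: &'a [Release], from: &str, to: &str) -> Vec<&'a Release> {
    releases
        .iter()
        .filter(|r| r.published_at.as_str() >= from && r.published_at.as_str() <= to)
        .collect()
}

fn main() {
    let releases = vec![
        Release { tag: "v0.13.2".into(), published_at: "2024-04-04T21:01:55Z".into() },
        Release { tag: "v0.13.1".into(), published_at: "2024-03-18T22:38:27Z".into() },
    ];
    // Only the April release falls inside the requested window.
    let picked = in_range(&releases, "2024-04-01", "2024-04-13T23:59:59Z");
    assert_eq!(picked.len(), 1);
    assert_eq!(picked[0].tag, "v0.13.2");
}
```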
Here is example output from the local script:
    GITHUB_TOKEN=github_pat_<token> cargo run -- 2024-04-01 2024-04-13
    Args 2024-04-01 - 2024-04-13
    Rust-SDL2/rust-sdl2
    bevyengine/bevy
    Found release: ✅ v0.13.2 2024-04-04 21:01:55 UTC
    Found release: ❌ v0.13.1 2024-03-18 22:38:27 UTC
    Found release: ❌ v0.13.0 2024-02-17 19:32:58 UTC
    Found release: ❌ v0.12.1 2023-11-30 01:23:10 UTC
    Rust-SDL2/rust-sdl2 - 1 Beginner Open Issues - ✅
    bevyengine/bevy - 99 Beginner Open Issues - ✅
    PistonDevelopers/piston - 0 Beginner Open Issues - ❌
    not-fl3/macroquad - 1 Beginner Open Issues - ✅
    ggez/ggez - 0 Beginner Open Issues - ❌
    nannou-org/nannou - 0 Beginner Open Issues - ❌
    jeremyletang/rust-sfml - 1 Beginner Open Issues - ✅
    Found top post: Spell Casting system short devlog (written in Rust)
    Found top post: This Month in Rust GameDev: Call for Submissions!
    Found top post: We're still not game, but progress continues.
    Found top post: banging my head against the wall (someone help me think about data structures)
    Found top post: Working on a casting system with the first spell (in Rust)
And there is an example of the markdown file it produces here.
Let me know what you think about this, maybe this is a good starting point 😄
@iolivia wooooah, that's cool! I'll take a closer look once I have time :)
@iolivia amazing work! I can help with the AI part for summary text, will open a PR
@iolivia I checked out the repo, and it looks really nice! Good work! I'll drop you a PR later adding some sources.
One thing of note is that right now, the bot is a bit too good. Many of the news items it provides are, in my opinion, not significant enough to be included in the newsletter. Removing them by hand is trivial though :) Other than that, we could ignore all posts below a certain number of upvotes / hearts / retweets etc. and all crate updates that only change the patch version.
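The two filters suggested above could be sketched like this. This is just an illustration with naive "major.minor.patch" parsing, not a full semver implementation and not the bot's actual code:

```rust
// Hypothetical sketch of two noise filters: drop posts below a score
// threshold, and drop crate releases that only bump the patch version.

fn passes_score(score: u32, threshold: u32) -> bool {
    score >= threshold
}

/// Naive parse of "major.minor" from a version string like "v0.13.2".
fn major_minor(version: &str) -> Option<(u64, u64)> {
    let v = version.trim_start_matches('v');
    let mut parts = v.split('.');
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    Some((major, minor))
}

/// True if going from `prev` to `next` only changes the patch number.
fn is_patch_only(prev: &str, next: &str) -> bool {
    match (major_minor(prev), major_minor(next)) {
        (Some(a), Some(b)) => a == b,
        _ => false,
    }
}

fn main() {
    assert!(passes_score(42, 10));
    assert!(!passes_score(3, 10));
    assert!(is_patch_only("v0.13.1", "v0.13.2"));  // skip: patch bump only
    assert!(!is_patch_only("v0.12.1", "v0.13.0")); // keep: minor bump
}
```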
Another thing I'm wondering is how to use the bot in practice. Running it at the beginning of the newsletter cycle (the 3rd of the month) seems useless, since it would only aggregate news from the last 3 days. Running it in the middle is a bit arbitrary and will miss quite a few cool updates. Maybe we could add a GitHub Action to run it right at the freeze period to add all news no one has written about yet? If we want this completely automated, we should add the output of the bot to the newsletter only if the newsletter does not already include that content.
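A scheduled GitHub Action along those lines could look something like this sketch. The cron date, the bot invocation, and the date arithmetic are all hypothetical placeholders, not the repo's actual setup:

```yaml
name: newsletter-draft
on:
  schedule:
    # Hypothetical freeze date: 03:00 UTC on the 25th of each month.
    - cron: "0 3 25 * *"
  workflow_dispatch: {}

jobs:
  draft:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Assumes the bot takes a start and end date, as in the example run above.
      - name: Run newsletter-bot for the current cycle
        run: cargo run -- "$(date -d '-1 month' +%Y-%m-%d)" "$(date +%Y-%m-%d)"
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```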
Another nice thing would be Discord integration like in TWIR, but that's very much optional.
For the moment, the bot is definitely good enough to be used manually. Again, great work!
The Discord integration can be added through webhooks; I have done a few projects with them, so I can assist with that. A solution for the news could be:
- as you said, a CI job to periodically check for news and filter for posts with high upvotes and hearts
- store them in a "to be summarized" section, or somewhere else to gather them, skipping duplicates
- when we are near the newsletter date, check those sections and use AI or a human to summarize them. This should solve both problems of being too early or too late.
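For reference, Discord webhooks only need an HTTP POST with a JSON body containing a `content` field. A minimal sketch of building that payload (the function name and the naive escaping are just for illustration; a real implementation would use a JSON library and an HTTP client):

```rust
// Hypothetical sketch: build the JSON body for a Discord webhook message.
// Discord accepts a POST to the webhook URL with {"content": "..."}.
// Escaping here is naive and for illustration only.

fn webhook_payload(message: &str) -> String {
    let escaped = message
        .replace('\\', "\\\\")
        .replace('"', "\\\"")
        .replace('\n', "\\n");
    format!("{{\"content\":\"{}\"}}", escaped)
}

fn main() {
    let body = webhook_payload("New draft section ready for review");
    assert_eq!(body, "{\"content\":\"New draft section ready for review\"}");
    // This body would then be POSTed to the webhook URL with any HTTP client.
}
```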
I have pushed a PR using a LLaMA library for summaries, and if we use the model I recommend, Dolphin Mistral 7B v0.2 (GGUF), we should be fine with pretty much any amount of gathered release notes, as the model supports up to a 32k-token context length (for comparison, ChatGPT at launch had only a 2k context length). The model needs ~4.1GB of VRAM, so pretty much anyone can run it too. Hence, gathering the news periodically and summarizing it at the end should be OK.
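As a rough sanity check on that context budget, tokens can be approximated as ~4 characters each (a common rule of thumb; a real pipeline would use the model's actual tokenizer). A sketch under those assumptions:

```rust
// Rough sketch: check whether gathered release notes fit a model's
// context window, approximating tokens as ~4 characters each.
// The constants are illustrative, not measured values.

const CONTEXT_TOKENS: usize = 32_000; // Mistral 7B v0.2 context length
const RESERVED_FOR_OUTPUT: usize = 2_000; // leave room for the summary itself

fn fits_in_context(notes: &[&str]) -> bool {
    let chars: usize = notes.iter().map(|n| n.len()).sum();
    let approx_tokens = chars / 4;
    approx_tokens + RESERVED_FOR_OUTPUT <= CONTEXT_TOKENS
}

fn main() {
    let small = ["bevy v0.13.2: bug fixes", "macroquad: new examples"];
    assert!(fits_in_context(&small));

    let huge = "x".repeat(200_000); // ~50k approximate tokens, too large
    assert!(!fits_in_context(&[huge.as_str()]));
}
```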
What do you guys think?
@ElhamAryanpur sounds great! Couldn't we summarize them at the point they get gathered? For example, the bot could aggregate news every 3 days, add them to the GH issue and write a generated summary into the current newsletter markdown file.
@iolivia would you be available for implementing that part? Or would you like some help?
It is possible, it's just that the model could be too large for GitHub Actions to run, and locally it has no batching support yet, so it could take some time to summarize everything. Also, having everything summarized at the end helps the bot get a complete picture of all the development and write a better summary. But we absolutely can do the every-3-days approach too, and set the bot on a cron job on a server somewhere.
@ElhamAryanpur I've got a fedora server ready to run it :)
hell yeah!
So happy to see all the progress, thanks so much everyone for the awesome contributions already! 🔥
> One thing of note is that right now, the bot is a bit too good. Many of the news provided are, in my opinion, not significant enough to be included in the newsletter. Removing them by hand is trivial though :)
Agreed, this was my observation as well! I tried experimenting with removing releases whose notes are shorter than x characters, but then you miss all the major releases that only link to a blog post for their notes. Maybe another idea is to create a mini-section for minor releases at the end of each section, with mostly a link to the repo, the release version, and a one-liner; this could help discover repos that are active.
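That mini-section could be generated with something as small as this (a hypothetical helper, not part of the bot yet):

```rust
// Hypothetical sketch of the "mini-section for minor releases" idea:
// one markdown bullet per release, with a repo link, the version,
// and a one-liner.

fn minor_release_line(repo: &str, version: &str, summary: &str) -> String {
    format!("- [{repo}](https://github.com/{repo}) {version}: {summary}")
}

fn main() {
    let line = minor_release_line("not-fl3/macroquad", "v0.4.5", "small bug-fix release");
    assert_eq!(
        line,
        "- [not-fl3/macroquad](https://github.com/not-fl3/macroquad) v0.4.5: small bug-fix release"
    );
    println!("{}", line);
}
```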
> Another thing I'm wondering is how to use the bot in practice.
No strong feelings on this tbh, but trying to keep it simple, the options I see are:
- someone runs it locally when it's time to generate the newsletter and pushes a PR to the newsletter repo with the md file - this is easy enough to do tbh
- we add a trigger in the CI to run the bot on the 1st of the month that would run it and prepare the draft for the previous month
@iolivia since the newsletter died last time because of maintainer burden, I'm wary of adding anything that adds friction to the process, so I'll be automating as much as possible. I'll try to add a script for all of this after this newsletter so your bot is integrated in the next cycle 🚀
@janhohenheim typesafe 🔥 blazingly fast 🔥 automation to the moon.
If future newsletter entries are going to be AI edited or generated, I'd like to request that no content that I work on for the Rust community be included in the newsletter.
I appreciate the good intentions of everyone involved in this effort.
The point raised by @17cupsofcoffee in https://github.com/rust-gamedev/rust-gamedev.github.io/issues/1417#issuecomment-1660446638 is very salient and captures my sentiment well.
I like the format from This Week in Graphics. The author is funded via Patreon and writes very short bullet point summaries.
I'm interested in helping push for a grant from the Rust Foundation to fund a writer or editor to help with the newsletter as an alternative to involving AI!
I think @LPGhatguy's comment raises a good point that hasn't been made in these threads already - using AI in the production of the newsletter will discourage some people from reading/contributing[^1] (there are a lot of people who aren't massive fans of this tech - in creative spaces, especially!), and I hope this is weighed up versus any potential benefits of using it.
[^1]: If I'm totally honest, it's kind of sapped my motivation to get involved again. The appeal of the newsletter to me is that it's a curated view of all the cool stuff that's going on in the community - the idea of padding it out with LLM-generated text feels like it runs contrary to that, and it bums me out a little.
> using AI in the production of the newsletter will discourage some people from reading/contributing
That's fair. It cuts both ways though.
Conversely, I got to a point where I dreaded having to make PRs for the newsletters because I was already writing posts for my project's blog, mastodon, discord etc., which meant my marketing-energy was already spent. As much as I loved the newsletter it also felt like a burden sometimes, since I knew I had all these updates that I should share but didn't, as I simply didn't have Yet Another Post left in me.
Also, due to the immense workload of manual curation, the alternative we've implicitly opted for during the past several months has been no newsletter.
I'm on the record as an AI critic, but that doesn't mean I think it should be unilaterally shunned as a technology, especially not for one of the very few things it's actually good for, namely text summarization/consolidation. I could get behind an objection against proprietary, cloud-based AI, but I really don't have many qualms about the self-hosted variety, in particular when the final publishing is still subject to human review.
The newsletter-bot does already do a pretty good job without any AI assistance, though. If it were possible for people to opt out of the AI treatment for their projects' updates, might that be an acceptable compromise?