[Docs]: General LLM Guide

Open TKTSWalker opened this issue 1 year ago • 1 comment

📘 Current State of Documentation

As mentioned in a previous journal, I believe a guide for people who want to start LLM development would be beneficial! Below is a general list of journals I could create. I'm willing to write all of them from scratch, and most should be done in a month and a half at the very most (not counting the Advanced Usages, which might take a bit longer because two of them need extra data)!

These also use a few concepts and prototypes from a program I made called Project Replicant (such as Engels, or understanding 3D space using a CAD-like database), so I hope that's alright!

I'd also like to know whether there's a specific API you'd like me to use. I'd prefer something like Hugging Face (which offers a free tier)! I suggest sticking to a single API, since bouncing between different APIs early on could make it harder to understand exactly what is being done and why.

📖 Suggested Improvement

My idea for a guide goes as follows:

LLM Fundamentals and Advanced AgentOps Implementation

Basic Usages

Text Generation
- Finishing a sentence or creating a paragraph based on a prompt
- Generating a short story based on a Nier-style sentence
- Finishing the second half of a sentence based on the emotion a user wants to convey
Classifying Data Using an LLM
- Based on preset categories
- Classifying sentences as positive, neutral, or negative sentiment
- Inferring which category an item belongs to based on its details
Summarizing Information
- Basic summarization for now (Engels at advanced levels)
- Summarizing general articles about multiple topics
- Summarizing a conversation and keeping the most important details (people, places, and things, plus names and dates)
Adding Context to History
- Adding context to our history based on a prompt (custom history at advanced levels)
- Having an AI finish a task and adding the result to the history before asking a question that takes the previous context into account
Using a Local Search Engine System
- Giving context and taking input (Challenge for basic level)
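
A minimal sketch of what the "Adding Context to History" item above could look like, assuming the history is kept as a plain list of role/content messages. The names `add_turn`, `build_prompt`, `ask`, and the `fake_llm` stub are hypothetical; `fake_llm` stands in for whichever API ends up being chosen.

```python
from typing import Callable

History = list[dict[str, str]]


def add_turn(history: History, role: str, content: str) -> None:
    """Append one message to the running history."""
    history.append({"role": role, "content": content})


def build_prompt(history: History) -> str:
    """Flatten the history into a single prompt string, one message per line."""
    return "\n".join(f"{m['role']}: {m['content']}" for m in history)


def ask(history: History, question: str, llm: Callable[[str], str]) -> str:
    """Record the question, call the model with the full history, record the answer."""
    add_turn(history, "user", question)
    answer = llm(build_prompt(history))
    add_turn(history, "assistant", answer)
    return answer


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model call: it only reports
    # how many lines of context it received.
    return f"(saw {len(prompt.splitlines())} lines of context)"


history: History = []
ask(history, "Write a haiku about rain.", fake_llm)           # the task
reply = ask(history, "Now translate it to French.", fake_llm)  # the follow-up
print(reply)
```

The second call sees the first question and its answer as well as the new question, which is the behavior the follow-up item in the list relies on.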

Intermediate Usages

Developing a Chatbot
- First with a single user, then with multiple users, then with multiple chatbots and multiple users
- Taking one user input (standard)
- Formatting the inputs to give context about who a user is (with an introduction prompt)
- Simulating multiple chatbots conversing with different users at the same time before going back to talk to one
Fine-Tuning Chatbots
- Fine-tuning a chatbot for better answers using a simple CSV sheet
- Changing the tone an LLM responds in with CSV data
- Giving a chatbot more context through a CSV sheet with Q/A
Dataset Creation
- Yes/No-based, then text-to-text-based, then complete generation from scratch
- “Is this a _ ?”
- Turning a description into a list of questions and answers
- Generating complete text from scratch (A few ideas here)
Grouping Outputs
- Grouping outputs into premade categories (generating context then packaging it)
- Taking the output from generated text and using tools to help sort it
- Sorting information from a conversation into a specially made database (Challenge)
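
One way the "CSV sheet with Q/A" idea above could be sketched, using only the standard library. Note that this shows in-context prompting (pasting the Q/A pairs into the prompt), not actual fine-tuning. The column names `question` and `answer`, the sample rows, and the helper names are all assumptions made for illustration.

```python
import csv
import io

# Placeholder data; a real guide would read this from a .csv file on disk.
CSV_TEXT = """question,answer
What is the capital of France?,Paris
How many days are in a week?,Seven
"""


def load_qa(csv_text: str) -> list[tuple[str, str]]:
    """Parse Q/A rows from CSV text that has 'question' and 'answer' columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["question"], row["answer"]) for row in reader]


def build_context(pairs: list[tuple[str, str]], user_question: str) -> str:
    """Format the Q/A pairs as context, ending with the open user question."""
    blocks = [f"Q: {q}\nA: {a}" for q, a in pairs]
    blocks.append(f"Q: {user_question}\nA:")
    return "\n\n".join(blocks)


pairs = load_qa(CSV_TEXT)
prompt = build_context(pairs, "What is the capital of Italy?")
print(prompt)
```

The resulting string would then be sent to the model, which is expected to continue after the final `A:`.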

Advanced Usages

RAG-Based Information Searching
- API-based, Google Search-based, Multi-database
- Reflecting on the date/time with a free API
- Using the Google search snippets to get information
- Using multiple CSV files as context for an LLM
Email-Based Assistant
- Using an LLM to create emails
- Finding a certain type of data (CTO) and generating custom messages for each
- Determining safety risks based on LLM + API search
Stylized Text Generation
- Documentation, DnD campaign, etc.
Formatting Conversations
- Formatting conversations into a specific JSON format for recalling later
- Taking notes and formatting them into a more professional state
Research Studies
- Converting chat history to shortened text and using it as context, allowing longer chatbot conversations with less worry about token limits
- Engels, an AI summary language I developed (Showing how to create a dataset and implement it)
LLM-Run Town
- Creating an LLM-run town, visualizing it in Unity, and using it to train around different goals
  - Goals such as trying to get LLMs to speak to each other as often as possible, remembering context from long ago, or keeping conversation minimal for a DB
Interacting with 3D Space
- Having an LLM interact with 3D space based on semantic + CAD-like data and Unity AR/VR
- (This data is already being created by a friend and me using a 3D rooms generator before being moved to 3D)
Teaching LLM Rulesets
- Rulesets for games and long-term rulings (such as Chess and Checkers, also stopping the AI from sharing its context through anti-examples)
Custom AgentOps Implementations
- Creating custom implementations for AgentOps (I have been testing this out in relation to Gemini; I believe my mistake wasn't in the code itself but rather in mixing up the output delta block with another term. Still, to be safe, I plan on restarting)
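
For the "Formatting Conversations" item in the list above, a rough sketch of storing a conversation as JSON and recalling parts of it later. The schema (`version`, `turns`, `index`, `speaker`, `text`) and the function names are hypothetical; the actual format would be decided in the guide itself.

```python
import json


def conversation_to_json(turns: list[tuple[str, str]]) -> str:
    """Serialize (speaker, text) pairs into a JSON string with a fixed layout."""
    record = {
        "version": 1,
        "turns": [
            {"index": i, "speaker": speaker, "text": text}
            for i, (speaker, text) in enumerate(turns)
        ],
    }
    return json.dumps(record, indent=2)


def recall(json_text: str, speaker: str) -> list[str]:
    """Return every message a given speaker said, in original order."""
    record = json.loads(json_text)
    return [t["text"] for t in record["turns"] if t["speaker"] == speaker]


saved = conversation_to_json([
    ("user", "Remind me to call Sam on Friday."),
    ("bot", "Noted: call Sam on Friday."),
    ("user", "Thanks!"),
])
print(recall(saved, "user"))
```

Keeping an explicit `version` field is one option for changing the layout later without breaking older saved conversations.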

🔗 Affected Documentation Pages

No response

🔍 Additional Context

No response

🤝 Contribution

[x] Yes, I'd be happy to submit a pull request with these changes.
[ ] I need some guidance on how to contribute.
[ ] I'd prefer the AgentOps team to handle this update.

Nov 23 '24 20:11 TKTSWalker

I'm adding a general section at the start that talks more about LLM logic and AgentOps; I was talking to a few people about the tool and identified a few problems they had while trying to learn how to use it.

I should be sharing a draft Markdown file for the first and second pages in 2 days at most!

Dec 01 '24 14:12 TKTSWalker