Data source: datasheets?
Could we get datasheets of a bunch of popular chips, MCUs, etc, and feed them into training data? Alternatively (or also) perhaps use libraries that the manufacturer provides.
Example use case: Q: "Why won't my code work [code] on my board?" A: "You are trying to use WIFI, but should be using WIFI0, which is defined in the library for your board." And Q: "Why won't my sensor respond?" A: "You need to send it the 0xAF command to turn off sleep mode."
I don't see why not, licenses permitting. Would you be interested in working on this?
Possibly, what would I need to do, and how?
The general process for providing datasets is just to create a Jupyter notebook in notebooks/data-augmentation in the repo, which will download the data (you can upload it to Hugging Face or similar if it isn't already easily available). The notebook should convert the data to a simple Q-A format which we need for training, e.g. JSONL where each line has prompt and response, and write it locally. Then you can make a PR with the notebook (but don't include the downloaded data itself)
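To make the target format concrete, here is a minimal sketch of the "write it locally" step. The `prompt` and `response` field names come from the description above; the function names and the two example pairs (taken from the use case at the top of this thread) are only illustrative, and the notebook that actually gets merged may be organised differently.

```python
import json
from pathlib import Path

# Illustrative pairs only, adapted from the use case at the top of the thread.
EXAMPLE_PAIRS = [
    (
        "Why won't my code work on my board?",
        "You are trying to use WIFI, but should be using WIFI0, "
        "which is defined in the library for your board.",
    ),
    (
        "Why won't my sensor respond?",
        "You need to send it the 0xAF command to turn off sleep mode.",
    ),
]


def write_qa_jsonl(pairs, out_path):
    """Write (prompt, response) tuples as JSONL: one JSON object per line."""
    out_path = Path(out_path)
    with out_path.open("w", encoding="utf-8") as f:
        for prompt, response in pairs:
            record = {"prompt": prompt, "response": response}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return out_path


def read_qa_jsonl(path):
    """Read the file back as a list of dicts, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

In a notebook, the last cell would call something like `write_qa_jsonl(EXAMPLE_PAIRS, "datasheet_qa.jsonl")` (the file name is a placeholder), and the resulting file would be left out of the PR.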
I haven't used Jupyter a lot (or complicated Python, honestly), but if I'm understanding correctly, I think I could give it a try. Looking at other entries in notebooks, what you're saying is I need to:
- Use an API to get data from sources (perhaps Octopart or something?)
- Somehow make them all plaintext (they would typically be PDFs)
- Somehow split the data up into questions and answers (not sure if there is a good way to automate this)
Pretty much. See below an example notebook which grabs a bunch of Reddit comments and applies some logic to convert them to Q-A format. Not sure how easy it would be to do reliable conversion to plaintext if you can only get them in PDF format though. I also have no clue how licenses would work with something like Octopart.
https://github.com/LAION-AI/Open-Assistant/blob/main/notebooks/data-augmentation/changemyview-builder/data_processor.ipynb
I wonder if I could find ones that have a format like this:
/* values for this thing */
Define keyword value; // first one
and could restructure it like this:
Q: When using this thing, how do you do first one?
A: For first one on thing, you can use keyword, which is value.
And in datasheets:
Q: How do I do (description of thing)?
A: Use (thing being described).
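A rough sketch of how that restructuring could be automated for header files, assuming each constant is a simple `#define NAME value` line with a trailing `//` or `/* */` comment written as a short imperative phrase. The sample header, the device name, and the question/answer wording are all made up for illustration; real vendor headers vary a lot and would need more careful handling (and the licensing question still applies).

```python
import re

# Matches object-like macros with a trailing comment, e.g.
#   #define NAME value // description
#   #define NAME value /* description */
# Function-like macros, values containing '/', and uncommented lines do not match.
DEFINE_RE = re.compile(
    r"^\s*#\s*define\s+(?P<name>\w+)\s+(?P<value>[^/\s][^/]*?)\s*"
    r"(?://\s*(?P<line>.+?)|/\*\s*(?P<block>.+?)\s*\*/)\s*$"
)

# Made-up header text, not copied from any real vendor file.
SAMPLE_HEADER = """\
/* values for a made-up sensor */
#define SLEEP_OFF 0xAF // turn off sleep mode
#define WDT_HOLD (0x0080) /* stop the watchdog timer */
#define BUFFER_SIZE 64
#define SQUARE(x) ((x) * (x)) // function-like macro, skipped
"""


def defines_to_qa(header_text, device):
    """Turn commented #define lines into prompt/response dicts."""
    pairs = []
    for line in header_text.splitlines():
        m = DEFINE_RE.match(line)
        if m is None:
            continue  # not a commented object-like #define
        description = m.group("line") or m.group("block")
        name = m.group("name")
        value = m.group("value")
        pairs.append(
            {
                "prompt": f"When using the {device}, how do I {description}?",
                "response": (
                    f"To {description} on the {device}, use {name}, "
                    f"which is defined as {value}."
                ),
            }
        )
    return pairs
```

Calling `defines_to_qa(SAMPLE_HEADER, "example sensor")` yields two pairs here, one for `SLEEP_OFF` and one for `WDT_HOLD`; the uncommented and function-like definitions are skipped because there is no description to build a question from.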
The problem is datasheets and drivers are typically copyrighted so I don't know how we could..
Stuff like this is ideally what I was thinking of: https://github.com/ArduCAM/Energia/blob/master/hardware/tools/msp430/msp430/include/msp430fr6989.h
I have assigned this to you, @mm12. Thank you!
Do you know if there is anyone who can help me figure out the licensing stuff?
I am not sure, @mm12. It depends on where you are located. If you don't feel comfortable with the license, I would recommend you not pursue it. We want to do the right thing :)
Yeah, I don't have the knowledge to do this. If anyone else is interested in doing this, feel free.