evadb
User Experience Improvement Summary
Search before asking
- [X] I have searched the EvaDB issues and found no similar feature requests.
Description
This issue summarizes users' feedback on their experience with EvaDB:
SQL Statement related:
- [ ] The USE statement in EvaQL does not allow parentheses, and it only supports one native statement.
- [ ] LOAD DOCUMENTS and CHUNK SIZE are confusing.
- [ ] GROUP BY only works on video and document tables. GROUP BY does not work on traditional columns.
- [ ] More documentation on the syntax of creating new tables.
- [ ] #1320
- [ ] `SELECT * FROM table` does not work for third-party data sources. It is not clear that we have to use `database.table` as the table source.
Data related:
- [ ] Convert numbers stored as strings (e.g., with commas or in scientific notation) into integers.
- [ ] Loading different text file types (.pdf and .txt) into the same table is problematic.
- [ ] Version incompatibility (table and index are created by an earlier version of EvaDB)
- [ ] Some data types are not supported in EvaDB when reading from Postgres, e.g., array types or uuid.
- [ ] Most DB cursors sanitize user-provided data in queries automatically, but EvaDB did not seem to have this functionality.
- [ ] Escape all single apostrophes when inserting data into a database.
- [ ] No string concatenation method
- [ ] Cannot directly insert dataframes
- [ ] On a Windows machine, storing file paths as strings in MySQL caused problems with misidentifying backslashes for escape sequences.
- [ ] New data sources:
  - [ ] Google / Bing as a data source for web scraping (e.g., Serper)
  - [ ] Google Maps
  - [ ] GitHub issues
  - [ ] Reddit PRAW
  - [ ] Wiki
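
Several of the data items above (number strings, apostrophes, Windows paths) can be illustrated outside EvaDB. The following is a minimal sketch, not EvaDB code: `parse_int` and `insert_rows` are hypothetical helpers, the `files` table is made up, and the standard-library `sqlite3` module stands in for a backend to show what parameter binding by a DB cursor looks like.

```python
import sqlite3
from decimal import Decimal, InvalidOperation


def parse_int(text: str) -> int:
    """Hypothetical helper: turn '1,234' or '1.2e3' into an int.

    Assumes ',' is a thousands separator (not a decimal comma) and
    rejects values that are not finite whole numbers.
    """
    cleaned = text.strip().replace(",", "")
    try:
        value = Decimal(cleaned)
    except InvalidOperation as exc:
        raise ValueError(f"not a number: {text!r}") from exc
    if not value.is_finite() or value != value.to_integral_value():
        raise ValueError(f"not a whole number: {text!r}")
    return int(value)


def insert_rows(conn: sqlite3.Connection, rows) -> None:
    # '?' placeholders let the driver bind the values, so apostrophes and
    # backslashes are stored verbatim with no manual escaping.
    conn.executemany("INSERT INTO files (owner, path) VALUES (?, ?)", rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (owner TEXT, path TEXT)")
insert_rows(conn, [("O'Brien", r"C:\new\table.txt")])
stored = conn.execute("SELECT owner, path FROM files").fetchall()
```

With bound parameters the apostrophe in the owner name and the backslashes in the path never pass through SQL string parsing, which is the behavior the sanitization and escaping items ask for.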
User-defined function related:
- [ ] Sklearn currently supports only the linear regression model; more machine learning models could be introduced and supported.
- [ ] Functions only work on table elements. Need to create a table with one tuple.
- [ ] For aggregation user defined functions, we need a way to pass the complete table into the function instead of row by row or batch by batch.
- [ ] Type definition in Custom AI function's forward decorator is confusing.
- [ ] Input/Output Format of a function (e.g., AWS Rekognition service) can be bytes, while EvaDB's forward function requires input in the form of a numpy array
- [ ] It is challenging to figure out the best algorithm for a prediction task (e.g., layoffs)
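
The bytes versus numpy array mismatch mentioned above can be bridged with a small round trip. This is a minimal sketch that assumes the function receives a `uint8` array and the external service expects raw bytes; `bytes_to_array` and `array_to_bytes` are hypothetical helper names, not part of EvaDB.

```python
import numpy as np


def bytes_to_array(payload: bytes) -> np.ndarray:
    # View the raw bytes as a 1-D uint8 array. np.frombuffer does not copy,
    # and the resulting array is read-only.
    return np.frombuffer(payload, dtype=np.uint8)


def array_to_bytes(arr: np.ndarray) -> bytes:
    # Serialize the array back into raw bytes for an API that expects bytes.
    return np.ascontiguousarray(arr, dtype=np.uint8).tobytes()


payload = b"\x89PNG\r\n"
arr = bytes_to_array(payload)
restored = array_to_bytes(arr)
```

The round trip is lossless for byte data, so a wrapper of this kind could sit at the boundary between the function's array input and a bytes-based service call.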
LLM related:
- [ ] Deal with various constraints of the OpenAI API, such as token restrictions, rate limits, and other limitations.
- [ ] ChatGPT deviates from 'yes' or 'no' responses quite often.
- [ ] Convert the ChatGPT response, which in most cases is just some text
- [ ] Unable to find the EvaDB implementation of exactly how the OpenAI API was called, I ran into unsatisfactory results with certain prompts and was unable to debug them. Certain prompts resulted in a response indicating that no text had been submitted. I assumed this may have been due to certain characters in the string, but without the ability to inspect the source code of how the API request is made, I ended up resorting to simply trying various prompts until I had satisfactory results.
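
Two of the LLM items above, rate limits and replies that stray from 'yes' or 'no', lend themselves to small client-side workarounds. The sketch below is one possible approach under stated assumptions, not EvaDB's or OpenAI's implementation: `normalize_yes_no` and `call_with_backoff` are hypothetical helpers, and `fn` stands for any callable that performs the API request.

```python
import random
import re
import time


def normalize_yes_no(reply: str):
    """Map a free-form reply to 'yes', 'no', or None if it is ambiguous."""
    words = set(re.findall(r"[a-z]+", reply.lower()))
    has_yes, has_no = "yes" in words, "no" in words
    if has_yes == has_no:  # neither word present, or both
        return None
    return "yes" if has_yes else "no"


def call_with_backoff(fn, retries=5, base_delay=1.0, sleep=time.sleep):
    """Retry fn() with exponential backoff plus a little jitter.

    Catches every Exception for brevity; real code should catch only the
    client's rate-limit error.
    """
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise
            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
```

Returning `None` for ambiguous replies lets the caller decide whether to re-prompt or drop the row instead of silently guessing.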
Index related:
- [ ] Similarity function is not symmetric.
- [ ] CREATE INDEX does not work on empty tables or third-party tables.
- [ ] Vector databases like Milvus and Pinecone have unavoidable setup efforts by users.
Optimization related:
- [ ] Web scraping and ChatGPT calls are expensive and are the bottleneck of execution.
- [ ] There is an optimization opportunity for caching, which is not fleshed out.
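
The caching opportunity can be illustrated with plain Python memoization. This is a minimal sketch outside EvaDB and says nothing about how EvaDB caches function results: `expensive_lookup` is a hypothetical stand-in for a web scrape or LLM request, and the cache lives only in process memory.

```python
from functools import lru_cache

calls = {"count": 0}


def expensive_lookup(prompt: str) -> str:
    # Placeholder for a slow call such as a web scrape or an LLM request.
    calls["count"] += 1
    return prompt.upper()


@lru_cache(maxsize=1024)
def cached_lookup(prompt: str) -> str:
    # Identical prompts are served from memory after the first call.
    return expensive_lookup(prompt)


first = cached_lookup("summarize row 1")
second = cached_lookup("summarize row 1")
```

Here the second call never reaches `expensive_lookup`. A persistent cache keyed on the function name and its inputs would be needed to get the same saving across runs.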
Installation related:
- [ ] The EvaDB installation documentation doesn't work for Windows (i.e., copying the notebook code to run on Windows).
- [ ] Installing PyTesseract is not straightforward.
- [ ] The Dockerfile for EvaDB has not been updated for a while.
Use case
No response
Are you willing to submit a PR?
- [ ] Yes I'd like to help by submitting a PR!