evadb User Experience Improvement Summary


Open: xzdandy opened this issue.

This issue summarizes users' feedback on their experience with EvaDB:

SQL Statement related:

[ ] The USE statement in EvaQL does not allow parentheses, and it only supports a single native statement.
[ ] LOAD DOCUMENTS and CHUNK SIZE are confusing.
[ ] GROUP BY only works on video and document tables; it does not work on traditional columns.
[ ] More documentation is needed on the syntax for creating new tables.
[ ] #1320
[ ] SELECT * FROM table does not work for third-party data sources. It is not clear that database.table has to be used as the table source.

Data related:

[ ] Convert numbers stored as strings (e.g., with commas or in scientific notation) into integers.
[ ] Loading different text file types (.pdf and .txt) into the same table is problematic.
[ ] Version incompatibility when tables and indexes were created by an earlier version of EvaDB.
[ ] Some Postgres data types, e.g., array types or uuid, are not supported when EvaDB reads from Postgres.
[ ] Most DB cursors sanitize user-provided data in queries automatically, but EvaDB does not seem to have this functionality.
[ ] All single apostrophes have to be escaped when inserting data into a database.
[ ] There is no string concatenation method.
[ ] DataFrames cannot be inserted directly.
[ ] On Windows, storing file paths as strings in MySQL caused problems because backslashes were misinterpreted as escape sequences.
[ ] New data sources:
- [ ] Google / Bing as a data source for web scraping (e.g., Serper)
- [ ] Google Maps
- [ ] GitHub issues
- [ ] Reddit (via PRAW)
- [ ] Wiki
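
Several of the data-related items above concern how user-provided values reach a SQL statement. The sketch below is not EvaDB code: it uses Python's built-in `sqlite3` module purely as a stand-in to illustrate what users expect from a DB-API cursor, namely that parameter binding stores apostrophes and Windows backslashes verbatim without manual escaping. `parse_int` and `insert_rows` are hypothetical helpers; `parse_int` shows one way to turn numeric strings with commas or scientific notation into integers.

```python
import sqlite3
from decimal import Decimal, InvalidOperation


def parse_int(text: str) -> int:
    """Hypothetical helper: convert strings like "1,234" or "1.5e3" to an int.

    Commas used as thousands separators are stripped, and Decimal parses
    scientific notation exactly. Non-integral values are rejected rather
    than silently truncated.
    """
    cleaned = text.strip().replace(",", "")
    try:
        value = Decimal(cleaned)
    except InvalidOperation as exc:
        raise ValueError(f"not a number: {text!r}") from exc
    if not value.is_finite() or value != value.to_integral_value():
        raise ValueError(f"not an integer: {text!r}")
    return int(value)


def insert_rows(rows):
    """Hypothetical helper: insert (name, path) rows with parameter binding."""
    conn = sqlite3.connect(":memory:")  # sqlite3 stands in for any DB-API driver
    try:
        conn.execute("CREATE TABLE files (name TEXT, path TEXT)")
        # The ? placeholders let the driver bind each value, so apostrophes
        # and Windows backslashes are stored verbatim with no manual escaping.
        conn.executemany("INSERT INTO files (name, path) VALUES (?, ?)", rows)
        return conn.execute("SELECT name, path FROM files ORDER BY rowid").fetchall()
    finally:
        conn.close()


print(parse_int("1,234"), parse_int("1.5e3"))  # 1234 1500
print(insert_rows([("O'Brien", r"C:\data\new.txt")]))  # [("O'Brien", 'C:\\data\\new.txt')]
```

Binding values instead of formatting them into the SQL text is what removes the need to double apostrophes or escape backslashes by hand.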

User-defined function related:

[ ] The Sklearn integration currently supports only linear regression; more machine learning models could be introduced and supported.
[ ] Functions only work on table elements, so a table with one tuple has to be created to apply a function to a single value.
[ ] For aggregate user-defined functions, there needs to be a way to pass the complete table into the function instead of row by row or batch by batch.
[ ] Type definitions in the custom AI function's forward decorator are confusing.
- [ ] The input/output format of a function (e.g., the AWS Rekognition service) can be bytes, while EvaDB's forward function requires input in the form of a NumPy array.
[ ] It is challenging to figure out which algorithm is best for a prediction task (e.g., layoffs).
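
For the bytes-versus-NumPy mismatch noted above, one workaround is to convert at the boundary of the function. This is a minimal sketch assuming the payload is raw bytes (e.g., an encoded image) that an external service accepts unchanged; `bytes_to_array` and `array_to_bytes` are hypothetical helper names, not part of EvaDB's API.

```python
import numpy as np


def bytes_to_array(payload: bytes) -> np.ndarray:
    """Wrap raw bytes in a 1-D uint8 array so they can be passed as NumPy input."""
    # np.frombuffer shares memory with the (immutable) bytes object,
    # so copy it to get an ordinary writable array.
    return np.frombuffer(payload, dtype=np.uint8).copy()


def array_to_bytes(array: np.ndarray) -> bytes:
    """Recover the original bytes, e.g., before sending them to an external service."""
    return np.ascontiguousarray(array, dtype=np.uint8).tobytes()


payload = b"\xff\xd8\xff\xe0"  # first bytes of a typical JPEG file
assert array_to_bytes(bytes_to_array(payload)) == payload
```

The round trip is lossless because each byte maps to exactly one uint8 element.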

LLM related:

[ ] Deal with various constraints of the OpenAI API, such as token restrictions, rate limits, and other limitations.
[ ] ChatGPT deviates from the expected 'yes' or 'no' responses quite often.
[ ] Converting the ChatGPT response, which in most cases is just free-form text, into a usable format is difficult.
[ ] I was unable to find exactly how EvaDB calls the OpenAI API. I ran into unsatisfactory results with certain prompts and was unable to debug them; some prompts produced a response indicating that no text had been submitted. I assumed this may have been due to certain characters in the string, but without the ability to inspect how the API request is made, I ended up simply trying various prompts until I got satisfactory results.
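
A common mitigation for the yes/no drift noted above is to normalize the free-form reply on the client side instead of relying on its exact wording. The function below is a hypothetical post-processing step, not something EvaDB provides: it maps replies such as "Yes." or "No, because..." to a boolean and returns `None` when the answer is ambiguous, so the caller can retry or flag the row.

```python
import re
from typing import Optional

_YES = {"yes", "true"}
_NO = {"no", "false"}


def parse_yes_no(reply: str) -> Optional[bool]:
    """Map a free-form LLM reply to True/False, or None if it is ambiguous."""
    # Lowercase and keep only alphabetic words, so "Yes." and "No," match.
    words = re.findall(r"[a-z]+", reply.lower())
    if not words:
        return None
    # Prefer the leading word, since the verdict usually comes first.
    if words[0] in _YES:
        return True
    if words[0] in _NO:
        return False
    # Otherwise accept a verdict only if exactly one polarity appears.
    has_yes = any(w in _YES for w in words)
    has_no = any(w in _NO for w in words)
    if has_yes != has_no:
        return has_yes
    return None


print(parse_yes_no("Yes, the review is positive."))  # True
print(parse_yes_no("I cannot say yes or no."))  # None
```

Returning `None` rather than guessing keeps ambiguous rows visible instead of silently misclassifying them.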

Index related:

[ ] Similarity function is not symmetric.
[ ] CREATE INDEX does not work on empty tables or third-party tables.
[ ] Vector databases like Milvus and Pinecone require unavoidable setup effort from users.
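
To make the symmetry complaint above concrete: a symmetric similarity measure returns the same score regardless of argument order. Cosine similarity is one such measure. The sketch below is only a reference point for that property; it is not EvaDB's similarity function, which this issue reports as asymmetric.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; symmetric because the dot product and norm product are."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


u, v = [1.0, 2.0, 3.0], [4.0, -1.0, 0.5]
assert cosine_similarity(u, v) == cosine_similarity(v, u)
```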

Optimization related:

Installation related:

[ ] The EvaDB installation documentation doesn't work for Windows (e.g., when copying the notebook code to run on Windows).
[ ] Installing PyTesseract is not straightforward.
[ ] The Dockerfile for EvaDB hasn't been updated in a while.


Oct 24 '23 16:10 xzdandy