GenAIExamples
                                
                        Minor fixes for CodeGen Xeon and Gaudi Kubernetes codegen.yaml and doc updates
Description
This PR has a few updates based on issues that I ran into when deploying the CodeGen example on a cluster with Xeon and Gaudi nodes. The following issues are addressed in the PR:
- I added a note about potentially using a persistent volume claim instead of having to create the `/mnt/opea-models` directory on the nodes.
- Deploying the `codegen.yaml` files gave an error like:
  error: error validating "codegen.yaml": error validating data: [unknown object type "nil" in ConfigMap.data.http_proxy, unknown object type "nil" in ConfigMap.data.https_proxy, unknown object type "nil" in ConfigMap.data.no_proxy]; if you choose to ignore these errors, turn validation off with --validate=false
  This error is because the ConfigMap in the yaml has a few env vars that are just empty (nil). Changing these to have empty quotes (`""`) fixes the issue. [EDIT: this was resolved in PR 630]
- I added a note that the service takes a couple of minutes to start, along with how to check the logs. I ran into an issue where the `curl` command failed with "curl: (18) transfer closed with outstanding read data remaining", and it was just because the service wasn't ready yet. Knowing how to check the logs is also useful for watching the status and figuring out whether the `curl` command is failing because of an error.
- Running on Gaudi wasn't working for me ("RuntimeError: synStatus=26 [Generic failure] Device acquire failed.") until I added `hugepages-2Mi` and `memory` to the resource limits. The Habana documentation for Kubernetes shows `hugepages-2Mi` and `memory` in the resources, so that seems to be the recommended config.
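
For illustration, the three configuration points above (empty-string proxy values, a persistent volume claim for the model directory, and `hugepages-2Mi`/`memory` in the Gaudi resource limits) could look roughly like the sketch below. This is not the actual content of `codegen.yaml`: every object name, the image, the mount path, and all sizes are placeholders chosen for the example, and only the `http_proxy`/`https_proxy`/`no_proxy` keys and the `hugepages-2Mi`/`memory` resource names come from the description above.

```yaml
# Sketch only: names, image, mount path, and sizes are placeholders,
# not values taken from codegen.yaml.
apiVersion: v1
kind: ConfigMap
metadata:
  name: codegen-proxy-config        # hypothetical name
data:
  # Empty strings rather than bare keys, so validation does not see "nil"
  http_proxy: ""
  https_proxy: ""
  no_proxy: ""
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: opea-models-pvc             # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi                 # placeholder size
---
apiVersion: v1
kind: Pod
metadata:
  name: codegen-gaudi-example       # hypothetical name
spec:
  containers:
    - name: model-server            # hypothetical container name
      image: example.invalid/model-server:latest   # placeholder image
      envFrom:
        - configMapRef:
            name: codegen-proxy-config
      resources:
        limits:
          habana.ai/gaudi: 1
          hugepages-2Mi: 4Gi        # placeholder amount
          memory: 32Gi              # placeholder amount
      volumeMounts:
        - name: model-volume
          mountPath: /data          # placeholder mount path
  volumes:
    - name: model-volume
      # Claim-backed volume instead of a directory created on each node
      persistentVolumeClaim:
        claimName: opea-models-pvc
```

The hugepages and memory amounts would need to match what is actually available on the Gaudi nodes, and the claim assumes the cluster has a default storage class that can satisfy it.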
Issues
N/A
Type of change
List the type of change like below. Please delete options that are not relevant.
- [x] Bug fix (non-breaking change which fixes an issue)
- [x] Others (enhancement, documentation, validation, etc.)
Dependencies
N/A
Tests
Manually tested the changes on a Kubernetes cluster with Xeon and Gaudi 2 nodes.