BMInf
Efficient Inference for Big Models
I set the memory limit to 6.
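For reference, here is a minimal sketch of how a GPU memory budget is usually passed when wrapping a model with BMInf, assuming the `bminf.wrapper` entry point and a PyTorch backbone (the `build_model()` helper is a placeholder, not part of BMInf). The `memory_limit` argument is a byte count, so a 6 GB budget would be written as `6 << 30` rather than `6`:

```python
import torch
import bminf

# Placeholder for whatever PyTorch model is being served; not part of BMInf itself.
model = build_model()

with torch.cuda.device(0):
    # memory_limit is a number of bytes: 6 << 30 corresponds to a 6 GiB GPU budget.
    model = bminf.wrapper(model, quantization=False, memory_limit=6 << 30)
```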
ERROR in app: Exception on /api/fillblank [POST]
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 2070, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1515, in full_dispatch_request
    rv = self.handle_user_exception(e)
...
I was reading the documentation and the technical paper, and it seems the experiments were done on a single node. Does BMInf support multi-node inference deployment for large models like...
File "/home/wenxuan/lihaijie_files/cpm-live/examples/tune_cpm_ant.py", line 56, in delta_model.freeze_module(exclude=["deltas"], set_state_dict=True) File "/home/wenxuan/miniconda3/envs/lhj/lib/python3.9/site-packages/opendelta/basemodel.py", line 274, in freeze_module self._freeze_module_recursive(module, exclude, "") # modify the active state dict that still need grad File "/home/wenxuan/miniconda3/envs/lhj/lib/python3.9/site-packages/opendelta/basemodel.py", line 316,...
**Is your feature request related to a problem? Please describe.** There are other speedup methods for Transformers, such as [FasterTransformer](https://github.com/NVIDIA/FasterTransformer). **Describe the solution you'd like** Can you describe how your method...