30 issues by QinLuo

### ๐Ÿ› Describe the bug When using GeminiPlugin, I got a RuntimeError: `RuntimeError: value cannot be converted to type float without overflow` the full traceback: ``` Traceback (most recent call...

bug

### Describe the bug After calling `run.log({"a": 99.0, "c": 85.0, "custom_step": 1000}, step=None)` and then closing the run with `run.finish()`, the process hangs. The following warnings and upload progress messages are...

cli

Currently, the `hparams` are displayed at the beginning of the table, and the last column is `hash`, which looks like this: ![image](https://github.com/aimhubio/aim/assets/1772912/56525273-4d02-4e8a-8484-9b7a8a828d07) One can drill down further by clicking the `hash` column: ![image](https://github.com/aimhubio/aim/assets/1772912/709f458a-60ef-4d40-a8e4-33a39b100ea8) and...

type / question

## ๐Ÿ“Œ Checklist before creating the PR - [x] I have created an issue for this PR for traceability - [x] The title follows the standard format: `[doc/gemini/tensor/...]: A concise...

### Describe the feature Saving and loading shared models and optimizers is not implemented yet; calling it raises a `NotImplementedError`. How can one proceed to...

enhancement

### ๐Ÿ› Describe the bug When training the Mixture of Experts (MoE) model with code snippets in the application/ColossalMoE, I encountered Out of Memory (OOM) issues at the beginning. ```...

bug

### ๐Ÿ› Describe the bug With the main branch `applications/ColossalMoE`, I got such error: ``` grad = grad.to(master_moe_param.dtype).to(master_moe_param.device) AttributeError: 'NoneType' object has no attribute 'to' ``` start script: ``` NUM_GPU=2...

bug

### Describe the feature A recent paper, "GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection" (https://arxiv.org/pdf/2403.03507.pdf), presents a remarkably memory-efficient approach to training large language models (LLMs)...

enhancement