Luke Friedrichs

Results: 10 issues by Luke Friedrichs

Causes an assertion error:

```
=========================== short test summary info ============================
FAILED tests/test_example_concept_learning_neural_evaluation.py::TestConceptLearningCV::test_cv - AssertionError: Search tree is empty. Ensure that there is at least one owl:Class or owl:ObjectProperty definitions
============= 1...
```

https://github.com/dice-group/dice-embeddings/blame/674e9f5e521e304691ef063f9f79b23e0a5f8ef2/retrieval_aug_predictors/models/RALP.py#L59C2-L59C64 Why do we use the gpt-3.5-turbo tokenizer here? Is this the one used by the LM we are currently using? Also, shouldn't the tokenizer vary depending on the model used?...

question
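A minimal sketch of what "variable depending on the model" could look like: a lookup from model name to tokenizer encoding with an explicit fallback, instead of hardcoding the gpt-3.5-turbo tokenizer. The helper name `encoding_for` and the fallback choice are assumptions for illustration; the model-to-encoding names follow tiktoken's published mapping.

```python
# Hypothetical helper (not in RALP.py): pick the tokenizer encoding from the
# model name rather than hardcoding gpt-3.5-turbo. The returned name can be
# passed to tiktoken.get_encoding(...).
MODEL_TO_ENCODING = {
    "gpt-3.5-turbo": "cl100k_base",
    "gpt-4": "cl100k_base",
    "text-davinci-003": "p50k_base",
}

def encoding_for(model_name: str, default: str = "cl100k_base") -> str:
    """Return the tokenizer encoding name for a model, with a fallback default."""
    return MODEL_TO_ENCODING.get(model_name, default)
```

Unknown or locally hosted LMs fall through to the default, which keeps token counting approximate but well-defined.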

WIP: it works for negative sampling, but inference is extremely slow (see below): https://github.com/dice-group/dice-embeddings/tree/BET Instead of learning one embedding per entity, BET encodes the raw bytes of the entity and relation...

enhancement
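A minimal sketch of the byte-level idea described above, under stated assumptions: this is not the BET branch's actual code, and the function name `byte_encode`, the padding id, and the length cap are illustrative. The point is that entity and relation labels become fixed-length integer sequences over their raw UTF-8 bytes, so no per-entity embedding table is needed.

```python
# Sketch (assumed, not from the BET branch): turn an entity or relation label
# into a padded sequence of its raw UTF-8 bytes (ids 0-255, pad id 256),
# ready for a byte-level encoder instead of a per-entity embedding lookup.
def byte_encode(label: str, max_len: int = 32, pad: int = 256) -> list[int]:
    """UTF-8 byte ids of the label, truncated/padded to max_len."""
    ids = list(label.encode("utf-8"))[:max_len]
    return ids + [pad] * (max_len - len(ids))
```

Since every entity shares one byte-level encoder, memory no longer scales with the number of entities, which also hints at why inference is slow: each score requires running the encoder rather than a table lookup.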

We should run benchmarks for the CoKE model and add its metrics (MRR, Hits@1/3/10, datasets, training setup) to the benchmarking tables in the README.

enhancement
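For the README tables above, MRR and Hits@1/3/10 can be computed from the 1-based ranks of the true entities. A minimal sketch, with the helper name `mrr_and_hits` chosen for illustration:

```python
# Sketch: MRR and Hits@k from the 1-based ranks assigned to the true
# entities in link prediction, as typically reported in benchmark tables.
def mrr_and_hits(ranks, ks=(1, 3, 10)):
    """Return (MRR, {k: Hits@k}) for a list of 1-based ranks."""
    n = len(ranks)
    mrr = sum(1.0 / r for r in ranks) / n          # mean reciprocal rank
    hits = {k: sum(r <= k for r in ranks) / n for k in ks}  # fraction ranked in top k
    return mrr, hits
```

Whether ranks are filtered (known true triples removed from the candidate list) should be stated alongside the numbers, since it changes the metrics.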

We could add wandb for tracking loss curves, hyperparameters, evaluation results, ...: https://github.com/wandb/wandb

enhancement
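One way to wire this in without making wandb a hard dependency is a small logging shim: when wandb is requested it delegates to `wandb.init(...)`/`run.log(...)`, otherwise it records metrics in memory. The factory name `get_logger` and the fallback behaviour are assumptions for illustration.

```python
# Sketch: optional wandb logging. With use_wandb=True this would call the
# real wandb API (wandb.init / run.log); otherwise metrics are kept locally
# so training code can log unconditionally.
def get_logger(use_wandb: bool = False, project: str = "dice-embeddings"):
    """Return a callable log(metrics_dict); wandb-backed only when requested."""
    if use_wandb:
        import wandb  # imported lazily so wandb stays an optional dependency
        run = wandb.init(project=project)
        return run.log
    history = []
    def log(metrics):
        history.append(dict(metrics))
        return history
    return log
```

Training code then calls `log({"loss": loss, "lr": lr})` per step regardless of whether wandb is enabled.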

Otherwise you cannot use the DeepSpeed trainer, for instance (via the --strategy argument). Also, the deepspeed package is not installed by default right now; maybe we want to add it...

For different batch_sizes I observed a quadratic memory increase, i.e.:

```
256 -> CUDA out of memory. Tried to allocate 37.61 GiB and
512 -> ... 150.42 GiB and
1024...
```
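A quick sanity check on the reported numbers: doubling the batch size from 256 to 512 roughly quadruples the requested allocation, which matches an O(B²) term (e.g. a B×B in-batch score matrix, an assumption on my part) rather than the linear growth one would expect from activations alone.

```python
# Reported allocations from the CUDA OOM messages above (GiB).
mem = {256: 37.61, 512: 150.42}

# Doubling the batch size multiplies the allocation by ~4 == (512/256)**2,
# i.e. memory grows quadratically in the batch size, not linearly.
ratio = mem[512] / mem[256]
print(round(ratio, 2))
```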

This: https://github.com/deepspeedai/DeepSpeed/blob/53e91a098d0a0666ac8cb8025a5b36e5af172d08/.gitignore#L61C1-L61C6 ignores the [multi_tensor_apply.cuh](https://github.com/deepspeedai/DeepSpeed/blob/master/csrc/adam/multi_tensor_apply.cuh) file, which prevents FusedAdam from working with a cloned copy of DeepSpeed, since the file is never pushed to the remote.

**Describe the bug** Importing `deepspeed.layer.moe` raises this ValueError: `ValueError: Target parameter "qkv_w" not found in this layer. Valid targets are []` from: https://github.com/deepspeedai/DeepSpeed/blob/e993fea38efe654592b956d1ab52e340bfbf9714/deepspeed/inference/v2/model_implementations/layer_container_base.py#L97-L99 and this ValueError: `...

bug
training

I have implemented some custom logic in the deepspeed_moe classes, and having "expert" in any parameter name breaks the checkpoint saving function. The warning triggers because the code finds...
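A minimal reproduction of the failure mode described above, under the assumption (mine, based on the symptom) that the checkpointing code classifies expert parameters by substring-matching "expert" in the parameter name. Any unrelated custom parameter containing that substring then gets swept into the expert branch.

```python
# Sketch of the assumed classification: a plain substring test on parameter
# names, so a custom parameter like "my_expert_score" is misclassified as an
# MoE expert weight even though it is not one.
def split_params(names):
    """Partition parameter names into (assumed) expert and non-expert groups."""
    experts = [n for n in names if "expert" in n]
    others = [n for n in names if "expert" not in n]
    return experts, others
```

A more robust check would match the actual module path (e.g. a dedicated experts submodule) instead of a bare substring.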