Martin Fajčík comments

Results: 4 comments by Martin Fajčík

[Feature] Add Support on Log Probability Value from Returned Response of Gemini Models.

> I also think it would be extremely helpful if the API could provide the top-k log probabilities of each predicted token. Yes, this would allow evaluating Gemini with threshold-free...

How to support multi-threaded parallel data preprocessing?

Agreed, this would be very useful. Would it be possible to implement sharding for `convert_dataset_json.py`? Simply add extra parameters to specify `# of shards` and `index of shard`. Script could...
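To make the sharding suggestion concrete, here is a minimal sketch of how a script could restrict itself to one shard of a JSONL input. It is not taken from `convert_dataset_json.py`; the function `iter_shard` and the flags `--num_shards` and `--shard_index` are hypothetical names, and lines are assigned to shards round-robin purely for illustration.

```python
import argparse
import json
from typing import Iterator, List, Optional


def iter_shard(path: str, num_shards: int, shard_index: int) -> Iterator[dict]:
    """Yield only the JSONL records that belong to one shard.

    Lines are assigned round-robin: line i goes to shard i % num_shards.
    """
    if num_shards < 1 or not 0 <= shard_index < num_shards:
        raise ValueError("need num_shards >= 1 and 0 <= shard_index < num_shards")
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f):
            # Skip lines owned by other shards, and blank lines.
            if line_no % num_shards == shard_index and line.strip():
                yield json.loads(line)


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Process one shard of a JSONL file.")
    parser.add_argument("--path", required=True)
    parser.add_argument("--num_shards", type=int, default=1)   # hypothetical flag
    parser.add_argument("--shard_index", type=int, default=0)  # hypothetical flag
    return parser.parse_args(argv)
```

With something like this, N independent processes could each be started with the same `--num_shards N` and a different `--shard_index`, each writing to its own output directory.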

How to support multi-threaded parallel data preprocessing?

Isn't it enough to just run the script in parallel, and merge the mds shards with this method? https://github.com/mosaicml/llm-foundry/blob/f43d1cfb1ef8f38ca90fee68b0643f45d6d5b2da/llmfoundry/utils/data_prep_utils.py#L29 Currently, I am trying it like this. I have a large jsonl file....
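One way to prepare for running the script in parallel is to split the large JSONL file into smaller part files first, so that each run gets its own input. The sketch below is only an assumption about how that could be done; `split_jsonl` and the `part_XXXXX.jsonl` naming are hypothetical, and the merge of the resulting MDS shards is left to the helper linked above.

```python
import os
from typing import List


def split_jsonl(path: str, out_dir: str, num_parts: int) -> List[str]:
    """Split a JSONL file into num_parts files, assigning lines round-robin.

    Returns the paths of the part files in order.
    """
    if num_parts < 1:
        raise ValueError("num_parts must be >= 1")
    os.makedirs(out_dir, exist_ok=True)
    part_paths = [
        os.path.join(out_dir, f"part_{i:05d}.jsonl") for i in range(num_parts)
    ]
    outs = [open(p, "w", encoding="utf-8") for p in part_paths]
    try:
        with open(path, encoding="utf-8") as src:
            for line_no, line in enumerate(src):
                # Ensure every record ends with a newline, including the last one.
                if not line.endswith("\n"):
                    line += "\n"
                outs[line_no % num_parts].write(line)
    finally:
        for out in outs:
            out.close()
    return part_paths
```

Each part file can then be passed to a separate run of the conversion script, with a separate output directory per run, before the outputs are merged.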

How to support multi-threaded parallel data preprocessing?

> Isn't it enough to just run the script in parallel, and merge the mds shards with this method?
>
> https://github.com/mosaicml/llm-foundry/blob/f43d1cfb1ef8f38ca90fee68b0643f45d6d5b2da/llmfoundry/utils/data_prep_utils.py#L29
>
> Currently, I am trying it like this....