Andy

Results: 13 issues and pull requests by Andy

### What happened?

Seeing this [error](https://ci-beam.apache.org/job/beam_PostCommit_Python38/2960/testReport/apache_beam.runners.dataflow.dataflow_exercise_metrics_pipeline_test/ExerciseMetricsPipelineTest/test_metrics_it/) in `test_metrics_it`. There were no changes at the time of the first failure (run 2960).

```
self =

    @pytest.mark.it_postcommit
    def test_metrics_it(self):
        result = self.run_pipeline()
        errors...
```

Labels: python, core, P1, bug

### What happened?

Seeing this error in Python [PostCommits](https://ci-beam.apache.org/job/beam_PostCommit_Python37_PR/411/console):

```
21:36:07 Exception in thread read_grpc_client_inputs:
21:36:07 Traceback (most recent call last):
21:36:07   File "/usr/lib/python3.7/threading.py", line 926, in _bootstrap_inner
21:36:07     self.run()...
```

Labels: bug, P2, test-failures, awaiting triage

Adding `PytorchBatchConverter`, as [discussed](https://github.com/apache/beam/issues/21440#issuecomment-1239777760) in https://github.com/apache/beam/issues/21440, as an initial step toward integrating Batched DoFns into RunInference.

Labels: python
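
The merged converter lives in the PR itself; as a rough illustration only, here is a minimal sketch of what a torch-backed batch converter could look like, assuming Beam's `produce_batch`/`explode_batch`/`combine_batches` interface from `apache_beam.typehints.batch` (method names and details here are assumptions, not the PR's code):

```python
# Rough sketch only, not the code from this PR. Assumes Beam's
# BatchConverter interface (produce_batch / explode_batch /
# combine_batches / get_length / estimate_byte_size).
import torch
from apache_beam.typehints.batch import BatchConverter


class PytorchBatchConverter(BatchConverter):
    """Batches individual torch.Tensor elements into one stacked tensor."""

    def produce_batch(self, elements):
        # Stack N element tensors of shape (d,) into one (N, d) batch.
        return torch.stack(list(elements))

    def explode_batch(self, batch):
        # Yield the per-element tensors back out of a batch.
        yield from torch.unbind(batch, dim=0)

    def combine_batches(self, batches):
        return torch.cat(list(batches), dim=0)

    def get_length(self, batch):
        return batch.size(0)

    def estimate_byte_size(self, batch):
        return batch.element_size() * batch.nelement()
```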

# What does this PR do?

Mimics https://github.com/huggingface/transformers/pull/9006, but for Flax. We want to match how PyTorch's logic accounts for `group_size` and `num_beam_groups` [here](https://github.com/huggingface/transformers/blob/v4.30.2/src/transformers/generation/beam_search.py#L175) and [here](https://github.com/huggingface/transformers/blob/v4.30.2/src/transformers/generation/beam_search.py#L249C1-L281C26).
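
For context, the PyTorch scorer linked above derives a per-group beam count and reorders each group independently; a toy illustration of that bookkeeping (simplified, not the Flax code in this PR):

```python
# Toy illustration of the grouped-beam bookkeeping (not the PR's code).
num_beams, num_beam_groups = 6, 3
group_size = num_beams // num_beam_groups  # beams scored per group

for group_idx in range(num_beam_groups):
    start = group_idx * group_size
    # Each group reorders only its own slice of the beam hypotheses,
    # which is what keeps the groups diverse.
    print(f"group {group_idx}: beams [{start}, {start + group_size})")
```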

For llama benchmarks, the submission checker uses tokens per second for Offline but samples per second for Server: https://github.com/mlcommons/inference/blob/master/tools/submission/submission_checker.py#L1385. However, the summary.csv still [uses](https://github.com/mlcommons/inference/blob/master/tools/submission/submission_checker.py#L2543-L2544) samples/second as the header to report...
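
A hedged sketch of the kind of fix implied here: pick the summary header unit per scenario instead of hard-coding samples/second. Function and string names below are illustrative, not the checker's actual internals:

```python
# Illustrative only; names are made up, not the submission checker's code.
def performance_unit(model: str, scenario: str) -> str:
    """Return the unit the summary header should report."""
    if "llama" in model.lower() and scenario == "Offline":
        return "Tokens/s"
    return "Samples/s"


assert performance_unit("llama2-70b", "Offline") == "Tokens/s"
assert performance_unit("llama2-70b", "Server") == "Samples/s"
```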

Addresses the feature request here https://github.com/mlcommons/inference/issues/1691

It is sometimes useful to skip a scenario when validating the correctness of a submission package with [submission_checker.py](https://github.com/mlcommons/inference/blob/master/tools/submission/submission_checker.py). For example, if we only have Offline results, but not any...
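
A minimal sketch of how such an option might look, assuming an argparse-based CLI; the `--skip-scenarios` flag and everything around it are hypothetical, not the checker's actual interface:

```python
# Hypothetical sketch; --skip-scenarios is a made-up flag, and
# check_scenario stands in for the checker's real per-scenario logic.
import argparse

REQUIRED_SCENARIOS = ["Offline", "Server"]


def check_scenario(scenario: str) -> None:
    print(f"validating {scenario} results...")


parser = argparse.ArgumentParser()
parser.add_argument(
    "--skip-scenarios", nargs="*", default=[],
    help="Scenarios (e.g. Server) to exclude from validation.")
args = parser.parse_args()

for scenario in REQUIRED_SCENARIOS:
    if scenario in args.skip_scenarios:
        continue
    check_scenario(scenario)
```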

- General cleanup of instructions.
- Generalizing the llama checkpoint conversion script so that it supports custom GCS buckets for the maxtext checkpoints (see the sketch below).
- Adding quantization instructions.
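
For the custom-bucket item, a tiny sketch of the override pattern, written in Python for consistency with the rest of this page (the real conversion script is shell, and this `MODEL_BUCKET` handling is an assumption):

```python
# Hypothetical sketch: honor a caller-supplied MODEL_BUCKET instead of
# always deriving gs://${USER}-maxtext. The real script is a shell script.
import os

model_bucket = os.environ.get(
    "MODEL_BUCKET", f"gs://{os.environ.get('USER', 'unknown')}-maxtext")
print(f"using checkpoint bucket: {model_bucket}")
```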

Currently the model conversion script will [create a bucket](https://github.com/google/JetStream/blob/main/jetstream/tools/maxtext/model_ckpt_conversion.sh#L36) via `export MODEL_BUCKET=gs://${USER}-maxtext`. However, the `gs://${USER}-maxtext` path may already exist, which I imagine would break the script....
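
One hedged way to guard against that, sketched with the google-cloud-storage client (the script itself is shell; this is an illustration, not a patch):

```python
# Illustrative guard: only create the bucket when it does not already
# exist, so a pre-existing gs://${USER}-maxtext does not break the run.
from google.cloud import storage


def ensure_bucket(bucket_name: str) -> storage.Bucket:
    client = storage.Client()
    bucket = client.lookup_bucket(bucket_name)  # None if the bucket is absent
    if bucket is None:
        bucket = client.create_bucket(bucket_name)
    return bucket
```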