Saheli Bhattacharjee comments

Results 5 comments of


                                            Saheli Bhattacharjee

[V1][Metrics] Add model_load_time as a log for CUDA devices

Hi @markmc, that makes sense to me. Thanks! Could you please take a look at the current implementation?

[Small LLM] Max tokens fixed at 128?

Adding to this, here's a snapshot of `mlperf_log_accuracy.json` which we obtained by running the reference implementation - ``` { "seq_id" : 0, "qsl_idx" : 2844, "data" : "33F80000540200000BB000004A2700003D1F0000C2020000582B0000D8E3000067EE0000B0010000BA23000085010000A52E0000540C0000170100004A0300003B01000017010000D00C00000D0000003D1F00000B000000DC000000090300000B000000C20200006F070000DC00000016000000D76A00007101000033F8000054020000A4250000DC0000009B060000710C0000A4010000D00C00000D000000030500003E020000DB2100002B02000033F80000F901000001EE00003911000030010000DC000000A70200001000000071010000AB1B00001100000015E800000D0000003D1F0000C202000003040000B71E000037010000080100009B0F0000831B00001354000006040000BC4B00000D00000003050000B2020000EA05000071010000D8E3000067EE00000B000000790300000F020000DC000000210500002F01000030010000AA230000042400000B000000C00200003B0100001701000086E900006F010000712800002B0200001B1700009B1A0000A90400000D0000003D1F0000C2020000A56900009A03000017010000F00D00000B000000B29300002D030000C96E00004301000037B400000D0000003D1F00003E02000008010000181700003001000033F8000054020000A4250000DC000000CC0300000C090000E36C0000500800001E060000D00C00000D00000003050000B2020000B1050000", "token_count" :...

Saheli Bhattacharjee

[V1][Metrics] Add model_load_time as a log for CUDA devices

[Small LLM] Max tokens fixed at 128?

[Small LLM] Max tokens fixed at 128?

[Core] Add Additional Metrics to vLLM Server

[Core] Add Additional Metrics to vLLM Server