petals
Allow filtering by max sequence length
Problem: if some (but not all) servers support a longer sequence length, inference at that length would be very inefficient because the client would constantly bump into servers with shorter limits.
Suggested solution: if servers report their max sequence length to the DHT, a client will be able to filter by sequence length as it reads DHT entries.
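A minimal sketch of the client-side filtering step, assuming a hypothetical `ServerInfo` record with a `max_seq_len` field that each server announces to the DHT (the field name and structure are illustrative, not the actual Petals API):

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ServerInfo:
    # Hypothetical DHT entry: peer id plus the max sequence length the server reports
    peer_id: str
    max_seq_len: int


def filter_servers(servers: List[ServerInfo], needed_seq_len: int) -> List[ServerInfo]:
    """Keep only servers whose reported limit can handle the requested sequence length."""
    return [s for s in servers if s.max_seq_len >= needed_seq_len]


# Example: a client requesting 8192-token inference skips 2048-token servers
servers = [ServerInfo("peerA", 2048), ServerInfo("peerB", 8192)]
print([s.peer_id for s in filter_servers(servers, 8192)])  # → ['peerB']
```

With this filter applied while reading DHT entries, route planning would only consider servers that can actually serve the requested length.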
Could the max seq len for forward/backward also be reported? And since this issue's solution is to filter by sequence length, could the absolute max seq length be tied to the host's configured max, so that 65B could be trained and inferred above 2048, please? Only if a server chooses to host beyond that, of course. The Petals team is awesome, btw :)
@Jeduh, I agree that it would make sense. Just FYI, right now the limits are 8192 for Llama 2 (70B, 70B-Chat) and 2048 for all other models.
@justheuristic What do these numbers 256 * 1024 * 1024 represent?