petals
Allow filtering by max sequence length
Problem: if some (but not all) servers support a longer sequence length, inference at that length would be very inefficient, because the client will keep running into servers that only support shorter sequences.
Suggested solution: if we ask servers to report their max sequence length to the DHT, clients will be able to filter servers by sequence length as they read DHT entries.
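A rough sketch of what the client-side filter could look like. Everything here is hypothetical and for illustration only: `ServerEntry`, its `max_length` field, and `filter_by_max_length` are not existing petals or hivemind APIs, and the real DHT record format may differ. It also assumes servers that have not reported a limit should be skipped by default.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class ServerEntry:
    """Hypothetical stand-in for one server record read from the DHT."""
    peer_id: str
    max_length: Optional[int] = None  # None: the server did not report a limit


def filter_by_max_length(
    entries: Iterable[ServerEntry],
    required_length: int,
    keep_unreported: bool = False,
) -> List[ServerEntry]:
    """Keep only servers whose reported max sequence length covers the request."""
    kept = []
    for entry in entries:
        if entry.max_length is None:
            # Assumption: older servers may not publish the field at all.
            if keep_unreported:
                kept.append(entry)
        elif entry.max_length >= required_length:
            kept.append(entry)
    return kept


entries = [
    ServerEntry("server-a", max_length=2048),
    ServerEntry("server-b", max_length=8192),
    ServerEntry("server-c"),  # no limit reported
]
long_capable = filter_by_max_length(entries, required_length=4096)
```

Whether unreported servers should be kept or dropped is an open design question; the `keep_unreported` flag just makes that choice explicit.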