FiloDB
IN optimization and controlling task size during multipartition scan
Currently there is a limit on how many partitions are supported during a multipartition scan. If we simply increase the limit, performance will degrade. Can we start thinking about how far we can go without degrading performance or causing issues? Also, can we have a plan that adds more tasks so the query gets more cores during a multipartition scan? For example (sketched below):

- 0-200 (or whatever the new limit is) --> default plan
- new limit of 400 --> same plan, but somehow create more tasks (still far fewer than the ~5000 tasks of a full table scan)
- above 400 --> full table scan (the default behavior today)
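A minimal sketch of what that tiered logic could look like, assuming hypothetical names (`ScanStrategy`, `chooseStrategy`, `maxTasks`) and reusing the 200/400 thresholds from the example above; this is not FiloDB's actual planner code:

```scala
// Hypothetical scan strategies, not FiloDB's real plan types.
sealed trait ScanStrategy
case object SinglePartitionTask extends ScanStrategy            // today's default multipartition plan
final case class ChunkedMultiTask(numTasks: Int) extends ScanStrategy
case object FilteredFullTableScan extends ScanStrategy

def chooseStrategy(numPartKeys: Int,
                   defaultLimit: Int = 200,   // current multipartition limit (assumed)
                   chunkedLimit: Int = 400,   // proposed higher limit from the example
                   maxTasks: Int = 16): ScanStrategy =
  if (numPartKeys <= defaultLimit) SinglePartitionTask
  else if (numPartKeys <= chunkedLimit) {
    // Split the IN list over a few Spark tasks, but keep the task count
    // far below what a full table scan would create (~5000 in the example).
    val tasks = math.min(maxTasks, math.ceil(numPartKeys.toDouble / defaultLimit).toInt)
    ChunkedMultiTask(tasks)
  } else FilteredFullTableScan
```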
Basically, multipartition queries always run on one Spark partition. We want to enable bigger multipartition queries that can spread across multiple Spark partitions without invoking filtered full table scans. This will require some intelligent logic.
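As an illustration of spreading one multipartition query over several Spark tasks, here is a rough sketch that groups the IN-list of partition keys into batches and gives each batch its own Spark partition. `readPartition` and `keysPerTask` are hypothetical stand-ins, not FiloDB APIs:

```scala
import scala.reflect.ClassTag
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// readPartition stands in for whatever reads the chunks of one FiloDB partition.
def scanMultiPartition[T: ClassTag](sc: SparkContext,
                                    partKeys: Seq[String],
                                    keysPerTask: Int)
                                   (readPartition: String => Iterator[T]): RDD[T] = {
  // Group the IN-list into batches so each Spark task reads a bounded number of
  // FiloDB partitions, instead of a single task reading all of them.
  val batches: Seq[Seq[String]] = partKeys.grouped(keysPerTask).toSeq
  sc.parallelize(batches, numSlices = math.max(1, batches.size))
    .flatMap(batch => batch.iterator.flatMap(readPartition))
}
```

With `keysPerTask` around the current limit (say 200), a 400-key IN query would land on two Spark partitions instead of falling back to a filtered full table scan, which is the kind of middle ground the tiers above describe.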