The code supports multi-scale training, which it inherits from MS-CNN. You may need to look through the data layer to see how those configurations are set. I haven't tried it with Cascade R-CNN.
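
For reference, here is a rough, generic sketch of what multi-scale training usually means at the data-loading step: pick a random target scale for each image, then resize the image and its boxes by the same factor. This is not the data layer from this repo. The function names, the candidate scales, and the long-side cap are all assumptions made up for illustration.

```python
import random


def sample_scale(height, width, short_sides=(480, 576, 688, 864),
                 max_long_side=1200, rng=random):
    """Return a resize factor for one image (hypothetical helper).

    A target short side is drawn at random from `short_sides`; the
    factor is reduced if the longer side would exceed `max_long_side`.
    The default values are illustrative, not taken from any config.
    """
    target = rng.choice(short_sides)
    factor = target / min(height, width)
    # Cap the factor so the longer side stays within max_long_side.
    if factor * max(height, width) > max_long_side:
        factor = max_long_side / max(height, width)
    return factor


def rescale_boxes(boxes, factor):
    """Scale [x1, y1, x2, y2] boxes by the same factor as the image."""
    return [[x1 * factor, y1 * factor, x2 * factor, y2 * factor]
            for x1, y1, x2, y2 in boxes]


if __name__ == "__main__":
    h, w = 600, 1000
    f = sample_scale(h, w)
    print("factor:", f, "new size:", round(h * f), "x", round(w * f))
    print(rescale_boxes([[10, 20, 30, 40]], f))
```

In the actual code the equivalent settings would live in the data layer parameters, so the names there will differ from the ones used in this sketch.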