Abstract: With the widespread adoption of large-scale deep learning models in cloud computing environments, model serving systems face increasingly stringent demands on throughput and ...