Non-sequential Pipelines and Tuning

Research output: Chapter in Book/Report/Conference proceeding › Book chapter › Research › peer-review

Real-world applications often require complicated pipelines that do not progress sequentially. For example, many experiments have demonstrated that bagging is a powerful method to improve model performance. Bagging can be thought of as a non-sequential pipeline in which a learner is replicated, each copy is trained and makes predictions, and the results are combined. This is non-sequential because data does not flow through the pipeline in a single stream but is instead passed to all learners (which may then subsample the data) and later recombined, creating a pipeline in which operations have multiple inputs and outputs. Pipeline operations also have hyperparameters that can be set and tuned to improve model performance. Moreover, the choice of which operations to include in a pipeline can itself be tuned, a process known as combined algorithm selection and hyperparameter optimization (CASH). This chapter looks at more advanced uses of mlr3pipelines. This is put into practice by demonstrating how to build bagging and stacking pipelines from scratch, as well as how to access common pipelines that are readily available in mlr3pipelines. The chapter then looks at tuning pipelines and CASH.
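As an illustrative sketch only (not the chapter's own code), the following R snippet shows how such a non-sequential bagging graph can be assembled with mlr3pipelines; the choice of learner (classif.rpart), the subsampling fraction, the number of replicates, and the sonar task are arbitrary placeholders.

```r
# Minimal bagging sketch with mlr3pipelines (illustrative; parameters are placeholders).
library(mlr3)
library(mlr3pipelines)

# One path: subsample the data, then fit a decision tree on the subsample.
single_path = po("subsample", frac = 0.7) %>>%
  po("learner", lrn("classif.rpart"))

# Replicate the path 10 times so each copy trains on a different subsample,
# then average the 10 predictions into a single ensemble prediction.
graph = ppl("greplicate", single_path, n = 10) %>>%
  po("classifavg", innum = 10)

# Wrap the graph so it behaves like any other mlr3 learner.
bagged = as_learner(graph)
bagged$train(tsk("sonar"))
```

Stacking and CASH follow the same idea: sub-graphs are combined (e.g. with gunion()) or selected between (e.g. with po("branch")), and the hyperparameters of the resulting graph, including the selection itself, can then be tuned like those of any other learner.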

Original language: English
Title of host publication: Applied Machine Learning Using mlr3 in R
Editors: Bernd Bischl, Raphael Sonabend, Lars Kotthoff, Michel Lang
Number of pages: 22
Publisher: CRC Press
Publication date: 2024
Pages: 174-195
Chapter: 8
ISBN (Print): 978-1-032-51567-0, 978-1-032-50754-5
ISBN (Electronic): 978-1-003-40284-8
Publication status: Published - 2024

Bibliographical note

Publisher Copyright:
© 2024 selection and editorial matter, Bernd Bischl, Raphael Sonabend, Lars Kotthoff, and Michel Lang. All rights reserved.
