HYBRID MODELS OF THEORY AND DATA SCIENCE

Tamires Soares
2 min read · Jun 10, 2021

One way to combine the strengths of scientific knowledge and data science is to create hybrid combinations of theory-based and data science models, where some aspects of the problem are handled by theory-based components while the remaining ones are modeled using data science components. There are several ways of fusing theory-based and data science models to create hybrid theory-guided data science models. One way is to build a two-component model where the outputs of the theory-based component are used as inputs to the data science component.
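As a rough illustration of the two-component idea, the sketch below feeds the output of a toy theory-based model into a simple regression. Everything here is an assumption made for the example: the free-fall formula stands in for the theory-based component, the synthetic "observations" are generated to fall 10% short of the theory, and `theory_model`, `fit_line`, and `hybrid_predict` are hypothetical names.

```python
import random


def theory_model(t, g=9.81):
    """Theory-based component: idealized free-fall distance (drag ignored)."""
    return 0.5 * g * t ** 2


def fit_line(x, y):
    """Data science component: ordinary least squares for y ~ a * x + b."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


random.seed(0)
times = [0.1 * i for i in range(1, 51)]

# Synthetic observations (assumption): the real system falls 10% short of
# the idealized theory, with a little measurement noise on top.
observed = [0.9 * theory_model(t) + random.gauss(0.0, 0.05) for t in times]

# Step 1: run the theory-based component.
theory_out = [theory_model(t) for t in times]

# Step 2: use its outputs as the input feature of the data science component.
a, b = fit_line(theory_out, observed)


def hybrid_predict(t):
    """Hybrid prediction: theory output corrected by the fitted regression."""
    return a * theory_model(t) + b
```

In this toy setup the regression only has to learn a correction to the theory output, rather than rediscover the quadratic dependence on time from scratch.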

Theory-based model outputs can also be used to supervise the training of data science models, by providing physically consistent estimates of the target variable for every training instance.
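One possible reading of this idea is sketched below: the theory-based estimate is available for every training instance, including those without a measurement, and is added to the loss as a second target. The linear law, the weight `lam`, the numbers, and the names `theory_estimate` and `fit_slope` are all assumptions chosen for illustration, not a prescribed formulation.

```python
def theory_estimate(x, k_theory=2.0):
    """Theory-based component: idealized linear law y = k * x (k is assumed)."""
    return k_theory * x


def fit_slope(x_labeled, y_labeled, x_all, lam):
    """Fit y ~ w * x by minimizing
        sum over labeled points of (w * x - y)^2
        + lam * sum over all points of (w * x - theory_estimate(x))^2.
    For this one-parameter model the minimizer has a closed form."""
    num = sum(x * y for x, y in zip(x_labeled, y_labeled))
    den = sum(x * x for x in x_labeled)
    num += lam * sum(x * theory_estimate(x) for x in x_all)
    den += lam * sum(x * x for x in x_all)
    return num / den


# Made-up data: only three instances have measured targets.
x_labeled = [1.0, 2.0, 3.0]
y_labeled = [2.3, 4.3, 6.7]

# The theory provides a target estimate for every instance, labeled or not.
x_all = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

w_data_only = fit_slope(x_labeled, y_labeled, x_all, lam=0.0)
w_guided = fit_slope(x_labeled, y_labeled, x_all, lam=1.0)
```

With `lam=0.0` the fit uses the measurements alone; increasing `lam` pulls the fitted slope toward the theoretical value, so the weight controls how strongly the theory supervises the training.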

An alternative way of creating a hybrid theory-guided data science model is to use data science methods to predict intermediate quantities in theory-based models that are currently missing or inaccurately estimated. By feeding data science outputs into theory-based models, such a hybrid model can not only show better predictive performance but also amend the deficiencies in existing theory-based models. Further, the outputs of theory-based models may also be used as training samples in data science components, thus creating a two-way synergy between them. Depending on the nature of the model and the requirements of the application, there can be multiple ways of introducing data science outputs in theory-based models.
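To make the reverse direction concrete, here is a minimal sketch in which a regression supplies an intermediate quantity that a theory-based equation needs. The choice of Darcy's law as the theory-based component, the log-linear porosity-permeability fit, the made-up sample values, and the names `predict_permeability` and `darcy_rate` are assumptions for illustration only; all quantities are in arbitrary consistent units.

```python
import math


def fit_line(x, y):
    """Data science component: ordinary least squares for y ~ a * x + b."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Made-up samples: porosity (fraction) and measured permeability.
porosity = [0.10, 0.15, 0.20, 0.25, 0.30]
permeability = [1.0, 5.0, 25.0, 125.0, 625.0]

# Learn the intermediate quantity: log10(permeability) as a line in porosity.
a, b = fit_line(porosity, [math.log10(k) for k in permeability])


def predict_permeability(phi):
    """Data-driven estimate of the intermediate quantity."""
    return 10 ** (a * phi + b)


def darcy_rate(k, area, dp, mu, length):
    """Theory-based component: Darcy's law, q = k * A * dP / (mu * L)."""
    return k * area * dp / (mu * length)


# Hybrid prediction: the learned permeability is fed into the theory equation.
k_hat = predict_permeability(0.20)
q_hat = darcy_rate(k_hat, area=1.0, dp=2.0, mu=1.0, length=1.0)
```

The theory-based equation stays untouched; only the input it could not estimate well on its own is replaced by a data-driven prediction.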

R. L. Wilby, T. Wigley, D. Conway, P. Jones, B. Hewitson, J. Main, and D. Wilks, “Statistical downscaling of general circulation model output: a comparison of methods,” Water Resources Research, vol. 34, no. 11, pp. 2995–3008, 1998.

P. Sadowski, D. Fooshee, N. Subrahmanya, and P. Baldi, “Synergies between quantum mechanics and machine learning in reaction prediction,” Journal of Chemical Information and Modeling, vol. 56, no. 11, pp. 2125–2128, 2016.

Tamires Soares

Tamires is a technical advisor and instructor focusing on reservoir/production engineering and data analytics.