Professor Pedro Ferreira: Publications

Euclid preparation. XXXIV. The effect of linear redshift-space distortions in photometric galaxy clustering and its cross-correlation with cosmic shear

(2023)

Authors:

Euclid Collaboration, K Tanidis, VF Cardone, M Martinelli, I Tutusaus, S Camera, N Aghanim, A Amara, S Andreon, N Auricchio, M Baldi, S Bardelli, E Branchini, M Brescia, J Brinchmann, V Capobianco, C Carbone, J Carretero, S Casas, M Castellano, S Cavuoti, A Cimatti, R Cledassou, G Congedo, L Conversi, Y Copin, L Corcione, F Courbin, HM Courtois, A DaSilva, H Degaudenzi, J Dinis, F Dubath, X Dupac, S Dusini, M Farina, S Farrens, S Ferriol, P Fosalba, M Frailis, E Franceschi, M Fumana, S Galeotta, B Garilli, W Gillard, B Gillis, C Giocoli, A Grazian, F Grupp, L Guzzo, SVH Haugan, W Holmes, I Hook, A Hornstrup, K Jahnke, B Joachimi, E Keihanen, S Kermiche, A Kiessling, M Kunz, H Kurki-Suonio, PB Lilje, V Lindholm, I Lloro, E Maiorano, O Mansutti, O Marggraf, K Markovic, N Martinet, F Marulli, R Massey, S Maurogordato, E Medinaceli, S Mei, M Meneghetti, G Meylan, M Moresco, L Moscardini, E Munari, S-M Niemi, C Padilla, S Paltani, F Pasian, K Pedersen, WJ Percival, V Pettorino, S Pires, G Polenta, JE Pollack, M Poncet, LA Popa, F Raison, A Renzi, J Rhodes, G Riccio, E Romelli, M Roncarelli, E Rossetti, R Saglia, D Sapone, B Sartoris, M Schirmer, P Schneider, A Secroun, G Seidel, S Serrano, C Sirignano, G Sirri, L Stanco, P Tallada-Crespí, AN Taylor, I Tereno, R Toledo-Moreo, F Torradeflot, EA Valentijn, L Valenziano, T Vassallo, A Veropalumbo, Y Wang, J Weller, G Zamorani, J Zoubian, E Zucca, A Biviano, A Boucaud, E Bozzo, C Colodro-Conde, D Di Ferdinando, R Farinelli, J Graciá-Carpio, S Marcin, N Mauri, V Scottez, M Tenti, A Tramacere, Y Akrami, V Allevato, C Baccigalupi, A Balaguera-Antolínez, M Ballardini, D Benielli, F Bernardeau, S Borgani, AS Borlaff, C Burigana, R Cabanac, A Cappi, CS Carvalho, G Castignani, T Castro, G Cañas-Herrera, KC Chambers, AR Cooray, J Coupon, A Díaz-Sánchez, S Davini, S delaTorre, G DeLucia, G Desprez, S DiDomizio, H Dole, JA Escartin Vigo, S Escoffier, PG Ferreira, I Ferrero, F Finelli, L Gabarra, J García-Bellido, E Gaztanaga, F Giacomini, G Gozaliasl, H Hildebrandt, S Ilić, JJE Kajava, V Kansal, CC Kirkpatrick, L Legrand, A Loureiro, J Macias-Perez, M Magliocchetti, G Mainetti, R Maoli, CJAP Martins, S Matthew, L Maurin, RB Metcalf, M Migliaccio, P Monaco, G Morgante, S Nadathur, AA Nucita, M Pöntinen, L Patrizii, A Pezzotta, V Popa, D Potter, AG Sánchez, Z Sakr, JA Schewtschenko, A Schneider, M Sereno, P Simon, A Spurio Mancini, J Steinwagner, M Tewes, R Teyssier, S Toft, J Valiviita, M Viel, L Linke

Priors for symbolic regression

GECCO '23 Companion: Proceedings of the Companion Conference on Genetic and Evolutionary Computation Association for Computing Machinery (2023) 2402-2411

Authors:

Deaglan Bartlett, Harry Desmond, Pedro Ferreira

Abstract:

When choosing between competing symbolic models for a data set, a human will naturally prefer the “simpler” expression or the one which more closely resembles equations previously seen in a similar context. This suggests a non-uniform prior on functions, which is, however, rarely considered within a symbolic regression (SR) framework. In this paper we develop methods to incorporate detailed prior information on both functions and their parameters into SR. Our prior on the structure of a function is based on an n-gram language model, which is sensitive to the arrangement of operators relative to one another in addition to the frequency of occurrence of each operator. We also develop a formalism based on the Fractional Bayes Factor to treat numerical parameter priors in such a way that models may be fairly compared through the Bayesian evidence, and explicitly compare Bayesian, Minimum Description Length and heuristic methods for model selection. We demonstrate the performance of our priors relative to literature standards on benchmarks and a real-world dataset from the field of cosmology.
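
As a rough illustration of the n-gram idea described in this abstract, the sketch below trains a bigram model on a tiny, made-up corpus of expressions and scores new expressions by their log-prior. It is not the authors' implementation. The corpus, the token names, the flattening of each expression into a prefix-order token list, and the add-one smoothing are all assumptions chosen for brevity.

```python
import math
from collections import Counter

def train_bigram(corpus):
    """Count bigrams over token sequences, each prefixed by a start marker."""
    context_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for tokens in corpus:
        seq = ["<s>"] + list(tokens)
        vocab.update(tokens)
        for prev, cur in zip(seq, seq[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts, vocab

def log_prior(tokens, model):
    """Log-probability of a token sequence under the add-one-smoothed bigram model."""
    context_counts, bigram_counts, vocab = model
    seq = ["<s>"] + list(tokens)
    total = 0.0
    for prev, cur in zip(seq, seq[1:]):
        numerator = bigram_counts[(prev, cur)] + 1
        denominator = context_counts[prev] + len(vocab)
        total += math.log(numerator / denominator)
    return total

# Hypothetical training corpus: expressions written in prefix (Polish) notation.
corpus = [
    ["+", "x", "a"],
    ["*", "a", "x"],
    ["*", "a", "pow", "x", "b"],
    ["+", "a", "*", "b", "x"],
]
model = train_bigram(corpus)

familiar = ["*", "a", "x"]        # arrangement seen in the corpus
unfamiliar = ["pow", "pow", "x"]  # nested powers, never seen in the corpus
print(log_prior(familiar, model), log_prior(unfamiliar, model))
```

Under these assumptions the familiar arrangement receives the higher log-prior, which is the qualitative behaviour the abstract attributes to a structure-aware prior.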

Priors for symbolic regression

(2023)

Authors:

Deaglan J Bartlett, Harry Desmond, Pedro G Ferreira

Exhaustive symbolic regression

IEEE Transactions on Evolutionary Computation IEEE (2023)

Authors:

Deaglan Bartlett, Harry Desmond, Pedro Ferreira

Abstract:

Symbolic Regression (SR) algorithms attempt to learn analytic expressions which fit data accurately and in a highly interpretable manner. Conventional SR suffers from two fundamental issues which we address here. First, these methods search the space stochastically (typically using genetic programming) and hence do not necessarily find the best function. Second, the criteria used to select the equation optimally balancing accuracy with simplicity have been variable and subjective. To address these issues we introduce Exhaustive Symbolic Regression (ESR), which systematically and efficiently considers all possible equations (made with a given basis set of operators and up to a specified maximum complexity) and is therefore guaranteed to find the true optimum (if parameters are perfectly optimised) and a complete function ranking subject to these constraints. We implement the minimum description length principle as a rigorous method for combining these preferences into a single objective. To illustrate the power of ESR we apply it to a catalogue of cosmic chronometers and the Pantheon+ sample of supernovae to learn the Hubble rate as a function of redshift, finding 40 functions (out of 5.2 million trial functions) that fit the data more economically than the Friedmann equation. These low-redshift data therefore do not uniquely prefer the expansion history of the standard model of cosmology. We make our code and full equation sets publicly available.
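
The exhaustive strategy in this abstract can be pictured with a toy sketch: enumerate every expression tree up to a maximum node count, then rank the trees by a crude codelength. This is not the ESR code. The basis set (`x`, `1`, `2`, `+`, `*`), the fixed noise level `sigma`, and the scoring rule (Gaussian negative log-likelihood plus `n_nodes * log(N_SYMBOLS)`) are simplifying assumptions. The sketch also has no free parameters to optimise and does not merge equivalent expressions.

```python
import math

LEAVES = ["x", "1", "2"]
BINARY = {"+": lambda u, v: u + v, "*": lambda u, v: u * v}
N_SYMBOLS = len(LEAVES) + len(BINARY)

def enumerate_exprs(n_nodes):
    """Yield every expression tree (nested tuples) with exactly n_nodes nodes."""
    if n_nodes == 1:
        yield from LEAVES
        return
    # One node is the binary operator; split the rest between the two children.
    for left_size in range(1, n_nodes - 1):
        right_size = n_nodes - 1 - left_size
        for op in BINARY:
            for left in enumerate_exprs(left_size):
                for right in enumerate_exprs(right_size):
                    yield (op, left, right)

def evaluate(expr, x):
    """Evaluate an expression tree at the point x."""
    if expr == "x":
        return x
    if isinstance(expr, str):
        return float(expr)
    op, left, right = expr
    return BINARY[op](evaluate(left, x), evaluate(right, x))

def description_length(expr, n_nodes, xs, ys, sigma=0.1):
    """Toy codelength: accuracy term plus a cost for writing down each node."""
    nll = sum((y - evaluate(expr, x)) ** 2 for x, y in zip(xs, ys)) / (2 * sigma**2)
    return nll + n_nodes * math.log(N_SYMBOLS)

def exhaustive_search(xs, ys, max_nodes=5):
    """Score every expression up to max_nodes and return them ranked, best first."""
    ranked = []
    for n in range(1, max_nodes + 1):
        for expr in enumerate_exprs(n):
            ranked.append((description_length(expr, n, xs, ys), expr))
    ranked.sort(key=lambda item: item[0])
    return ranked

xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [x * x + 1.0 for x in xs]  # noiseless toy data generated from x*x + 1
ranked = exhaustive_search(xs, ys)
best_score, best_expr = ranked[0]
print(len(ranked), best_expr, best_score)
```

Because every candidate is scored, the output is a complete ranking rather than a single best guess, which is the property the abstract contrasts with stochastic search.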

Analytical marginalization over photometric redshift uncertainties in cosmic shear analyses

Monthly Notices of the Royal Astronomical Society Oxford University Press 522:4 (2023) 5037-5048

Authors:

Jaime Ruiz-Zapatero, B Hadzhiyska, David Alonso, Pedro G Ferreira, Carlos García-García, Arrykrishna Mootoovaloo

Abstract:

As the statistical power of imaging surveys grows, it is crucial to account for all systematic uncertainties. This is normally done by constructing a model of these uncertainties and then marginalizing over the additional model parameters. The resulting high dimensionality of the total parameter spaces makes inferring the cosmological parameters significantly more costly using traditional Monte Carlo sampling methods. A particularly relevant example is the redshift distribution, p(z), of the source samples, which may require tens of parameters to describe fully. However, relatively tight priors can usually be placed on these parameters through calibration of the associated systematics. In this paper, we show, quantitatively, that a linearization of the theoretical prediction with respect to these calibrated systematic parameters allows us to analytically marginalize over these extra parameters, leading to a factor of ∼30 reduction in the time needed for parameter inference, while accurately recovering the same posterior distributions for the cosmological parameters that would be obtained through a full numerical marginalization over 160 p(z) parameters. We demonstrate that this is feasible not only with current data and current achievable calibration priors but also for future Stage-IV data sets.
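
The linearization argument in this abstract rests on a standard Gaussian identity, sketched below under stated assumptions rather than as the paper's pipeline. Assume a Gaussian likelihood with data covariance `C`, a Gaussian calibration prior with covariance `P` centred on the fiducial nuisance values, and a theory vector that is linear in the nuisance parameters with Jacobian `T`. Integrating the nuisance parameters out then simply inflates the covariance to `C + T P Tᵀ`. All array values and names here are hypothetical.

```python
import numpy as np

def gaussian_loglike(residual, cov):
    """Log-density of a zero-mean multivariate Gaussian evaluated at residual."""
    _, logdet = np.linalg.slogdet(cov)
    chi2 = residual @ np.linalg.solve(cov, residual)
    return -0.5 * (chi2 + logdet + len(residual) * np.log(2.0 * np.pi))

def marginalized_loglike(data, theory0, T, C, P):
    """Log-likelihood with linearized nuisance parameters integrated out analytically.

    theory0 is the prediction at the fiducial nuisance values, T = d(theory)/d(nuisance),
    C is the data covariance and P the Gaussian prior covariance of the nuisance parameters.
    """
    return gaussian_loglike(data - theory0, C + T @ P @ T.T)

# Hypothetical toy setup: three data points and one nuisance parameter.
theory0 = np.array([1.0, 2.0, 3.0])
T = np.array([[1.0], [2.0], [3.0]])
C = 0.01 * np.eye(3)
P = np.array([[0.04]])
data = theory0 + 0.1 * T[:, 0] + np.array([0.02, -0.01, 0.03])

print(marginalized_loglike(data, theory0, T, C, P))
```

The cost of the marginalization is one extra matrix product per likelihood call, independent of how the sampler explores the remaining parameters. With a real, non-linear theory prediction the result holds only as far as the linearization does.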
