Mixture of experts

Mixture of experts (MoE) is a machine learning technique in which multiple expert networks (learners) are used to divide a problem space into homogeneous regions.^[1] It differs from ensemble techniques in that typically only one or a few expert models are run for each input, rather than the results of all models being combined.
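The contrast with ensembles can be illustrated with a minimal Python sketch. It assumes a softmax gate and top-k selection, which are common choices but are not specified by the sources cited here; the toy experts, the hand-set gate scores, and the function names moe_predict and ensemble_predict are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw gate scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_predict(x, experts, gate_scores, k=1):
    """Run only the k experts with the largest gate weight and mix their outputs."""
    weights = softmax(gate_scores(x))
    top = sorted(range(len(experts)), key=lambda i: weights[i], reverse=True)[:k]
    norm = sum(weights[i] for i in top)  # renormalise over the selected experts
    return sum(weights[i] / norm * experts[i](x) for i in top), top


def ensemble_predict(x, experts):
    """For comparison: an ensemble runs every model and averages the results."""
    return sum(expert(x) for expert in experts) / len(experts)


# Hypothetical toy experts, each intended for one region of the input space.
experts = [lambda x: -x, lambda x: x, lambda x: 0.0]


def gate_scores(x):
    # Hand-set scores (an assumption): expert 0 for x < 0, expert 1 for x > 0.
    return [-5.0 * x, 5.0 * x, 0.0]


prediction, used = moe_predict(2.0, experts, gate_scores, k=1)
print(prediction, used)
print(ensemble_predict(2.0, experts))
```

With k=1 only the selected expert is evaluated, whereas the ensemble evaluates all three models for every input.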

An example from computer vision is combining one neural network model for human detection with another for pose estimation.

Hierarchical mixture

If the output is conditioned on multiple levels of (probabilistic) gating functions, the mixture is called a hierarchical mixture of experts.^[2]
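For a two-level hierarchy, the conditioning can be written out as follows. The notation is an assumption introduced here for illustration: g_i is the top-level gate, g_{j|i} is the second-level gate within branch i, and p_{ij} is the predictive distribution of the expert at leaf (i, j).

```latex
p(y \mid x) = \sum_{i} g_i(x) \sum_{j} g_{j \mid i}(x) \, p_{ij}(y \mid x),
\qquad \sum_{i} g_i(x) = 1, \quad \sum_{j} g_{j \mid i}(x) = 1 \ \text{for each } i .
```

With a single level of gating, this reduces to the ordinary mixture p(y | x) = \sum_i g_i(x) p_i(y | x).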

A gating network decides which expert to use for each input region. Learning therefore consists of fitting the parameters of both:

the individual learners, and
the gating network.
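The following Python sketch shows one way the two sets of parameters could be learned together. It is a toy illustration under stated assumptions, not the procedure of any cited source: the experts are one-dimensional linear models, the gate is a softmax over linear scores, and training uses plain stochastic gradient descent on squared error, chosen here for brevity. The names forward, mean_loss and train, and the target function |x|, are hypothetical.

```python
import math
import random


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def forward(params, x):
    """Return gate weights, expert outputs and the gate-weighted prediction."""
    gates = softmax([v * x + c for v, c in params["gate"]])
    outs = [a * x + b for a, b in params["experts"]]
    pred = sum(g * o for g, o in zip(gates, outs))
    return gates, outs, pred


def mean_loss(params, data):
    """Mean of 0.5 * (prediction - target)^2 over the data set."""
    return sum(0.5 * (forward(params, x)[2] - y) ** 2 for x, y in data) / len(data)


def train(data, n_experts=2, lr=0.1, epochs=200, seed=0):
    rng = random.Random(seed)
    # Small random slopes break the symmetry between otherwise identical experts.
    params = {
        "experts": [[rng.uniform(-0.1, 0.1), 0.0] for _ in range(n_experts)],
        "gate": [[rng.uniform(-0.1, 0.1), 0.0] for _ in range(n_experts)],
    }
    for _ in range(epochs):
        for x, y in data:
            gates, outs, pred = forward(params, x)
            err = pred - y
            for k in range(n_experts):
                # Gradient of the loss with respect to gate score k.
                d_score = err * gates[k] * (outs[k] - pred)
                # Expert k is updated in proportion to its gate weight.
                params["experts"][k][0] -= lr * err * gates[k] * x
                params["experts"][k][1] -= lr * err * gates[k]
                # The gate is updated so that better-fitting experts gain weight.
                params["gate"][k][0] -= lr * d_score * x
                params["gate"][k][1] -= lr * d_score
    return params


# Toy data (an assumption): y = |x|, which is linear on each side of x = 0.
data = [(i / 20.0, abs(i / 20.0)) for i in range(-20, 21)]
params = train(data)
print(mean_loss(params, data))
```

Each update touches both parameter sets: the experts move toward the targets in the regions where the gate assigns them weight, and the gate shifts weight toward whichever expert currently fits better.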

Applications

Meta uses MoE in its NLLB-200 machine translation system. The system uses multiple MoE models that share capacity, so that models for low-resource languages, which have relatively little data, can draw on it.^[3]

References

[1] Baldacchino, Tara; Cross, Elizabeth J.; Worden, Keith; Rowson, Jennifer (2016). "Variational Bayesian mixture of experts models and sensitivity analysis for nonlinear dynamical systems". Mechanical Systems and Signal Processing. 66–67: 178–200. Bibcode:2016MSSP...66..178B. doi:10.1016/j.ymssp.2015.05.009.
[2] Hauskrecht, Milos. "Ensamble methods: Mixtures of experts (Presentation)" (PDF).
[3] Rodriguez, Jesus. "🗺 Edge#214: NLLB-200, Meta AI's New Super Model that Achieved New Milestones in Machine Translations Across 200 Languages". thesequence.substack.com. Retrieved 2022-08-04.

Extra reading

Masoudnia, Saeed; Ebrahimpour, Reza (12 May 2012). "Mixture of experts: a literature survey". Artificial Intelligence Review. 42 (2): 275–293. doi:10.1007/s10462-012-9338-y. S2CID 3185688.

