Training is performed using the `TrainBaumWelch`

function. It takes as arguments a FAR or WFST representing the plaintext, and channel model and a FAR representing the ciphertext.

Training takes a template argument argument representing the desired expectation table type. Two types are available. The state expectation table normalizes so that all arcs leaving a state sum to semiring one. The state/input-label expectation table normalizes so that the weights of all arcs leaving a state with the same input label sum to semiring one; this is arguably the most intuitive normalization strategy and so it used as the default.

Training function is controlled in part by the arc type of the inputs.

In a semiring without the *path property*, like the log semiring, the E-step sums over all state sequences and thus performs true Baum-Welch training. In the case the input is a FST language model with backoffs, attach a $\phi$-matcher (`fstspecial --fst_type phi`

) to ensure proper interpretation.

In a semiring with the path property, like the tropical semiring, the E-step sums over only the most likely state sequence, an approximation known as *Viterbi training* (Brown et al. 1993:293). In the case the input is an FST language model with backoffs, backoffs can be encoded exactly using a $\phi$-matcher or approximated by $\epsilon$ using the default matcher. In practice, Baum-Welch training is slightly slower than Viterbi training, though somewhat more accurate (Jansche 2003).

`TrainBaumWelch`

takes an struct representing options for training. These define the size of the batch, the maximum number of iterations for training, the learning rate $$\alpha$$, and the comparison/quantization $$\delta$$ used to detect convergence.

Since expectation maximization training is only locally optimal, *random (re)starts* can be used to avoid local minima. For particularly difficult problems, it may be beneficial to perform a very large number of random starts (e.g., Berg-Kilpatrick and Klein 2013). In this library, we randomly initialize weights by sampling from the uniform distribution $(\delta, 1]$ in the real numbers and then mapping these values onto the appropriate values in the desired semiring. `RandomizeBaumWelch`

can be used to generate random restarts.

Berg-Kilpatrick, T., and Klein, D. 2013. Decipherment with a million random restarts. In _Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing_, pages 874-878. Seattle.

Brown, P., Della Pietra, S. A., Della Pietra, V. J., and Mercer, R. L. 1993. The
mathematics of statistical machine translation: parameter estimation.
*Computational Linguistics* 19(2): 263-311.

Dempster, A. P., Laird, N. M., and Rubin, D. B. 1977. Maximum likelihood from incomplete data via the EM algorithm. _Journal of the Royal Statistical Society, Series B_, 39(1): 1-38.

Hart, P. E., Nilsson, N. J., Raphael, B. 1968. A formal basis for the heuristic determination of minimal cost paths. _IEEE Transactions on Systems Science and Cybernetics_ 4(2): 100-107.

Jansche, M. 2003. Inference of string mappings for language technology. Doctoral dissertation, Ohio State University.

Knight, K., Nair, A., Rashod, N., and Yamada, K. 2006. Unsupervised analysis for decipherment problems. _Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions_, pages 499-506. Sydney.

Liang, P., and Klein, D. 2009. Online EM for unsupervised models. In _Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics_, pages 611-619. Boulder.

Novak, J. R., Minematsu, N., and Hirose, K. 2012. WFST-based grapheme-to-phoneme conversion: Open source tools for alignment, model-building and decoding. _Proceedings of the 10th International Workshop on Finite State Methods and Natural Language Processing_, pages 45-49. Donostia–San Sebastián.

Novak, J. R., Minematsu, N., and Hirose, K. 2016. Phonetisaurus: exploring
grapheme-to-phoneme conversion with joint n-gram models in the WFST framework.
*Natural Language Engineering* 22(6): 907-938.

Topic revision: r3 - 2021-05-18 - KyleGorman

Copyright © 2008-2021 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.

Ideas, requests, problems regarding TWiki? Send feedback

Ideas, requests, problems regarding TWiki? Send feedback