James Kingston Clarke 🦎

The python unicodedata.normalize("NFC", "Pålegg") should be used to normalize unicode strings which can look the same, but have different codpoints. The point is to ensure that all characters that ‘appear the same’, are converted to a common unicode character.

For example, ‘å’ could refer to the character U+00E5, or it could be an a codepoint followed by an ˚ codepoint (U+00E5, U+02DA).

References

https://docs.python.org/3/library/unicodedata.html#unicodedata.normalize
https://www.compart.com/en/unicode/U+00E5
https://www.compart.com/en/unicode/U+02DA