    a7b5eb82
    Update to Unicode 16.0.0 [BZ #32168]
    Mike FABIAN authored
    
    
    Unicode 16.0.0 Support: Character encoding, character type info, and
    transliteration tables are all updated to Unicode 16.0.0, using
    the generator scripts contributed by Mike FABIAN (Red Hat).
    
    Changes in CHARMAP and WIDTH:
    
        Total added characters in newly generated CHARMAP: 5185
        Total removed characters in newly generated WIDTH: 1
        Total added characters in newly generated WIDTH: 170
    
    The character removed from WIDTH is U+1171E AHOM CONSONANT SIGN MEDIAL RA.
    Its entries changed as follows:
    
    UnicodeData.txt 15.1.0: 1171E;AHOM CONSONANT SIGN MEDIAL RA;Mn;0;NSM;;;;;N;;;;;
    UnicodeData.txt 16.0.0: 1171E;AHOM CONSONANT SIGN MEDIAL RA;Mc;0;L;;;;;N;;;;;
    
    EastAsianWidth.txt 15.1.0: 1171D..1171F   ; N  # Mn     [3] AHOM CONSONANT SIGN MEDIAL LA..AHOM CONSONANT SIGN MEDIAL LIGATING RA
    EastAsianWidth.txt 16.0.0: 1171E          ; N  # Mc         AHOM CONSONANT SIGN MEDIAL RA
    
    I.e. its general category changed from Mn (Nonspacing_Mark) to Mc
    (Spacing_Mark, a spacing combining mark). It should therefore now have
    width 1 instead of 0, so it is correct that it was removed from WIDTH:
    characters not listed in WIDTH get width 1 by default.
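    The width reasoning above can be illustrated with a small sketch that
    parses the two quoted UnicodeData.txt records and applies a deliberately
    simplified rule. Assumptions: the helper names parse_unicodedata_line and
    simplified_width are hypothetical, and the rule that only Mn and Me are
    zero width is a simplification; the actual glibc generator also takes
    other properties, such as East Asian Width, into account.

```python
# Hypothetical sketch, not the glibc generator code: decide whether a
# character needs a WIDTH entry based only on its General_Category.

def parse_unicodedata_line(line: str) -> dict:
    """Split one semicolon-separated UnicodeData.txt record into named fields."""
    fields = line.strip().split(";")
    return {
        "code_point": int(fields[0], 16),
        "name": fields[1],
        "category": fields[2],    # General_Category, e.g. "Mn" or "Mc"
        "bidi_class": fields[4],  # Bidi_Class, e.g. "NSM" or "L"
    }


def simplified_width(category: str) -> int:
    """Assumption: only nonspacing (Mn) and enclosing (Me) marks are zero width."""
    return 0 if category in ("Mn", "Me") else 1


OLD = "1171E;AHOM CONSONANT SIGN MEDIAL RA;Mn;0;NSM;;;;;N;;;;;"
NEW = "1171E;AHOM CONSONANT SIGN MEDIAL RA;Mc;0;L;;;;;N;;;;;"

for version, record in (("15.1.0", OLD), ("16.0.0", NEW)):
    entry = parse_unicodedata_line(record)
    width = simplified_width(entry["category"])
    # Width 1 is the default, so only characters with another width
    # need an entry in WIDTH.
    needs_entry = width != 1
    print(f"{version}: U+{entry['code_point']:04X} {entry['category']} "
          f"width={width} listed_in_WIDTH={needs_entry}")
```

    Run as-is, this prints "15.1.0: U+1171E Mn width=0 listed_in_WIDTH=True"
    and "16.0.0: U+1171E Mc width=1 listed_in_WIDTH=False", matching the
    removal of U+1171E from WIDTH.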
    
    Browsing the list of the 170 added characters showed nothing suspicious.
    
    Changes in ctype:
    
        alpha: Added 4452 characters in new ctype which were not in old ctype
        combining: Added 51 characters in new ctype which were not in old ctype
        combining_level3: Added 43 characters in new ctype which were not in old ctype
        graph: Added 5185 characters in new ctype which were not in old ctype
        lower: Added 25 characters in new ctype which were not in old ctype
        print: Added 5185 characters in new ctype which were not in old ctype
        punct: Missing 33 characters of old ctype in new ctype
        punct: Added 766 characters in new ctype which were not in old ctype
        tolower: Added 27 characters in new ctype which were not in old ctype
        totitle: Added 27 characters in new ctype which were not in old ctype
        toupper: Added 27 characters in new ctype which were not in old ctype
        upper: Added 27 characters in new ctype which were not in old ctype
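    Report lines like the ones above can be produced by comparing, class by
    class, the sets of code points in the old and new ctype data. The sketch
    below assumes each class is held as a Python set; compare_classes and the
    toy data are hypothetical and not the glibc comparison script. The toy
    data contains only the 33 code points that move from `punct` to `alpha`
    in this update and ignores every other change.

```python
# Hypothetical sketch: diff per-class code point sets between two ctype
# versions and print report lines in the style shown above.

def compare_classes(old: dict, new: dict) -> list:
    """Return "Missing"/"Added" report lines for each ctype class name."""
    report = []
    for name in sorted(set(old) | set(new)):
        before = old.get(name, set())
        after = new.get(name, set())
        missing = before - after  # in the old ctype only
        added = after - before    # in the new ctype only
        if missing:
            report.append(f"{name}: Missing {len(missing)} characters "
                          "of old ctype in new ctype")
        if added:
            report.append(f"{name}: Added {len(added)} characters "
                          "in new ctype which were not in old ctype")
    return report


# Toy data: U+0363..U+036F (13) and U+1DD3..U+1DE6 (20) move to alpha.
MOVED = set(range(0x0363, 0x0370)) | set(range(0x1DD3, 0x1DE7))
OLD_CLASSES = {"alpha": set(), "punct": set(MOVED)}
NEW_CLASSES = {"alpha": set(MOVED), "punct": set()}

for line in compare_classes(OLD_CLASSES, NEW_CLASSES):
    print(line)
```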
    
    Nothing suspicious in the additions.
    
    About the 33 characters removed from `punct`:
    
    The entries for U+0363 - U+036F are identical in UnicodeData.txt in both
    versions; the difference is in DerivedCoreProperties.txt:
    
    DerivedCoreProperties.txt 15.1.0: not listed as Alphabetic.
    DerivedCoreProperties.txt 16.0.0: 0363..036F    ; Alphabetic # Mn  [13] COMBINING LATIN SMALL LETTER A..COMBINING LATIN SMALL LETTER X
    
    That is why they were added to `alpha` and removed from `punct`.
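    The move from `punct` to `alpha` follows if `punct` is defined in the
    traditional POSIX way, as every graphic character that is neither
    alphabetic nor a digit. The sketch below assumes that definition; the
    classify function is a hypothetical simplification, not glibc code, and
    models only the three properties involved.

```python
# Hypothetical sketch: POSIX-style punct is "graph and not alpha and not
# digit", so gaining the Alphabetic property removes a character from punct.

def classify(is_graph: bool, is_alphabetic: bool, is_digit: bool) -> set:
    """Return the subset of {graph, alpha, punct} a character belongs to."""
    classes = set()
    if is_graph:
        classes.add("graph")
    if is_alphabetic:
        classes.add("alpha")
    # Assumed definition: graphic, but neither alphabetic nor a digit.
    if is_graph and not is_alphabetic and not is_digit:
        classes.add("punct")
    return classes


# U+0363 COMBINING LATIN SMALL LETTER A: graphic and not a digit in both
# versions; only its Alphabetic property changed in 16.0.0.
print(sorted(classify(is_graph=True, is_alphabetic=False, is_digit=False)))
print(sorted(classify(is_graph=True, is_alphabetic=True, is_digit=False)))
```

    The first call yields ['graph', 'punct'] (Unicode 15.1.0) and the second
    ['alpha', 'graph'] (Unicode 16.0.0).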
    
    The same applies to U+1DD3 - U+1DE6: their entries are identical in
    UnicodeData.txt, but DerivedCoreProperties.txt differs:
    
    DerivedCoreProperties.txt 15.1.0: 1DE7..1DF4    ; Alphabetic # Mn  [14] COMBINING LATIN SMALL LETTER ALPHA..COMBINING LATIN SMALL LETTER U WITH DIAERESIS
    DerivedCoreProperties.txt 16.0.0: 1DD3..1DF4    ; Alphabetic # Mn  [34] COMBINING LATIN SMALL LETTER FLATTENED OPEN A ABOVE..COMBINING LATIN SMALL LETTER U WITH DIAERESIS
    
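    So these 20 characters became `Alphabetic` and were thus added to
    `alpha` and removed from `punct`. Together with the 13 characters
    U+0363 - U+036F, this accounts for all 33 characters removed from
    `punct`.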
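    The count of 33 can be checked mechanically from the quoted
    DerivedCoreProperties.txt lines by expanding each range and taking the
    set difference between versions. This is a minimal sketch; the helper
    names expand_property and to_ranges are hypothetical, and the regular
    expression only handles the "XXXX..YYYY ; Property" and "XXXX ; Property"
    line shapes quoted here.

```python
import re

# Code point or range, then ";" and the property name before the "#" comment.
LINE_RE = re.compile(r"^([0-9A-F]+)(?:\.\.([0-9A-F]+))?\s*;\s*(\w+)")


def expand_property(lines, prop="Alphabetic"):
    """Return the set of code points listed with the given property."""
    code_points = set()
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group(3) == prop:
            start = int(m.group(1), 16)
            end = int(m.group(2) or m.group(1), 16)
            code_points.update(range(start, end + 1))
    return code_points


def to_ranges(code_points):
    """Collapse a set of code points into sorted (first, last) ranges."""
    ranges = []
    for cp in sorted(code_points):
        if ranges and cp == ranges[-1][1] + 1:
            ranges[-1][1] = cp
        else:
            ranges.append([cp, cp])
    return [(first, last) for first, last in ranges]


# Only the lines quoted above; 0363..036F had no 15.1.0 entry.
OLD_LINES = ["1DE7..1DF4    ; Alphabetic # Mn  [14]"]
NEW_LINES = ["0363..036F    ; Alphabetic # Mn  [13]",
             "1DD3..1DF4    ; Alphabetic # Mn  [34]"]

newly_alphabetic = expand_property(NEW_LINES) - expand_property(OLD_LINES)
print(len(newly_alphabetic))
for first, last in to_ranges(newly_alphabetic):
    print(f"U+{first:04X}..U+{last:04X}")
```

    This prints 33, followed by the ranges U+0363..U+036F and
    U+1DD3..U+1DE6.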
    
    Resolves: BZ #32168
    
    Reviewed-by: Carlos O'Donell <carlos@redhat.com>