The Locale Explorer: LCMapString
(and a custom-draw CListCtrl with multiple fonts and color, self-adjusting columns, and popups)

The Unicode database

This was a little experiment in using the data published by the Unicode Consortium.  The raw data files used to produce these displays are blocks.txt and unicodedata.txt, which can be downloaded from their site at http://www.unicode.org/Public/UNIDATA/.  This material is used in accordance with their guidelines.

Copyright © 1991-2005 Unicode, Inc. All rights reserved.
Certain documents and files on this website contain a legend indicating that "Modification is permitted." Any person is hereby authorized, without fee, to modify such documents and files to create derivative works conforming to the Unicode® Standard, subject to Terms and Conditions herein.
Any person is hereby authorized, without fee, to view, use, reproduce, and distribute all documents and files solely for informational purposes in the creation of products supporting the Unicode Standard, subject to the Terms and Conditions herein.

If you select a range of codes in the upper control and double-click it, the corresponding set of glyphs will appear in the lower control.  The control will show and select the first glyph in the range specified.

The contents of unicodedata.txt consist of a sequence of lines, each of which describes the properties of one character.  For example, the glyphs shown above are derived from these lines:

0994;BENGALI LETTER AU;Lo;0;L;;;;;N;;;;;
0995;BENGALI LETTER KA;Lo;0;L;;;;;N;;;;;
0996;BENGALI LETTER KHA;Lo;0;L;;;;;N;;;;;
0997;BENGALI LETTER GA;Lo;0;L;;;;;N;;;;;
0998;BENGALI LETTER GHA;Lo;0;L;;;;;N;;;;;
0999;BENGALI LETTER NGA;Lo;0;L;;;;;N;;;;;
099A;BENGALI LETTER CA;Lo;0;L;;;;;N;;;;;
099B;BENGALI LETTER CHA;Lo;0;L;;;;;N;;;;;
099C;BENGALI LETTER JA;Lo;0;L;;;;;N;;;;;
099D;BENGALI LETTER JHA;Lo;0;L;;;;;N;;;;;
099E;BENGALI LETTER NYA;Lo;0;L;;;;;N;;;;;
099F;BENGALI LETTER TTA;Lo;0;L;;;;;N;;;;;
09A0;BENGALI LETTER TTHA;Lo;0;L;;;;;N;;;;;
09A1;BENGALI LETTER DDA;Lo;0;L;;;;;N;;;;;
09A2;BENGALI LETTER DDHA;Lo;0;L;;;;;N;;;;;
09A3;BENGALI LETTER NNA;Lo;0;L;;;;;N;;;;;
09A4;BENGALI LETTER TA;Lo;0;L;;;;;N;;;;;

The complete documentation can be obtained from the file UCD.html, downloadable from the above-cited Unicode site. 

The fields, numbered from 0, are:

  0. The Unicode code point, expressed in hexadecimal.
  1. The name of the character, from the code charts of the Unicode Standard.
  2. The general Unicode category.  There are 30 categories; for example, Lo (Letter, other) is the category of the Bengali letters above.
  3. Canonical Combining Class.
  4. Bidi Class (bidirectional class).
  5. Decomposition type and decomposition mapping.
  6. Numeric type/Numeric value: if the character has the decimal digit property, this is the numeric value of that digit.
  7. Digit value: if the character has the digit property, this is the decimal value of the digit.  An example would be a superscript digit.
  8. Numeric value: if the character has the numeric property, this is the integer or rational number the value represents.
  9. Bidi-Mirrored: "Y" if the character is a mirrored character in bidirectional text, "N" otherwise.
  10. Unicode 1.0 name: the former name of the character in the Unicode 1.0 standard, or the ISO 6429 name for a control character.
  11. ISO Comment: the ISO 10646 comment field.
  12. Uppercase mapping: the single-character uppercase mapping.
  13. Lowercase mapping: the single-character lowercase mapping.
  14. Titlecase mapping: the titlecase mapping, if different from the uppercase mapping.
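Many fields are empty, so a parser has to split on every semicolon and keep the empty fields. That way field N always ends up at index N. What follows is a minimal, portable C++ sketch of that step. It is not the code from UnicodeData.cpp, and the names SplitFields and ParseCodePoint are hypothetical.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Split one UnicodeData.txt record on ';', keeping empty fields so that
// field N of the record always lands at index N of the result.
std::vector<std::string> SplitFields(const std::string& line)
{
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    for (;;)
    {
        std::string::size_type semi = line.find(';', start);
        if (semi == std::string::npos)
        {
            fields.push_back(line.substr(start)); // last field (may be empty)
            break;
        }
        fields.push_back(line.substr(start, semi - start));
        start = semi + 1;
    }
    return fields;
}

// Field 0 is the code point in hexadecimal, e.g. "0995" -> 0x0995.
std::uint32_t ParseCodePoint(const std::vector<std::string>& fields)
{
    return static_cast<std::uint32_t>(std::stoul(fields.at(0), nullptr, 16));
}
```

For the U+0995 line above, this yields 15 fields. "BENGALI LETTER KA" is at index 1, "Lo" at index 2, and "N" (not mirrored) at index 9.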

I have added an extra column that shows the glyph for each character position.

To improve readability, I made this an owner-drawn control, and I use full-row highlighting so that it is obvious which character has been selected.  The division into groups of three is also there to improve legibility, and I chose colors reminiscent of the old green-bar line printer paper that was so popular from the 1950s through the 1980s, until impact line printers were finally replaced by laser printers.

Double-clicking an element of the upper list control selects the corresponding range in the lower control.  To do this effectively, I use an adaptation of binary search, the bsearch algorithm.  However, bsearch requires a table in memory, whereas I derive the information from the control itself.  Also, bsearch returns a NULL pointer if it does not find an exact match.  In many Unicode ranges, though, the first character position of the specified range (or the first few) is not populated.  So I set it up to return the first element that falls within the range clicked.  This was only a minor variation on bsearch.
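The variation amounts to a lower-bound binary search. Instead of demanding an exact match, it finds the first entry whose code is not below the start of the range, then checks that this entry does not lie past the end of the range. Below is a sketch under those assumptions. FindFirstInRange and the getCode callback are hypothetical names. getCode stands in for reading (and parsing) the code-point text of row i from the control, so no separate in-memory table is needed.

```cpp
#include <cstdint>
#include <functional>

// Return the index of the first row whose code point lies in [first, last],
// or -1 if no row in the range is populated. Rows must be sorted by code point.
// getCode(i) supplies the code point of row i (hypothetical callback; in the
// program it would come from the control's text rather than from an array).
int FindFirstInRange(int count, std::uint32_t first, std::uint32_t last,
                     const std::function<std::uint32_t(int)>& getCode)
{
    int lo = 0;
    int hi = count; // half-open interval [lo, hi)
    while (lo < hi)
    {
        int mid = lo + (hi - lo) / 2;
        if (getCode(mid) < first)
            lo = mid + 1; // mid and everything before it are below the range
        else
            hi = mid;     // mid might be the answer, so keep it in the interval
    }
    // lo is now the first row with code >= first, or count if there is none
    if (lo < count && getCode(lo) <= last)
        return lo;
    return -1;
}
```

Because the interval is kept half-open, the loop always ends with lo at the insertion point. That point is exactly the "first element within the range" answer when one exists.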

The Unicode tables are stored as resources and loaded via the FindResource, LoadResource and LockResource API calls.  If you study the code of the module UnicodeData.cpp, note that this resource is stored as 8-bit character data.
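One detail is worth noting. The data returned by LockResource is raw bytes and is not guaranteed to be NUL-terminated, so its length has to be obtained separately (SizeofResource returns it). Below is a minimal portable sketch of splitting such a buffer into lines. SplitLines is a hypothetical name, and the Windows calls appear only in comments.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split an 8-bit text buffer into lines. The buffer is not assumed to be
// NUL-terminated, which is why the size is passed explicitly; on Windows the
// pointer would come from LockResource and the size from SizeofResource.
// Accepts both "\r\n" and "\n" line endings and a missing final newline.
std::vector<std::string> SplitLines(const char* data, std::size_t size)
{
    std::vector<std::string> lines;
    std::size_t start = 0;
    for (std::size_t i = 0; i < size; ++i)
    {
        if (data[i] == '\n')
        {
            std::size_t end = i;
            if (end > start && data[end - 1] == '\r')
                --end; // drop the CR of a CRLF pair
            lines.emplace_back(data + start, end - start);
            start = i + 1;
        }
    }
    if (start < size) // last line had no trailing newline
        lines.emplace_back(data + start, size - start);
    return lines;
}
```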

Flyover Expansion

The problem with small fonts is that, for some of the more complex fonts, the screen resolution is too low to make out all the details.  So I wanted to be able to move the mouse over a glyph and see an enlarged version of it.  This is shown below.  (This is a snapshot from an older version, before the column-fitting options were added.)

This was trickier than I first thought. 

I first had to detect what row was under the mouse.  There is no ItemFromPoint call in CListCtrl, so I had to add one.

int CListCtrlEx::ItemFromPoint(CPoint pt)
   {
    CRect r;
    GetClientRect(&r);
    for(int i = GetTopIndex(); i < GetItemCount() ; i++)
       { /* scan elements */
        CRect g;
        GetItemRect(i, g, LVIR_BOUNDS);
        if(g.top > r.bottom)
           { /* past end */
            return -1; // not found
           } /* past end */
        if(g.PtInRect(pt))
           return i;
       } /* scan elements */
    return -1;
   } // CListCtrlEx::ItemFromPoint

Then, in the OnMouseMove handler, I detect whether the mouse is in the glyph column.  If it is, I check whether the little popup window is visible.  If it is not visible, I make it visible and then set up to intercept the mouse leaving the window.  If the popup is already visible, I just move it to follow the cursor and set its character to the character code of the line.  Since I only support 16-bit Unicode, I do not attempt to handle glyphs whose codes are greater than 0xFFFF.
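For reference, a code point above 0xFFFF cannot be held in a single 16-bit character. UTF-16 represents it as a surrogate pair of two code units, so code that passes along a single character (such as glyph[0] in ShowCharacter, below) cannot display it. Here is a minimal sketch of the encoding. The name ToUtf16 is hypothetical, and the sketch assumes a valid code point no greater than 0x10FFFF.

```cpp
#include <cstdint>
#include <string>

// Encode one code point as UTF-16.
// Assumes cp is a valid Unicode scalar value (<= 0x10FFFF, not a surrogate).
std::u16string ToUtf16(std::uint32_t cp)
{
    if (cp <= 0xFFFF)
        return std::u16string(1, static_cast<char16_t>(cp)); // one code unit
    cp -= 0x10000;                                            // 20 bits remain
    char16_t high = static_cast<char16_t>(0xD800 + (cp >> 10));   // top 10 bits
    char16_t low  = static_cast<char16_t>(0xDC00 + (cp & 0x3FF)); // bottom 10 bits
    return std::u16string{high, low};
}
```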

The OnMouseMove handler calls the ActivatePopupDisplay function.  To detect the glyph column, the parent has to call a function that sets the GlyphColumn variable to the column holding the glyph.

void CUnicodeDisplay::ActivatePopupDisplay(CPoint point)
   {
    if(GlyphColumn != -1)
       { /* have glyph */
        int i = ItemFromPoint(point);
        if(i < 0)
           { /* not found */
            HideCharacter();
            return; // not in point
           } /* not found */

        // Now further refine it 
        CRect g;
        GetSubItemRectEx(i, GlyphColumn, LVIR_BOUNDS, g);

        if(g.PtInRect(point))
           { /* glyph hit */
            CString glyph = GetItemText(i, GlyphColumn);
            ShowCharacter(point, glyph);
            return;
           } /* glyph hit */

        // ... to be defined, below, for flyover expansions of text

        HideCharacter();
       } /* have glyph */
   } // CUnicodeDisplay::ActivatePopupDisplay

The ShowCharacter function creates the window if it does not exist, and shows it.

void CUnicodeDisplay::ShowCharacter(CPoint point, const CString & glyph)
   {
    if(glyph.IsEmpty())
       return;
    if(!PreShowPopup(point))
       return;
    popup.SetChar(glyph[0]); // no effect if this character is already displayed

     CFont * f = GetFont();
     ASSERT(f != NULL);

     // At this point, we know we have a valid glyph window, and a
     // valid glyph
     popup.SetStyle(CShowGlyph::Text, f);

     PostShowPopup(point);
   } // CUnicodeDisplay::ShowCharacter

Utility functions used by this code and by the flyover expansion (described below) are:

BOOL CUnicodeDisplay::PreShowPopup(CPoint point)
    {
     if(popup.GetSafeHwnd() == NULL)
        { /* create popup */
         if(!popup.Create(0, 0, this))
            return FALSE; // error
         visible = FALSE;
        } /* create popup */
     return TRUE;
    } // CUnicodeDisplay::PreShowPopup

void CUnicodeDisplay::PostShowPopup(CPoint point)
    {
     CPoint pt = point;
     pt.x -= ::GetSystemMetrics(SM_CXCURSOR);
     pt.y += ::GetSystemMetrics(SM_CYCURSOR);
     ClientToScreen(&pt);

     if(!visible)
        { /* need to show */
         popup.ShowWindow(SW_SHOWNA); // in case it isn't already shown
         TRACKMOUSEEVENT tme = {sizeof(TRACKMOUSEEVENT), TME_LEAVE, m_hWnd, 0};
         popup.SetWindowPos(NULL, pt.x, pt.y, 0, 0, SWP_NOSIZE | SWP_NOZORDER | SWP_NOACTIVATE);
         visible = TRUE;
         VERIFY(TrackMouseEvent(&tme));
        } /* need to show */
     else
        { /* just move it */
         popup.SetWindowPos(NULL, pt.x, pt.y, 0, 0, SWP_NOSIZE | SWP_NOZORDER | SWP_NOACTIVATE);
        } /* just move it */
    } // CUnicodeDisplay::PostShowPopup

The corresponding HideCharacter method is much simpler:

void CUnicodeDisplay::HideCharacter()
   {
    if(!visible)
       return;
    if(popup.GetSafeHwnd() == NULL)
       return;

    popup.ShowWindow(SW_HIDE); // in case it was showing
    visible = FALSE;
   } // CUnicodeDisplay::HideCharacter

The mouse detection code is

void CUnicodeDisplay::OnMouseMove(UINT nFlags, CPoint point) 
   {
    ActivatePopupDisplay(point);
    CListCtrlEx::OnMouseMove(nFlags, point);
   }

LRESULT CUnicodeDisplay::OnMouseLeave(WPARAM, LPARAM)
   {
    HideCharacter();
    return 0;
   } // CUnicodeDisplay::OnMouseLeave

Note that in VS6 it was necessary to add the OnMouseLeave handler by hand, since ClassWizard does not support this message.

There are some small optimizations.  For example, the SetChar method checks whether the character it is setting is the same as the existing character, and if so, simply returns; this reduces flicker as the mouse moves around.  Note the need to use SWP_NOACTIVATE.  Without it, there is an annoying flicker in the application and the focus changes, which can have other deleterious effects on the behavior.
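The SetChar check can be illustrated with a small model. GlyphPopupModel is hypothetical. The essay does not show how the popup repaints, so a counter replaces the Invalidate call a real window would make.

```cpp
// Hypothetical model of a popup whose SetChar skips redundant repaints.
class GlyphPopupModel
{
public:
    void SetChar(wchar_t ch)
    {
        if (ch == current_)
            return;      // same character: no repaint, hence no flicker
        current_ = ch;
        ++repaints_;     // stands in for Invalidate() in a real window
    }
    wchar_t Current() const { return current_; }
    int Repaints() const { return repaints_; }

private:
    wchar_t current_ = 0;
    int repaints_ = 0;
};
```

As the mouse moves within one row, OnMouseMove fires many times with the same character. Only the first call in that row costs a repaint.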

There was an additional problem.  If the display was scrolled, for example by the mouse wheel or the arrow keys, the popup stayed the same even though the character under the mouse cursor had changed.  I wanted the character to track correctly.  The only solution I came up with involved sending a notification to the parent.  The Boolean visible is set in ShowCharacter and HideCharacter.  The DrawItem handler tracks the current top index position and, if it changes, executes the following:

     int newtop = GetTopIndex();
     if(newtop != oldtop)
        { /* need update */
         if(visible)
            PostMessage(UWM_UPDATE_NEEDED);
         oldtop = newtop;
        } /* need update */

This illustrates one of the many problems with the CListCtrl: it would have made a lot of sense to have an LVN_BEGINSCROLL, LVN_ENDSCROLL and LVN_SCROLL set of messages sent to the parent to alert the parent about changes.  But like most of the useful facilities that would have been required, this was omitted.

Because I do column fitting, some columns will be made narrower than their contents, so it would be nice to be able to see the full contents.  Some controls, such as tree controls, have a feature whereby, if you move the mouse over content that is obscured by the edge of the control, a popup appears showing the full contents.

This was done by extending ActivatePopupDisplay with the code below, which goes at the point marked "... to be defined, below, for flyover expansions of text":

         LVCOLUMN data = {LVCF_WIDTH};
         CClientDC dc(this);
         CFont * f = GetFont();
         dc.SelectObject(f);
         for(int col = 0; GetColumn(col, &data); col++)
            { /* scan columns */
             CRect r;
             if(!GetSubItemRectEx(i, col, LVIR_BOUNDS, r))
                return;
             if(!r.PtInRect(point))
                continue;
             // Otherwise, see if we are in something that requires flyover help
             int w = data.cx;
             if(w  > 0)
                { /* has real column */
                 CString text = GetItemText(i, col);
                 CSize sz = dc.GetTextExtent(text);
                 if(sz.cx > w)
                    { /* show text */
                     ShowText(point, text);
                     return;
                    } /* show text */
                } /* has real column */
            } /* scan columns */

This code uses the ShowText function, shown below.  Note the use of GetSubItemRectEx, a method of my CListCtrlEx subclass, because GetSubItemRect gives undocumented and erroneous results for column 0.

void CUnicodeDisplay::ShowText(CPoint point, const CString & text)
    {
     if(!PreShowPopup(point))
        return;

     CFont * f = GetFont();
     ASSERT(f != NULL);

     // At this point, we know we have a valid popup window
     popup.SetText(text); // no effect if this text is already displayed
     popup.SetStyle(CShowGlyph::Text, f);

     PostShowPopup(point);
    } // CUnicodeDisplay::ShowText

Column Fitting

One of the problems has been that the column widths are set while the text is being loaded.  This means that some of the really interesting information is far off the window, and requires horizontal scrolling.  It should be understood that horizontal scrolling often represents the very worst possible user interface, and should be avoided whenever possible.  Clearly, if all the information is to be displayed, this is not possible.

But what if we display only part of the information?  While the longest strings can be very long, most strings describing characters are quite short.  And some columns are actually very uninteresting, such as the original Unicode 1.0 name.  So I came up with an algorithm that tries to "compress" the columns.  The idea was to attach to each column a "compression factor" that influences how much it is compressed.  I set the compression for the Unicode 1.0 Name column to Zero.  I ultimately fiddled with the parameters so that most columns remained Fixed (meaning their size would not be changed), while others had compression factors from the set {LowCompression, MediumCompression, HighCompression, SuperHighCompression}.  Some experimentation with the parameters associated with these compression factors produced a satisfactory result, which the Best Fit option displays as shown below.  The goal of Best Fit is to fit everything in without requiring horizontal scrolling, at the sacrifice of some of the text; for example, the "Name" text for codes 06D6 and 06D7 is cut off.
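The essay does not give the actual parameters, so the following is only a sketch of one way such a scheme could work. It makes three assumptions. Zero columns are hidden. Fixed columns keep their natural width. The remaining excess is taken from the other columns in proportion to a per-class weight times the column's width. The weights (1, 2, 4, 8) and the names FitColumns and Weight are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Compression classes named in the text.
enum class Compression { Fixed, Zero, Low, Medium, High, SuperHigh };

// Hypothetical weights; the essay does not give the actual values.
int Weight(Compression c)
{
    switch (c)
    {
    case Compression::Low:       return 1;
    case Compression::Medium:    return 2;
    case Compression::High:      return 4;
    case Compression::SuperHigh: return 8;
    default:                     return 0; // Fixed and Zero take no proportional share
    }
}

// Shrink columns so the total fits in 'available' pixels. Zero columns are
// hidden, Fixed columns keep their width, and the excess is taken from the
// others in proportion to weight * width. If the Fixed columns alone are too
// wide, the result can still exceed 'available'.
std::vector<int> FitColumns(const std::vector<int>& natural,
                            const std::vector<Compression>& kind,
                            int available)
{
    std::vector<int> width(natural);
    long long total = 0;
    long long weightSum = 0;
    for (std::size_t i = 0; i < width.size(); ++i)
    {
        if (kind[i] == Compression::Zero)
            width[i] = 0;
        total += width[i];
        weightSum += static_cast<long long>(Weight(kind[i])) * width[i];
    }
    long long excess = total - available;
    if (excess <= 0 || weightSum == 0)
        return width; // already fits, or nothing may shrink

    long long removed = 0;
    for (std::size_t i = 0; i < width.size(); ++i)
    {
        long long share = excess * Weight(kind[i]) * width[i] / weightSum; // rounded down
        if (share > width[i])
            share = width[i];
        width[i] -= static_cast<int>(share);
        removed += share;
    }
    // Rounding down can leave a few pixels over; take them one at a time.
    for (std::size_t i = 0; i < width.size() && removed < excess; ++i)
    {
        if (Weight(kind[i]) > 0 && width[i] > 0)
        {
            --width[i];
            ++removed;
        }
    }
    return width;
}
```

Take natural widths {50, 200, 100, 300}, classes {Fixed, High, Low, Zero}, and 250 pixels available. This sketch produces {50, 111, 89, 0}, with the High column absorbing most of the shrinkage.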

This raises the question of how, in an owner-draw list control, I was able to keep the text from spilling into adjacent columns.  To do this, before drawing each element, I set a clipping region.  The code below, excerpted from CUnicodeDisplay::DrawItem, summarizes that action.  A lot more goes on in this loop that has been elided, such as selecting the display colors and computing additional positioning information.

     for(int col = 0; GetColumn(col, &colinfo); col++)
        { /* scan columns */
         CRect r;
         GetSubItemRectEx(dis->itemID, col, LVIR_BOUNDS, r);
         CRect clip(r.left, r.top, r.right + 1, r.bottom + 1);

         CRgn cliprgn;
         cliprgn.CreateRectRgn(clip.left, clip.top, clip.right, clip.bottom);
         dc->SelectClipRgn(&cliprgn);

         //----------------------------------------------------------------
         CString s = GetItemText(dis->itemID, col);
         dc->TextOut(x, dis->rcItem.top, s);

The +1 computations are required because the nature of Windows graphics is up-to-but-not-including the endpoint, and for the clipping, I want to include the endpoint.

Finding interesting items

A number of columns are mostly empty.  A curious user might want to discover which characters actually have non-empty information in those columns.  So I chose to make the header buttons active: clicking a header button positions the selection on the next non-blank entry in that column.  This led to some interesting problems in processing the HDN_ITEMCLICK message.
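The search behind a header click can be sketched as follows. This is a hypothetical helper that operates on plain strings rather than on the CListCtrl. Two behaviors are assumptions: wrapping around to the top of the list, and treating whitespace-only text as blank. The text says only that the selection moves to the next non-blank entry.

```cpp
#include <string>
#include <vector>

// Find the next row after 'current' whose text in column 'col' is non-blank,
// wrapping around to the top (an assumption). Pass current = -1 when nothing
// is selected. Returns -1 if no row has text in that column.
int FindNextNonBlank(const std::vector<std::vector<std::string>>& rows,
                     int col, int current)
{
    const int count = static_cast<int>(rows.size());
    for (int step = 1; step <= count; ++step)
    {
        int row = (current + step) % count;
        const std::vector<std::string>& cells = rows[row];
        // Spaces and tabs only count as blank (an assumption)
        if (col < static_cast<int>(cells.size()) &&
            cells[col].find_first_not_of(" \t") != std::string::npos)
            return row;
    }
    return -1;
}
```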

Filling up takes time

Loading the control takes a long time: at least a full minute.  This produces an unpleasant delay during startup.  So I took advantage of "idle time".  Normally, I would do this from a separate thread.  In this case, however, I'm filling up a control, and filling a control from a thread would lead to cross-thread SendMessage calls, definitely a Bad Thing.  But I didn't want to do it "inline" during startup either.  The user might not actually want to look at the Unicode character properties, in which case the time would simply be wasted.

I adapted an idea from another project: the use of an I/O Completion Port as an interthread message queue.  In this case, however, I set it up as an intra-thread message queue.  It works like this.  When it comes time to fill the control with Unicode character data, I PostMessage a user-defined message that requests adding 10 Unicode characters.  When this message is dequeued, the handler fills in 10 characters and then, if any characters remain, does another PostMessage to handle them.  So the work is done while the user is idle; if the user is interacting with the program, the user's actions take precedence.
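The same pattern can be modeled portably, which makes the ordering easy to see. In this sketch, a std::deque stands in for the window's message queue and a lambda stands in for the user-defined message. MessageQueue and IncrementalFiller are hypothetical names. The model does not reproduce Windows' actual rules for ordering posted messages relative to input.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Portable stand-in for a window's message queue: posted work items run one
// at a time, in FIFO order, on the thread that dispatches them.
class MessageQueue
{
public:
    void Post(std::function<void()> msg) { queue_.push_back(std::move(msg)); }
    bool DispatchOne()
    {
        if (queue_.empty())
            return false;
        std::function<void()> msg = std::move(queue_.front());
        queue_.pop_front();
        msg();
        return true;
    }
private:
    std::deque<std::function<void()>> queue_;
};

// Fills a "control" (here just a vector) a few items per message, reposting
// itself until everything is loaded, so other messages can run in between.
class IncrementalFiller
{
public:
    static constexpr std::size_t kChunk = 10; // 10 characters per message

    IncrementalFiller(MessageQueue& queue, std::vector<int> source,
                      std::vector<std::string>& log)
        : queue_(queue), source_(std::move(source)), log_(log) {}

    void Start() { PostNextChunk(); }
    const std::vector<int>& Filled() const { return filled_; }

private:
    void PostNextChunk() { queue_.Post([this] { FillChunk(); }); }

    void FillChunk()
    {
        std::size_t end = std::min<std::size_t>(next_ + kChunk, source_.size());
        for (; next_ < end; ++next_)
            filled_.push_back(source_[next_]);
        log_.push_back("chunk");
        if (next_ < source_.size())
            PostNextChunk();          // more to do: queue another chunk
        else
            log_.push_back("done");  // the real program would enable the controls here
    }

    MessageQueue& queue_;
    std::vector<int> source_;
    std::vector<int> filled_;
    std::vector<std::string>& log_;
    std::size_t next_ = 0;
};
```

Each chunk posts its successor at the back of the queue, so any message already waiting runs before the next chunk is processed.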

This also meant that the page is not completed immediately; it might not be filled in for a while, which could lead to all sorts of confusion.  To avoid this, I simply disable the controls until the Unicode information display is complete.  If the user switches to the Unicode page while the control is still being filled, the control appears as shown below.  Some column headings are missing because the computation is not yet finished.  The columns are therefore too narrow to hold the headings, so the headings are omitted.  When the computations are complete, the heading text widths are computed as well.

Problems with CListCtrl

The CListCtrl appears to be more the result of accident than of design.  It has a huge number of defects, some of which are described in my essay on making a vertical-text header control, but there are numerous others as well.

GetSubItemRect does not work correctly if the subitem is 0.  Why?  Who knows?  But when GetSubItemRect is applied to column 0, it gives the entire item rectangle!  Apparently nobody considered that I might actually want the rectangle of subitem 0.  Nor is it explained why it should behave anomalously, when there is a perfectly functional GetItemRect that produces the same result.  Furthermore, this anomalous behavior is not documented.  It's bad enough that it works wrong, but at least some effort could have been made to document the anomaly.

Now, I really, really need the rectangle of the first subitem.  So what I did was add a GetSubItemRectEx method to my CListCtrlEx class:

BOOL CListCtrlEx::GetSubItemRectEx(int item, int subitem, int area, CRect & r)
    {
     if(subitem > 0)
        return GetSubItemRect(item, subitem, area, r);

     // Column 0: GetSubItemRect returns the whole item rectangle, so
     // trim its right edge to the left edge of column 1
     if(!GetSubItemRect(item, subitem, area, r))
        return FALSE; // failed
     CRect r2;
     if(!GetSubItemRect(item, subitem + 1, area, r2))
        return TRUE;  // only one column: column 0 is the full item width
     r.right = r2.left;
     return TRUE;
    } // CListCtrlEx::GetSubItemRectEx

If there is only one column, the GetSubItemRect call on subitem + 1 fails, so for a one-column display the default is simply to use the width of the entire item.  It is not clear why such a grotesque workaround should be required, when it would have made more sense to make GetSubItemRect work correctly in the first place.  As far as I can tell, this is a case of slapping a control together without any design effort.  Of course, the behavior now can't be changed, for backward compatibility with code that already depends upon the bug.  But a solution such as mine, where a new method that actually works correctly is added, should be provided.

Now, if horizontal scrolling is going on, an additional wrinkle is required: various coordinates have to be adjusted by the amount of scrolling.  This, too, is undocumented; I discovered it through a lot of lengthy tracing and twiddling.  To compute the leftmost edge of the text for a TextOut, I have to do:

         int offset = -GetScrollPos(SB_HORZ);

I want my owner-draw logic to honor the specified column alignment.  The code here appears to be straightforward, but note the offset value, which handles horizontal scrolling.  In addition, in case the next column is left-aligned, I don't want right-justified text snugged up against the next column.  So I compute a gap, arbitrarily chosen to be a small distance.  The background has been drawn over the full rectangle, but I want the text to be clipped.

          int gap = 2 * ::GetSystemMetrics(SM_CXBORDER); 

          switch(colinfo.fmt & LVCFMT_JUSTIFYMASK)
            { /* fmt */
             case LVCFMT_CENTER:
                x = dis->rcItem.left + pos + width / 2 - offset;
                dc->SetTextAlign(TA_CENTER);
                dtformat = DT_CENTER;
                break;
             case LVCFMT_RIGHT:
                x = dis->rcItem.left + pos + width - offset - gap;
                dc->SetTextAlign(TA_RIGHT);
                dtformat = DT_RIGHT;
                break;
             case LVCFMT_LEFT:
             default:
                x = dis->rcItem.left + pos - offset;
                dc->SetTextAlign(TA_LEFT);
                dtformat = DT_LEFT;
                break;
            } /* fmt */

So that I don't overwrite the text in adjoining columns, I then create a clipping region.  Because I don't want the pixels to smash into the next column, I make the clipping region smaller by the same gap computed above:

         CRect clip(r.left, r.top, r.right - gap, r.bottom + 1);

         CRgn cliprgn;
         cliprgn.CreateRectRgn(clip.left, clip.top, clip.right, clip.bottom);
         dc->SelectClipRgn(&cliprgn);

Handling owner-draw issues: Bugs in DrawText

Now, you would think that using DrawText would be easy, but in fact it does not work properly.  For example, when I used it with DT_RIGHT to output some right-justified numbers, I got the behavior shown below instead.  I thought it might be the DT_WORD_ELLIPSIS flag.  But careful single-stepping shows that, with or without the ellipsis option, the right edge is computed correctly, yet the text appears in the wrong place.

                      8
                      9
                    10
                    12
                    50
                  100
                  500
                1000
                1000
                5000
              10000

The characters appear to be offset from the right margin by the width of the string.  As a result, the setting of the dtformat variable is largely irrelevant in light of the additional code I added.  Instead of passing it as a parameter to DrawText, I use it to determine that, for non-left alignment, I have to use TextOut.  I also discovered that using DrawText even in the first column, left-justified, appears to give erroneous results, so I reverted to using TextOut for the first column as well.

         CString s = GetItemText(dis->itemID, col);
         CRect dtrect = clip;
         dtrect.left += offset;
         dtrect.right += offset;
         
         if(col > 0 && dtformat == DT_LEFT)
            dc->DrawText(s, &dtrect, dtformat | DT_WORD_ELLIPSIS);
         else
            dc->TextOut(x, dis->rcItem.top, s);

         pos += width;
        } /* scan columns */

The views expressed in these essays are those of the author, and in no way represent, nor are they endorsed by, Microsoft.

Send mail to newcomer@flounder.com with questions or comments about this web site.
Copyright © 2005 Joseph M. Newcomer/FlounderCraft Ltd.  All Rights Reserved.
Last modified: April 04, 2006