Resolved

GEDCOM Import of Full ASCII Character Set

description

My GEDCOM file has Latin characters and character code 160 (the non-breaking space in Latin-1), but they are not being imported.

comments

cbaillar wrote Jul 18, 2007 at 8:37 AM

None of the accented characters are imported (é, à, è, etc.).

aag wrote Jan 6, 2008 at 9:44 AM

Norwegian characters (ÆØÅæøå) are not imported either. That is difficult when half of all the names in the GEDCOM file I'm importing contain one or more of those...

TitoPeru wrote Apr 28, 2008 at 5:13 AM

I think I found a solution. It is all about encoding, and there are two points to fix.
I downloaded this great app yesterday but ran into this issue too (Spanish/Peru); sadly, people outside the U.S. often have these problems with new U.S.-oriented programs.
Both modifications are in the project FamilyShowLib, file GedcomConverter.cs.
The FIRST problem is in importing the GEDCOM file: the StreamReader must be given an Encoding. I had the best results with windows-1252, Western European (Windows). Class: GedcomConverter, method: ConvertToXml, line: about 44-45 (depends on version).

Before:
            // Convert each line of the gedcom file to an xml element.
            using (StreamReader sr = new StreamReader(gedcomFilePath))
After:
            Encoding encoding = Encoding.GetEncoding("windows-1252");
            // Convert each line of the gedcom file to an xml element.
            using (StreamReader sr = new StreamReader(gedcomFilePath, encoding))
The SECOND problem is where the app cleans up the GEDCOM line to allow only viewable characters. The code only permits characters from hex 0020 (decimal 32, space) to hex 007E (decimal 126, "~"). To allow international characters you need to extend the range to at least hex 00FF (Latin-1 Supplement), maybe more. Yes, this also lets through hex 007F ("DEL") and hex 0080 to 009F (control characters); I will try to fix that later. Here is the code, same file, class Gedcomline, line about 164-165:
Before:
    // Expression pattern used to clean up the GEDCOM line.
    // Only allow viewable characters.
    private readonly Regex regClean = new Regex(@"[^\x20-\x7e]");
After:
    // Expression pattern used to clean up the GEDCOM line.
    // Only allow viewable characters.
    private readonly Regex regClean = new Regex(@"[^\x20-\xff]");
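
The extended pattern above also lets DEL (hex 007F) and the control range hex 0080 to 009F through. A minimal sketch of a tighter pattern follows; it is not the project's code. The class name GedcomCleanSketch and the method Clean are hypothetical, and it assumes the clean-up step simply deletes every character the pattern matches.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical stand-alone sketch, not part of FamilyShowLib.
static class GedcomCleanSketch
{
    // Keep printable ASCII (hex 20-7E) and the Latin-1 Supplement (hex A0-FF).
    // DEL (hex 7F) and the control characters hex 80-9F fall outside both
    // ranges, so the negated class matches them.
    private static readonly Regex RegClean = new Regex(@"[^\x20-\x7e\xa0-\xff]");

    // Assumption: cleaning means deleting every character the pattern matches.
    public static string Clean(string line)
    {
        return RegClean.Replace(line, "");
    }

    static void Main()
    {
        // The accented letters survive; the trailing DEL and U+0085 are dropped.
        Console.WriteLine(Clean("1 NAME Jos\u00e9 /Nu\u00f1ez/\u007f\u0085"));
    }
}
```

Characters above hex 00FF are still stripped by this pattern, so names that need them would require a wider range.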

elyoh wrote Dec 7, 2009 at 7:06 PM

This will work in version 4, provided the GEDCOM file is encoded in UTF-8.
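
If the GEDCOM file was exported in another encoding, one option is to re-save it as UTF-8 before importing. The following is a rough sketch only: the class GedcomEncodingSketch and the method ConvertToUtf8 are hypothetical names, and the caller has to know the source encoding, since nothing here detects it.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper, not part of the application.
static class GedcomEncodingSketch
{
    // Reads the file using the encoding the caller says it was written in,
    // then writes the same text back out as UTF-8 without a byte order mark.
    public static void ConvertToUtf8(string sourcePath, string targetPath, Encoding sourceEncoding)
    {
        string text = File.ReadAllText(sourcePath, sourceEncoding);
        File.WriteAllText(targetPath, text, new UTF8Encoding(false));
    }
}
```

For a Western European export the call might be ConvertToUtf8("in.ged", "out.ged", Encoding.GetEncoding("windows-1252")), where the file names are placeholders. On .NET Core and later, the code page encodings have to be registered first with Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).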
