DNA Raw Data to GEDmatch

I noticed a post today about auDNA Raw Data File upload to GEDmatch. The comment that struck me was the idea that people, in general, are nervous, overwhelmed, or uncomfortable with the process of downloading their raw DNA data from their testing company and uploading it to GEDmatch.

Well, to calm those nerves – we aren’t talking about brain surgery. We aren’t talking about a 120-story tightrope walk. We are not talking about a trip to Mars.

It’s just downloading a file to your computer, then uploading the file to GEDmatch. It is exciting, there is no denying that. Working with DNA results for the first time is incredibly exciting. You do the download and the upload, and in 8 to 24 hours you are connected to people from ALL the Genealogy Testing Companies – not just the company you tested with.

The Process

Get your DNA Tested for Genealogy

No, you can’t upload a DNA paternity test to a Genealogical Testing Site or to GEDmatch. Get a DNA test from one of the Genealogical DNA Testing Companies:

FTDNA Family Finder 
AncestryDNA
23andMe *
MyHeritageDNA

You can transfer from other testing companies, like LivingDNA, but until GEDmatch gets the Genesis database merged into the main database you may miss many, many matching opportunities with Genesis.

“23andMe is now using the GSA chip for their new V5 raw DNA file results. This format is not compatible with the regular GEDmatch upload, but can be used with the GEDmatch Genesis upload.”

Register at GEDmatch

Register for a GEDmatch Account

This one is easy AND you can protect your privacy by providing an Alias, though I am not all that fond of Aliases. One of the first things I do when searching for matches is scan the one-to-many result for a kit to see if any of the known surnames appear in the list (this is easy using your browser’s “find” feature). An initial (any initial) and LNAB (last name at birth) can be enough to protect privacy (in my opinion).

Download your Raw Data File to Your Computer

Here are the links to directions for downloading your Raw Data File:

FamilyTreeDNA Family Finder – Build 36 Raw Concatenated
AncestryDNA
23andMe
MyHeritage
LivingDNA

You can download your raw data from other companies and upload it into GEDmatch Genesis – Google it – “Download my raw data from _____.”

Make sure you know where the file ends up on your computer. When you download the file, make sure it goes to your desktop or downloads folder. If you download it and have no idea how to find the downloaded file, then the anxiety can kick in. If you can’t find it, go back to your browser and click on Downloads in the browser to see where the file might have ended up.
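If the file still hides from you, a few lines of Python can list the newest files in a folder. This is only a sketch: the function name `newest_files` is made up for this post, and the list of file endings is my guess at how raw data files are usually packaged, so adjust it for your testing company.

```python
from pathlib import Path


def newest_files(folder, limit=5, suffixes=(".zip", ".txt", ".csv", ".gz")):
    """Return the most recently modified files in `folder`, newest first.

    Only files whose ending is in `suffixes` are considered. The default
    endings are an assumption, not a list published by any testing company.
    """
    candidates = [
        path for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in suffixes
    ]
    # Sort by last-modified time so the freshest download comes first.
    candidates.sort(key=lambda path: path.stat().st_mtime, reverse=True)
    return candidates[:limit]
```

Calling `newest_files(Path.home() / "Downloads")` would check the usual downloads folder, assuming your browser saves there.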

Upload your Raw Data file from your computer to GEDmatch.

GEDmatch Notes
GEDmatch communicates current information on your profile page.

GEDmatch posts pertinent information about its site for users at the top of your profile page. Note the information about the 23andMe chipset and how it works in Genesis.

GEDmatch upload link (GEDCOM upload link too).

Once you are on your Profile page you will see the above box on the right of your page. Click on the Generic upload and it will take you to:

GEDmatch Upload Instructions

Upload For FTDNA
Upload For Ancestry
Upload For MyHeritage
Upload Generic (this includes 23andMe and LivingDNA and more)

You’ll get your GEDmatch ID on the screen at the end of the upload – write it down, and share it if you are really interested in finding genetic cousins.

It is not all that hard and shouldn’t be anxiety producing. I would equate uploading your Raw DNA Data file to GEDmatch with a feeling of joy. But then again I am such a DNA geek…

WikiTree LiveCast – WikiTree and third Party DNA Sites

On Saturday, May 27th at 3:00 PM EDT, please join me (Mags), WikiTree Leader Peter Roberts, DNA Project Coordinator Emma MacBeath, and Julie Ricketts for a live chat on “WikiTree and DNA – Third Party DNA Sites.”

This is the sixth in the WikiTree and DNA Series and goes along with some of the changes going on with the DNA Project. Join the chat to ask us DNA Features questions.

We will also be running down the Saturday Sourcing Sprint numbers and WikiPeeps who are involved in the sprints.  

Pull up a chair to watch or ask questions in the LiveCast chat, either way we promise an hour of WikiTree fun! If you want to see a complete list of past and future LiveCasts click the graphic below or follow this link

Mags

P.S. Do you have someone you would like us to interview? Post some answers with your picks for LiveCast Guests – it can even be yourself!

Betty Jean’s Adoption Search – DNA, Finding Patterns

Finding the patterns in genetic genealogy research is really a fundamental thing we do when looking for the clues to our roots. Even in traditional genealogy, looking at a pedigree chart reveals patterns in geographical locations, dates and names which help our research. Looking for patterns is not new, even if we don’t realize we do it.

How do we find the patterns we are looking for? Spreadsheets?

Betty Jean’s Raw Data file is loaded to GEDmatch. If we do a One to Many matches analysis, we can capture the entire results list via cut and paste and insert it into a spreadsheet. We can grab similar results lists from most of the DNA testing or results companies. Getting all the formats similar in a spreadsheet takes some juggling and tweaking, but it’s worth it. It is made easier by learning how to create things like separate first, middle, maiden and last name entries by converting text to columns and using filters to clean up the columns. Sorting is a whiz once you have all the information in a sortable state.
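The text to columns step can also be done outside the spreadsheet. Here is a rough Python sketch of the idea; `split_name` is a name I made up, and it assumes the parts of a name are separated by spaces. It does not try to work out maiden names, aliases or suffixes like “Jr.”, which still need the juggling and tweaking by hand.

```python
def split_name(full_name):
    """Split a name into first, middle and last parts on whitespace.

    A single word is treated as a last name. Everything between the first
    and last words is treated as the middle name.
    """
    parts = full_name.split()
    if not parts:
        return {"first": "", "middle": "", "last": ""}
    if len(parts) == 1:
        return {"first": "", "middle": "", "last": parts[0]}
    return {
        "first": parts[0],
        "middle": " ".join(parts[1:-1]),
        "last": parts[-1],
    }
```

Each result has the same three keys, so a list of them drops straight into three spreadsheet columns.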

Sort your spreadsheets by Chromosome, Segment Locations and Last Name and you have a pretty clear view to the people on the sheets who share your DNA and where they share it.
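For anyone who would rather script the sort, here is a minimal Python sketch of that same three-level sort. The field names (`chromosome`, `start`, `last_name`) are my own invention, not column headings from GEDmatch or any testing company, so rename them to match your sheet.

```python
def chromosome_order(chromosome):
    """Sort key that puts numbered chromosomes in numeric order, then X."""
    text = str(chromosome).strip().upper()
    if text.isdigit():
        return (0, int(text))
    return (1, text)


def sort_matches(rows):
    """Sort match rows by chromosome, then segment start, then last name."""
    return sorted(
        rows,
        key=lambda row: (
            chromosome_order(row["chromosome"]),
            row["start"],
            row["last_name"].lower(),
        ),
    )
```

The separate `chromosome_order` key is there because a plain text sort would put chromosome 15 ahead of chromosome 2.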

Who do your Matches Belong to? What Familial Surname?

Don’t get that? Who is a match’s MRCA (Most Recent Common Ancestor) in common with you? Some of the files you download from the various companies give you common surnames. Well that helps, doesn’t it? Sometimes? They also show how many Generations back you might share one of those surnames (some people have 50 or more Surnames). That’s it. That is the answer!

Try adding each of those surnames to a sheet individually or using the text to columns conversion and…

Family Trees and Pedigree Charts

Aside from adding columns upon columns of surnames to your spreadsheet, there really isn’t any way (that I know of) to add a pedigree or family tree to a spreadsheet – it might be doable, but…I don’t have enough fingertips or time for that.

There are great places where you can upload your GEDCOM to a DNA testing or analysis site, but the DNA isn’t in any way correlated with the tree. It’s just there and you have to use your brain and knowledge of what you are working on to make any sense of it. ESPECIALLY if it’s further back than a generation or so.

But I have something…

I was at a conference in the spring of 2016 where some of the current icons of Genetic Genealogy were a part of a Panel Discussion on the future of Genetic Genealogy. Something brought up by one of the Panelists was that we don’t really have anywhere to make the connection between a World Family Tree and DNA.

Small Rant?

I was a bit shocked and dismayed that not one of these Genetic Genealogy Icons brought up WikiTree. WikiTree, where genealogists collaborate on a true, single, world family tree. WikiTree, where I, you, them, anyone can add all current and future DNA tests and have the test information auto-populate every single ancestor with that test information. For auDNA tests, back to at least our 64 4th-great-grandparents. For Y and mtDNA tests, back into the depths of our shared pedigree. WikiTree even maps XDNA for its DNA-tested members! WikiTree, where if something happens (in so many different scenarios), it will carry on with nothing more than a hiccup for ever and ever – really. Not ONE mention.

What I use daily in my work is the WikiTree DNA Sandbox.
DNA Sandbox
WikiTree DNA Sandbox

This is where I start looking for patterns that aren’t obvious or easy to correlate anywhere else.

Take GEDmatch and its GEDCOM + DNA tool. I can scroll down a list of people to see the pedigrees of my matches. Once I find a Pedigree that matches, I run a One to One comparison. Then I cut and paste the One to One Match comparison information into a section for the match in the DNA Sandbox.

The section titles show the match name and shared chromosome numbers.  If I continue this process over time it will start to reveal patterns:

With this view I can start to make connections between specific Chromosomes and Familial Surnames. It will also show outliers – matches who probably don’t belong to a specific familial surname group even though at first blush they may appear to belong. Try working on the Smith family of NY and see how many matches with the last name Smith are outliers to YOUR Smith family.
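The same chromosome-to-surname view can be roughed out in Python. This is a sketch under my own assumptions: each match is a small record with a `surname` and a list of shared `chromosomes` (both names invented here), which is roughly what the section titles in the sandbox record.

```python
from collections import defaultdict


def surnames_by_chromosome(matches):
    """Group surnames under each chromosome the match shares with you."""
    groups = defaultdict(set)
    for match in matches:
        for chromosome in match["chromosomes"]:
            groups[chromosome].add(match["surname"])
    # Sorted lists are easier to scan than sets when looking for patterns.
    return {chromosome: sorted(names) for chromosome, names in groups.items()}
```

A surname that shows up alone on a chromosome where the rest of its family never appears is a candidate outlier, though that is only a prompt to go look at the pedigree.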

Partial Table of Contents for DNA Sandbox
Partial view of the table of Contents for DNA Sandbox

In the table of contents for the DNA Sandbox you can get a peek into those patterns. Take the mtDNA Matches. Obviously matches 3.1.3 to 3.1.5 need some further looking into, as do the paternal haplogroup matches 4.1.3-5.1.5. Since these DNA matches posted their mtDNA and YDNA haplogroup information in their auDNA results on GEDmatch, we are able to see right off the bat, from the section titles, that they share Chromosome 15. Do they share overlapping segments? A quick look at the meat of the information and…

Information showing segment locations and generations

Yes, two do. Granted, they are distant connections – 5.1 generations to 6.1 generations – but they do overlap. If I can figure out the MRCA (most recent common ancestor) and add a familial surname to this grouping? It’s a HUGE step toward finding more matches who share Chromosome 15 with you and who also belong in this Familial Surname grouping.
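Checking whether two segments overlap is simple enough to sketch in Python. I am assuming each segment is written as (chromosome, start position, end position); the function name is made up, and the positions you feed it would come from your own One to One comparisons.

```python
def segment_overlap(segment_a, segment_b):
    """Return the (start, end) range two segments share, or None.

    Each segment is a (chromosome, start, end) tuple. Segments on different
    chromosomes, or segments that only touch at an endpoint, do not overlap.
    """
    chromosome_a, start_a, end_a = segment_a
    chromosome_b, start_b, end_b = segment_b
    if chromosome_a != chromosome_b:
        return None
    start = max(start_a, start_b)
    end = min(end_a, end_b)
    return (start, end) if start < end else None
```

An overlap is only a hint. The two matches also need to match each other on that stretch before it points at one shared ancestor.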

Betty Jean’s DNA Sandbox

Betty Jean? Where is she in all this? Well, back in the Spring when we started her search for her birth family, I started her WikiTree DNA Sandbox.

Bits and Pieces Become Patterns

Working steadily with small bits and pieces of data from different testing companies, I pasted data into her sandbox. It started with her highest DNA match on 23andMe, her first cousin once removed, who is also an adoptee. We’ll call her Pat (she is very much still in the midst of figuring out her own identity and dealing with the emotional roller coaster that comes with finding one’s birth family).

Pat’s information wasn’t sitting alone in the sandbox for long. 12 of Betty Jean’s top 15 matches belonged to the same family – the Howards and the Brothertons. All these people had either had their DNA tested on their own or were prodded by Jane to get their tests done to help in finding Pat’s birth family. Lucky Betty Jean again, having Jane in her corner.

Patterns Emerge

So within a few weeks, adding Betty Jean’s one to one matches, researching the pedigrees and using the number of cM (centimorgans) and the generations estimate, her sandbox began to show patterns. Surnames began to have specific chromosomes connected to them. Wikipedia defines the unit this way: “In genetics, a centimorgan (abbreviated cM) or map unit (m.u.) is a unit for measuring genetic linkage. It is defined as the distance between chromosome positions (also termed loci or markers) for which the expected average number of intervening chromosomal crossovers in a single generation is 0.01.”
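Taking the quoted Wikipedia definition at face value, the arithmetic is just a division by 100. A tiny Python sketch (the function name is mine) makes that concrete:

```python
def expected_crossovers(centimorgans):
    """Expected crossovers per generation across a map distance in cM.

    Uses only the quoted definition: 1 cM is the distance over which the
    expected number of crossovers in a single generation is 0.01.
    """
    return centimorgans / 100
```

So two positions 100 cM apart would be expected to see one crossover between them per generation, on average.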

As Jane and I worked, I compiled a list of Surnames for Betty. Surnames to use and Surnames to discount. If a pedigree or tree leads to one of the discounted Surnames? Then attention can be focused elsewhere. The list, added to the Sandbox, includes links within WikiTree to the MRCA (most recent common ancestor) for a specific Surname in the line. With the sandbox filling up, jumping around the great big ole shared tree with ease, working WikiTree’s relationship tools as well as the DNA tools, I was finding answers in a flash.

Fully Customize the sandbox to your level of expertise and knowledge

And there is no hard and fast rule about what goes into the sandbox. Some have graphics with triangulated groups. Some have Haplogroup information by Surname. All have the ability to make finding answers in the DNA connected world tree that is WikiTree an easier thing to do.

Now, after this VERY long post, I need to go find some patterns in nature for a while. Does Blueberry pie get cold FAST in -2 degree weather? An experiment I must try.

 

Betty Jean’s Adoption Search – The DNA

Betty Jean had her DNA tested with 23andMe in an attempt to find out if she had any medical issues which she may have passed along to her children. Along with her health test, she was also in 23andMe’s genetic pool of genes. Having her genes in DNA gene pools will help us in her adoption search.

23andMe

On first look, Betty Jean’s information included some fairly close cousins. The closest was a predicted 2nd cousin sharing 1.76% of their DNA. There were 12 2nd to 4th cousin matches. I sent notes to all of them via 23andMe’s internal messaging system.

I also took some time to look to see if there were any common surnames in these matches. There were – Brotherton and Howard. At the time 23andMe had no DNA analytical tools, so I immediately downloaded Betty Jean’s raw DNA Data file (to download a DNA Data File from 23andMe see this help information) and uploaded it to GEDmatch (you must register for GEDmatch to be able to upload) via the Generic Upload Fast New, Beta.

NOTE: 23andMe has recently added DNA analysis tools which let its users do chromosome mapping and comparisons to other matches. This is great news for anyone who has their DNA tested with 23andMe. It does not preclude a tester from uploading data to GEDmatch, because a tester would want to have their DNA in GEDmatch’s large gene pool along with people from all the testing companies (anyone who uploaded their raw data to GEDmatch).

Betty Jean’s GEDmatch Matches

After uploading Betty Jean’s Raw data file from 23andMe we found Betty Jean’s genes swimming in the pool with many of her close cousins – the big one was a 1st cousin once removed at 342.8 total cM.

GEDmatch Matches
Betty Jean’s top GEDmatch matches.

What do we know from this list? Not much for this search, since we don’t have any family line we can identify in the matches at face value without being able to correlate the information with her matches’ family trees. GEDmatch does have a GEDCOM upload function, but not many of Betty Jean’s matches had their family trees on GEDmatch.

Gathering Family Trees

Again, using the emails for the matches on GEDmatch, I sent emails explaining that Betty Jean matched them and asking if they had family trees online or available for access in some other way. I also contacted Jane to discuss the matches. Jane and I spent a bit of time exchanging emails and connecting the dots of Betty Jean’s matches to Jane’s tree.

Remember the 20-foot tree I printed of Jane’s Ancestry Tree? At first I started trying to jump around that monster to mark where the matches landed in the tree. It was cumbersome and frustrating and I had to come up with a better way to be able to see ALL of it at once, and…

There was one more thing about Jane’s tree that needed some space to work-out. It seemed from a quick scan that the Howard and Brotherton lines, as well as other lines that married into them, were a product of Endogamy.

Endogamy Definition
Google Search

Endogamy was not uncommon in the US colonies, as our social spheres were limited by small communities and the vast distances between them. This occurred in Appalachia to an extent that one often hears jokes about “my cousin is my wife”. Jokes aside, the area of North Carolina where the Howards and Brothertons lived is on the outside edge of Appalachia.

Why should Endogamy be something we need to look into carefully and closely? Simply put, it skews the numbers. If cousins marry, then the DNA mix is a mix from one family rather than two. So there is a double infusion of Genes.
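To put rough numbers on that double infusion, here is a small Python sketch of the textbook averages. It adds up (1/2) to the power of the number of parent-child steps for every line of descent connecting two people. The function name is mine, the results are expected averages rather than what any real pair will share, and it assumes the shared ancestors are not themselves related.

```python
def expected_shared_fraction(path_lengths):
    """Sum the expected autosomal sharing over every connecting path.

    Each entry is the number of parent-child steps on one path between two
    people through one shared ancestor. That path contributes 0.5 ** steps.
    """
    return sum(0.5 ** steps for steps in path_lengths)


# First cousins: two shared grandparents, each path is 2 steps up, 2 down.
first_cousins = expected_shared_fraction([4, 4])

# Double first cousins: four shared grandparents instead of two.
double_first_cousins = expected_shared_fraction([4, 4, 4, 4])
```

Ordinary first cousins come out at 12.5% and double first cousins at 25%, which is why a match from an endogamous family can look a generation closer than the paper trail says.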

My Map of Betty Jean’s Family

It started with one single 8 1/2 x 11 sheet of paper. In the middle of that first sheet of paper I wrote a list of Betty Jean’s top matches, the 12 Jane had in her tree and a few more. Then I started adding lines back, creating pedigree charts from the DNA matches for each of the family lines identified by the DNA and Jane’s tree. As I went I added more blank sheets to fill in as I added family. At this point the Howards and Brothertons were extending to the right, radially from the DNA matches circle in the middle. I added papers to the map so that it was 3 sheets long and 3 sheets wide. Thankfully Jane’s tree was now easy to see, even with all the complicated connections within the Howard and Brotherton families.

For each person added to my map, I added or connected them to WikiTree to create Betty Jean’s birth/mirror tree. It was a great help having WikiTree’s relationship tools at the ready to help me define how these people might be connected to Betty Jean. It also helped me when trying to decipher Jane’s voluminous emails on family connections.