Wednesday 28 December 2022

The Swat Protohistoric samples

In South Asian genetics, we suffer from a paucity of ancient DNA to study. We have about 100 samples from Pakistan, Gandhara to be precise. These samples range from the Late-Bronze Age to the Medieval Historic period. Then, we have about two dozen samples from Roopkund Lake that are generally from antiquity but going up to the middle ages. This is all we have to work with. So let's analyze the profile of the Swat Protohistoric samples first.

One thing I would like to discuss before we begin is the presence of BMAC/Oxus/Turanian Bronze Age DNA in these Swat samples. Many people take G25 models too seriously and believe there is upwards of 30-45% Oxus ancestry in the Gandhara samples. I think this is entirely wrong, as formal tools such as qpAdm either reject BMAC decisively or assign it only a very small share (2-5%).

To illustrate my point, take a look at a G25 profile of the samples with BMAC + Sintashta.


30% BMAC in Loebanr_IA and 25% BMAC in Katelai_IA! Keep this ridiculously high number in mind. Now, let us see what qpAdm says. 

Katelai_IA (click for the file run)

  $weights
  # A tibble: 3 × 5
    target              left                       weight se      z
    <chr>               <chr>                      <dbl>  <dbl>   <dbl>
  1 Pakistan_Katelai_IA Russia_MLBA_Sintashta      0.163  0.0197  8.29
  2 Pakistan_Katelai_IA Indus_Gonur                0.816  0.0395  20.7
  3 Pakistan_Katelai_IA Uzbekistan_Dzharkutan_BA_1 0.0211 0.0459  0.459

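Only 2.1% BMAC/Oxus, down from the ridiculous 25% given before! With a standard error of about 4.6% (z = 0.459), even that small estimate is indistinguishable from zero.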
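To make the reading of the Katelai_IA table explicit, here is a minimal Python sketch that recomputes z = weight / se and a rough 95% interval for each source. The weights and standard errors are copied from the output above; the function name `summarize` and the use of a normal approximation (1.96 standard errors) are my own assumptions for illustration and are not part of qpAdm itself. Because the inputs are rounded, the recomputed z-scores can differ slightly from the ones printed in the table.

```python
# Weights and standard errors copied from the Katelai_IA qpAdm output above.
katelai = {
    "Russia_MLBA_Sintashta": (0.163, 0.0197),
    "Indus_Gonur": (0.816, 0.0395),
    "Uzbekistan_Dzharkutan_BA_1": (0.0211, 0.0459),
}


def summarize(weights, z_crit=1.96):
    """Return {source: (z, ci_low, ci_high)}.

    Assumption: a normal approximation with z_crit standard errors.
    This is an illustrative convention, not something qpAdm reports.
    """
    out = {}
    for source, (w, se) in weights.items():
        z = w / se
        out[source] = (z, w - z_crit * se, w + z_crit * se)
    return out


if __name__ == "__main__":
    for source, (z, lo, hi) in summarize(katelai).items():
        flag = "above 0" if lo > 0 else "interval includes 0"
        print(f"{source}: z={z:.2f}, approx. 95% CI=[{lo:.3f}, {hi:.3f}] ({flag})")
```

Under this approximation the Sintashta and Indus_Gonur intervals sit well above zero, while the Dzharkutan (BMAC) interval straddles zero.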

Now, let's look at Loebanr_IA.

  $weights
  # A tibble: 3 × 5
    target              left                       weight   se      z
    <chr>               <chr>                      <dbl>    <dbl>   <dbl>
  1 Pakistan_Loebanr_IA Russia_MLBA_Sintashta      0.176    0.0194  9.03
  2 Pakistan_Loebanr_IA Indus_Gonur                0.832    0.0404  20.6
  3 Pakistan_Loebanr_IA Uzbekistan_Dzharkutan_BA_1 -0.00746 0.0463  -0.161
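qpAdm weights are admixture proportions, so a model is normally only considered feasible when every weight lies between 0 and 1 and the weights sum to about 1. The sketch below applies that check to the Loebanr_IA weights copied from the table above. The helper names `infeasible_sources` and `weights_sum_to_one`, and the rounding tolerance, are hypothetical choices of mine.

```python
# Weights copied from the Loebanr_IA qpAdm output above.
loebanr = {
    "Russia_MLBA_Sintashta": 0.176,
    "Indus_Gonur": 0.832,
    "Uzbekistan_Dzharkutan_BA_1": -0.00746,
}


def infeasible_sources(weights):
    """List the sources whose weight falls outside the [0, 1] range."""
    return [src for src, w in weights.items() if w < 0 or w > 1]


def weights_sum_to_one(weights, tol=0.005):
    """Check that the weights sum to 1 within a tolerance for rounding.

    Assumption: tol=0.005 is an arbitrary allowance for the rounded
    values printed in the table.
    """
    return abs(sum(weights.values()) - 1.0) <= tol


if __name__ == "__main__":
    print("Sum close to 1:", weights_sum_to_one(loebanr))
    print("Sources outside [0, 1]:", infeasible_sources(loebanr))
```

Here the only source flagged is Uzbekistan_Dzharkutan_BA_1, whose weight is slightly negative.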

Pure BMAC is rejected outright as a source: its weight comes out slightly negative (-0.7%), so Loebanr_IA gets modeled purely as Indus_Gonur + Sintashta. This is reason enough to be deeply suspicious of G25 runs for South Asians that use pure BMAC as a source. What is likely happening is overfitting combined with an affinity to BMAC driven by higher proportions of ANF/CHG ancestry that are missing from the Indus Valley samples we currently have. Increased sampling of the IVC might fix this. The only way Indians get any BMAC is indirectly, via Steppe sources that were themselves mixed with BMAC, like Dashti Kozy, or later historic ones like Kangju or the Iron Age Yaz sample TKM_IA (Takhirbai_IA). This indirect BMAC ancestry peaks in North-Western populations at perhaps 15%, nothing more. G25, however, cannot currently differentiate IVC from BMAC, so calculators that use BMAC as a source are not very useful for now.

This is indeed the conclusion that Narasimhan et al. 2019 also came to.

Therefore, here is a model of the Swat Protohistoric samples without BMAC/Oxus as a source.


Kumsay here serves as a proxy for the kind of Central Asian ancestry the Andronovo pastoralists might have picked up on their way to the subcontinent. Using the same model, here is what we get for modern North-West Indian and some Pakistani populations.


Modern samples have more Steppe than most of the ancient ones, but the devil lies in the detail and in population structure. The modern samples are fairly homogeneous (barring the Punjabi_Lahore set which is a mix of all kinds of Pakistani castes). Are the ancient sample sets also homogeneous? Let's see. 

Indeed, what we find is that the Iron Age samples are not homogeneous, which makes sense. The modern samples come from stratified, endogamous Indian caste groups that have avoided intermarrying for millennia, whereas the ancient ones are a graveyard mix of all sorts of people. Some of the Loebanr_IA samples are as low as 5% Steppe (I12981, I12134), and some of the Katelai_IA samples are as low as 2% Steppe (I12446, I12470, I12460), while some are as high as 28% Steppe (I12141). The Udegram_IA, Saidu_Sharif_H and Butkara_IA sets are far more homogeneous, although Saidu_Sharif_H has one outlier at roughly 33-35% Steppe and another at under 5% Steppe.

Here are all the runs posted. 






For Katelai_IA, I will split the runs across two screenshots.



The same goes for Loebanr_IA, due to the large number of samples.





Even after accounting for most of the outliers, we can see that the Iron/Bronze Age and Historic samples in general carry about 5-8% less Steppe_MLBA than modern North Westerners. Perhaps this points to different waves of migration, with a different wave of Indo-Aryans giving rise to modern North Westerners (who also differ amongst themselves). Y-DNA supports this to some extent: tribes such as Khatris get 65% R1a-Z93, most of it the Indian L-657, while the Swat Protohistoric samples are dominated by J2, E1b and L-M20, with barely any R-Z93. Hence, clearly different paternal lineages gave rise to a lot of these tribes. Those lineages must have differed in their exact autosomal profile too, even if it was very similar. For more on the Swat haplogroups, see this post of mine.

Saturday 24 December 2022

Genetics of the Kurmi Tiller caste

The Kurmi are a Shudra caste of non-elite tillers who reside mostly in the North Indian states of Uttar Pradesh and Bihar. Something cool I found while reading about this caste was that the first Prime Minister of Mauritius was a Kurmi. The name Kurmi probably comes from a Sanskrit word for tiller.

We have one high-quality Kurmi sample (500k SNPs), and this gives us an opportunity to take a close look at their ancestry profile. There is a lot of discussion on the ancestry of Brahmins, Kshatriyas and Vaishyas from North India, but not much about North Indian Shudras. So, this will be helpful.

Using ADMIXTOOLS 2, this is what I get for the lone Kurmi sample (evo_10)


Uttar Pradesh Kurmi

Russia_MLBA_Sintashta: 15.7 ± 2.28%
Indus_Periphery_Gonur: 46.6 ± 3.76%
Onge (AASI proxy): 37.7 ± 2.41%

p-value: 0.349
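As a quick sanity check on the numbers above, the sketch below confirms that the three proportions add up to 100% and turns each standard error into a rough 95% interval. The figures are copied from the run. The 0.05 p-value cutoff and the 1.96 multiplier are conventions I am assuming here, and the function names are hypothetical.

```python
# Proportions (%) and standard errors (%) copied from the Kurmi run above.
kurmi = {
    "Russia_MLBA_Sintashta": (15.7, 2.28),
    "Indus_Periphery_Gonur": (46.6, 3.76),
    "Onge": (37.7, 2.41),
}
P_VALUE = 0.349
ALPHA = 0.05  # assumed cutoff: models with p below this are treated as rejected


def total_percent(model):
    """Sum of the point estimates, which should be close to 100."""
    return sum(pct for pct, _ in model.values())


def interval(pct, se, z_crit=1.96):
    """Rough 95% interval under a normal approximation (an assumption)."""
    return (pct - z_crit * se, pct + z_crit * se)


def model_not_rejected(p_value, alpha=ALPHA):
    """True when the p-value is above the assumed cutoff."""
    return p_value > alpha


if __name__ == "__main__":
    print(f"Total: {total_percent(kurmi):.1f}%")
    for source, (pct, se) in kurmi.items():
        lo, hi = interval(pct, se)
        print(f"{source}: {pct:.1f}% (approx. 95% interval {lo:.1f}-{hi:.1f}%)")
    print("Model passes at the assumed cutoff:", model_not_rejected(P_VALUE))
```

With a single sample the intervals are fairly wide, so the point estimates are best read as approximate.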

This shows us that Shudra castes such as the Kurmi have about 15-16% Sintashta-like ancestry, similar to what South Indian Brahmins have, but much higher Onge/AASI-like ancestry than those South Indian Brahmins. This supports what Razib Khan has said before: South India saw a larger IVC wave, while in North India the Aryans mixed with a much purer AASI population, which gives the unique profile we see here of relatively high Steppe plus high Onge.

File of the run

https://pastebin.com/Vcw1KMUV






Saturday 3 December 2022

Long-term genetic stability and a high-altitude East Asian origin for the peoples of the high valleys of the Himalayan arc


Abstract

The high-altitude transverse valleys [>3,000 m above sea level (masl)] of the Himalayan arc from Arunachal Pradesh to Ladakh were among the last habitable places permanently colonized by prehistoric humans due to the challenges of resource scarcity, cold stress, and hypoxia. The modern populations of these valleys, who share cultural and linguistic affinities with peoples found today on the Tibetan plateau, are commonly assumed to be the descendants of the earliest inhabitants of the Himalayan arc. However, this assumption has been challenged by archaeological and osteological evidence suggesting that these valleys may have been originally populated from areas other than the Tibetan plateau, including those at low elevation. To investigate the peopling and early population history of this dynamic high-altitude contact zone, we sequenced the genomes (0.04×–7.25×, mean 2.16×) and mitochondrial genomes (20.8×–1,311.0×, mean 482.1×) of eight individuals dating to three periods with distinct material culture in the Annapurna Conservation Area (ACA) of Nepal, spanning 3,150–1,250 y before present (yBP). We demonstrate that the region is characterized by long-term stability of the population genetic make-up despite marked changes in material culture. The ancient genomes, uniparental haplotypes, and high-altitude adaptive alleles suggest a high-altitude East Asian origin for prehistoric Himalayan populations.


Interestingly, all reads from our ACA individuals match the derived allele for the nonsynonymous EGLN1 SNP rs186996510 (SI Appendix, Table S2), including the oldest Chokhopani sample (C1). This derived allele, c.12G > C (p.Asp4Glu), is reported in high frequency in Tibetans (0.64–0.85) (2237), but is rare in low-altitude East Asians (0.03 in 1KG phase 3 East Asians) and virtually absent outside East Asia. Functional studies have implicated this allele as playing a role in oxygen homeostasis under hypoxic conditions (3739). In contrast, reads supporting derived alleles at the EPAS1 SNPs were found in two of the three later Samdzong individuals (S35 and S41), but not in the earlier Chokhopani (C1) or Mebrak (M63) individuals. 


 

Maratha & Chitpavans

Marathas seem to have a lot of variation in their Andronovo and AASI ranges. Perhaps this is a confirmation of the fact the modern Maratha c...