I decided to run qpAdm on a large number of the South Asian Genotype Project members. The codes for the individuals should be self-evident. The Indus Periphery samples are from the Reich dataset. The steppe source is all Sintashta samples from the recent publication (I removed outliers). The Andamanese hunter-gatherers (AHG) are from the Andamans.
Some of the populations are not good fits on the India cline. Adding Dai as East Asian improves the fit for the Bengali Kayastha. But it messes it up for most of the others.
Please note that these are individuals. There is going to be variance within populations.
Individuals | Indus_Periphery | Steppe | AHG |
AP_vellala_1 | 0.583 | 0.065 | 0.352 |
Beng_Brah_1 | 0.574 | 0.231 | 0.196 |
Beng_Kayastha_1 | 0.438 | 0.12 | 0.442 |
Bihar_Babhan_1 | 0.442 | 0.352 | 0.206 |
Bihar_Sayyid_1 | 0.47 | 0.28 | 0.249 |
Chhatt_satnami | 0.453 | 0.178 | 0.369 |
Guj_Bohra_Patel_1 | 0.767 | 0.214 | 0.018 |
Guj_lohanna_1 | 0.776 | 0.199 | 0.024 |
Guj_Patel | 0.71 | 0.105 | 0.185 |
Guj_Tapodhan_Brah | 0.755 | 0.112 | 0.133 |
Guj_Vania_1 | 0.534 | 0.227 | 0.239 |
Guju_Brah_1 | 0.569 | 0.293 | 0.138 |
Guju_Jain_Brah_1 | 0.545 | 0.291 | 0.164 |
Guju_Solanki | 0.651 | 0.07 | 0.279 |
High_caste_nair_1 | 0.784 | 0.052 | 0.164 |
Indian_GreatAndaman_100BP | 0.1 | 0.007 | 0.893 |
Jammu_Dogra_Brah_1 | 0.597 | 0.18 | 0.223 |
Kann_AP_Brah_1 | 0.612 | 0.173 | 0.215 |
Kann_Brah_1 | 0.506 | 0.113 | 0.381 |
Kann_Kodava_1 | 0.742 | 0.03 | 0.228 |
Kash_Suniareh_1 | 0.595 | 0.314 | 0.091 |
Kashi_butt_1 | 0.571 | 0.275 | 0.154 |
Kashi_syed_1 | 0.541 | 0.262 | 0.197 |
Ker_Knanaya_1 | 0.694 | 0.152 | 0.153 |
Ker_Nasrani_1 | 0.582 | 0.082 | 0.336 |
Ker_nasrani_2 | 0.648 | 0.075 | 0.278 |
Ker_Tam_Brah_1 | 0.714 | 0.126 | 0.16 |
Ker_Varma_1 | 0.741 | 0.098 | 0.161 |
Kurumba | 0.352 | 0.036 | 0.612 |
Maha_Kayastha_1 | 0.48 | 0.25 | 0.27 |
Marathi_Brah_1 | 0.612 | 0.149 | 0.239 |
Marathi_SKP_1 | 0.458 | 0.245 | 0.297 |
Marathi_Urdu_Mus_1 | 0.459 | 0.247 | 0.295 |
Marwari_1 | 0.566 | 0.208 | 0.226 |
Nepali_Brah_1 | 0.517 | 0.252 | 0.23 |
Padmashali_1 | 0.541 | 0.099 | 0.36 |
Pak_Arora_1 | 0.68 | 0.287 | 0.033 |
Papuan | 0.183 | 0.93 | |
Pathan | 0.67 | 0.254 | 0.076 |
Pathan_Yousafzai_1 | 0.725 | 0.277 | -0.002 |
Punjab_Airan_1 | 0.878 | 0.143 | -0.021 |
Punjab_Jatt_2 | 0.586 | 0.338 | 0.076 |
Punjab_Jatt_6 | 0.64 | 0.3 | 0.06 |
Punjab_Jatt_7 | 0.572 | 0.279 | 0.148 |
Punjab_Ramgarhia_1 | 0.657 | 0.202 | 0.142 |
Punjab_Syed_1 | 0.669 | 0.249 | 0.082 |
Punjabi_Jatt_5 | 0.674 | 0.275 | 0.051 |
Rajas_Rajput_1 | 0.635 | 0.249 | 0.116 |
Rajas_Syed_1 | 0.597 | 0.181 | 0.222 |
Saraswat_Brah_1 | 0.547 | 0.213 | 0.239 |
Sindhi_lohanna_1 | 0.719 | 0.3 | -0.019 |
Tam_gounder_1 | 0.606 | 0.056 | 0.338 |
Tam_Iyer_1 | 0.691 | 0.159 | 0.15 |
Tam_Iyer_3 | 0.698 | 0.177 | 0.124 |
Tam_Mudaliar_1 | 0.565 | 0.028 | 0.407 |
Tam_Naidu_1 | 0.624 | 0.063 | 0.312 |
Tel_Niyogi_Brah_1 | 0.528 | 0.192 | 0.28 |
Tel_Reddy_1 | 0.591 | 0.112 | 0.296 |
Telegu_Raju_1 | 0.683 | 0.004 | 0.313 |
UP_Awadh_Mus_1 | 0.765 | 0.094 | 0.14 |
UP_Kayastha | 0.334 | 0.236 | 0.43 |
UP_mohajjir_1 | 0.519 | 0.243 | 0.238 |
UP_mohajjir_2 | 0.717 | 0.24 | 0.043 |
UP_mohajjir_3 | 0.423 | 0.368 | 0.209 |
UP_Mus_Weaver_1 | 0.585 | 0.17 | 0.244 |
W_Beng_Brah_1 | 0.511 | 0.224 | 0.265 |
W_Beng_Kayastha_1 | 0.437 | 0.164 | 0.399 |
W_E_Beng_Brah_1 | 0.609 | 0.171 | 0.22 |
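As a rough illustration of that individual-level variance, here is a minimal Python sketch that summarizes the four Punjabi Jatt rows from the table. Treating those four rows as one population is an assumption based only on their labels, and the helper name `spread` is made up for this example.

```python
from statistics import mean

# Rows copied from the table above: (Indus_Periphery, Steppe, AHG).
# Assumption: these four labels belong to the same population.
jatts = {
    "Punjab_Jatt_2": (0.586, 0.338, 0.076),
    "Punjab_Jatt_6": (0.64, 0.3, 0.06),
    "Punjab_Jatt_7": (0.572, 0.279, 0.148),
    "Punjabi_Jatt_5": (0.674, 0.275, 0.051),
}

components = ("Indus_Periphery", "Steppe", "AHG")

def spread(rows):
    """Return {component: (mean, min, max)} across the individuals in rows."""
    out = {}
    for i, name in enumerate(components):
        values = [row[i] for row in rows.values()]
        out[name] = (round(mean(values), 3), min(values), max(values))
    return out

summary = spread(jatts)
for name, (avg, low, high) in summary.items():
    print(f"{name}: mean={avg}, range={low} to {high}")
```

Even within this one label the AHG coefficient runs from 0.051 to 0.148, so a single individual should not be read as the value for the whole group.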
Great! Though the Punjab Arain results seem incorrect with the negative AHG and superhigh InPe.
Could you run your Bengali ones separately with Dai?
Non-positive semi-definite matrix of raw data causing negative eigenvalues.
Old paper by Rebonato & Jaeckel detailing the algorithm to fix this. The applications mentioned are in finance but methodology is general. And ensures this never happens.
https://www.researchgate.net/publication/2487818_The_Most_General_Methodology_to_Create_a_Valid_Correlation_Matrix_for_Risk_Management_and_Option_Pricing_Purposes
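For anyone who does not want to read the whole paper, the spectral decomposition method it describes can be sketched roughly as below: eigendecompose the matrix, set negative eigenvalues to zero, then rescale so the diagonal is one again. This is a minimal illustration with NumPy on a made-up 3x3 matrix, with a hypothetical function name. It shows the matrix repair only and says nothing about whether this is what actually produces the negative qpAdm coefficients.

```python
import numpy as np

def repair_correlation(c):
    """Spectral-decomposition repair of a symmetric matrix with unit diagonal."""
    c = np.asarray(c, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(c)            # symmetric eigendecomposition
    clipped = np.clip(eigvals, 0.0, None)           # negative eigenvalues set to zero
    b = eigvecs * np.sqrt(clipped)                  # scale each column by sqrt(eigenvalue)
    b /= np.linalg.norm(b, axis=1, keepdims=True)   # rescale rows to unit length (assumes no all-zero row)
    return b @ b.T                                  # B B^T is positive semi-definite with unit diagonal

# Hypothetical input: pairwise correlations that cannot all hold at once.
bad = np.array([
    [1.0, 0.9, -0.9],
    [0.9, 1.0, 0.9],
    [-0.9, 0.9, 1.0],
])
fixed = repair_correlation(bad)

print("smallest eigenvalue before:", np.linalg.eigvalsh(bad).min())
print("smallest eigenvalue after: ", np.linalg.eigvalsh(fixed).min())
```

The result is a valid correlation matrix, though not necessarily the closest one to the input.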
did you hear of Tao figuring out a simpler way to compute eigenvalues? As someone in medicine, I haven’t used them since undergrad linear algebra and diff eq, but the news sounded cool
Yes, actually Tao contributed to an existing (physics!) paper. That news was the best thing I read in 2019.
Though the result was a full specification of eigenvectors from eigenvalues alone (without needing to know the whole matrix).
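To be a bit more concrete, as I understand the identity it gives the squared magnitude of each eigenvector component (not its sign or phase) from the eigenvalues of the matrix together with the eigenvalues of its minors: $|v_{i,j}|^2 \prod_{k \neq i} (\lambda_i(A) - \lambda_k(A)) = \prod_{k} (\lambda_i(A) - \lambda_k(M_j))$, where $M_j$ is $A$ with row and column $j$ removed. A quick numerical check in Python with NumPy, on an arbitrary symmetric matrix with distinct eigenvalues (the function name is made up for this sketch):

```python
import numpy as np

def squared_components_from_eigenvalues(a):
    """Return out[i, j] = |v_i[j]|^2 using only eigenvalues of a and of its minors."""
    n = a.shape[0]
    lam = np.linalg.eigvalsh(a)
    out = np.empty((n, n))
    for j in range(n):
        # Minor M_j: delete row j and column j.
        minor = np.delete(np.delete(a, j, axis=0), j, axis=1)
        mu = np.linalg.eigvalsh(minor)
        for i in range(n):
            numerator = np.prod(lam[i] - mu)
            denominator = np.prod(lam[i] - np.delete(lam, i))  # needs distinct eigenvalues
            out[i, j] = numerator / denominator
    return out

# Arbitrary example matrix, chosen only because its eigenvalues are distinct.
a = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])

from_eigenvalues = squared_components_from_eigenvalues(a)
_, vecs = np.linalg.eigh(a)
direct = (vecs ** 2).T  # row i holds the squared entries of eigenvector i

print(np.allclose(from_eigenvalues, direct))
```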
Evidently, the result in the Tao paper was already known in the 60’s, as acknowledged by the lead author on it:
https://twitter.com/jazzwhiz/status/1195109643850260482?s=20
Could it be something much simpler? Looks to me like only the first two coefficients above were obtained by projecting onto the IP and S components, and the last one was obtained by a sum rule that they should total to one… so statistical uncertainties in the first two could conspire in some cases to cause the AHG coefficient to come out negative. Hence, a lower bound on the statistical error in the numbers above can be estimated from the worst negative offender in the AHG column, which is about 2%. TLDR; I wouldn’t put too much weight on the numbers above past the second decimal place, and all the negative entries in the AHG column are to be read as 0.
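A minimal Python sketch of that reading, using the three rows in the table with a negative AHG entry. It only illustrates the sum-rule hypothesis in this comment and is not a description of how qpAdm computes its coefficients. Rescaling after setting negatives to zero is an extra assumption of the sketch, and the function names are made up.

```python
# (Indus_Periphery, Steppe, AHG) rows with negative AHG, copied from the table.
rows = {
    "Pathan_Yousafzai_1": (0.725, 0.277, -0.002),
    "Punjab_Airan_1": (0.878, 0.143, -0.021),
    "Sindhi_lohanna_1": (0.719, 0.3, -0.019),
}

def implied_third(ip, steppe):
    """Third coefficient if the three are forced to sum to one."""
    return round(1.0 - ip - steppe, 3)

def clamp_and_renormalize(coeffs):
    """Read negatives as 0, then rescale so the coefficients sum to one again."""
    clipped = [max(c, 0.0) for c in coeffs]
    total = sum(clipped)
    return tuple(round(c / total, 3) for c in clipped)

for name, (ip, steppe, ahg) in rows.items():
    print(name, "implied AHG:", implied_third(ip, steppe), "reported AHG:", ahg,
          "clamped:", clamp_and_renormalize((ip, steppe, ahg)))

# The most negative AHG entry gives the rough error floor mentioned above.
worst_negative = min(ahg for _, _, ahg in rows.values())
print("worst negative AHG:", worst_negative)
```

For these three rows the reported AHG value is exactly one minus the other two coefficients, which is consistent with the sum-to-one constraint but does not by itself show how the third number was obtained.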
Razib — do you know if the P in the Marathi SKP stands for Pathare or Panchkalshi?
>Great! Though the Punjab Arain results seem incorrect with the negative AHG and superhigh InPe.
yeah. some of the pakistan groups are getting off-cline.
yeah i will rerun the bengalis…
Wtf AHG is literally the African looking people now living in the Andamans…now i understand why do some biracial people look so south asian.
BTW which modern population represents Iran HG? Modern Iranians minus the steppe?
And what do negative percentages indicate?
All mixed-race people produce some ambiguous-looking offspring. I’ve seen some Latinas who can easily pass as native Bangladeshi; also, some Horn African and Sudanese women can have a pseudo South Asian look. Many Gulf Arabs also look pseudo South Asian. Look at the Bangladesh vs Yemen under-16 football match:
https://youtu.be/01XfPWIbld4
Yemenis generally have some significant sub-saharan african admixture i think.
Iranians don’t score higher steppe than most South Asians IIRC. Modern Balochis score highest Iran_HG?
Highest amount of Iran related ancestry is found in Balochis AFAIK. Within Iran it peaks among Balochis, Bandaris and Mazandaranis.
can you also run the guju jain vania one?
asking for a friend 😉
wasn’t the indus periphery in the model like 1/4th AHG, given that was the average AHG of the three individuals used to model it?
Do you have access to any eastern Jat and Ror samples as well?
The Chhatt_satnami individual (a Chamar-like group) scores lots of steppe just like the Chamar. It’s odd that this individual had 0% Lithuanian in SAGP, while non-Brahmin Bengalis who consistently score some Lithuanian get less steppe. Perhaps the inflation of extra steppe % in Chamar-like groups is due to artifactual reasons too?
Is the InPe component 1/4 AHG? That could be the reason why certain groups/individuals score high InPe and low AHG.
As for Bengalis, I guess the InPe is mostly capturing their Iran_HG, since they’re getting high AHG (which is also capturing their Dai in this case).
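If one takes the roughly one-quarter figure floated in these comments at face value (it is a recollection, not a number confirmed in the post), the total AHG-related ancestry would be the direct AHG coefficient plus a quarter of the Indus_Periphery coefficient. A minimal Python sketch of that arithmetic on three rows from the table, with a hypothetical constant and function name:

```python
AHG_SHARE_OF_INPE = 0.25  # assumption: the "1/4th" figure from the comments, unverified

# (Indus_Periphery, Steppe, AHG) rows copied from the table.
rows = {
    "Kann_Kodava_1": (0.742, 0.03, 0.228),
    "Guj_lohanna_1": (0.776, 0.199, 0.024),
    "Beng_Kayastha_1": (0.438, 0.12, 0.442),
}

def total_ahg_related(inpe, ahg, share=AHG_SHARE_OF_INPE):
    """Direct AHG plus the AHG-related share assumed to sit inside Indus_Periphery."""
    return round(ahg + share * inpe, 2)

totals = {name: total_ahg_related(inpe, ahg) for name, (inpe, _, ahg) in rows.items()}
for name, value in totals.items():
    print(name, value)
```

Under that assumption, an individual with almost no direct AHG but high InPe, such as Guj_lohanna_1, would still carry a sizeable AHG-related fraction.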
Conclusion – We are all Indus Valley.
Thanks for coming out everyone else.
Hi Razib,
I sent you my data last summer (around august, I believe). Any possibility you will be able to add it to your analysis?
yes. i’ll add all the newbies at some point.
lmfao your results for arains and jats are going to make pakdefense forum go absolutely gaga for their long lost brother from the east: Sher Rasgoolah Khan
Geezers in that pakdefence forum know nothing about genetics.
Ah Pakdefense, those hallowed halls. I was summarily ejected from that forum some time ago for being less than charitable towards certain Islamist views. And by Islamist, I mean actual Islamist, not the term used by right-wingers for any Muslim they disagree with.
Intrigued by the result on Iyers, as this is the first such breakdown I’ve seen. The curious thing is the combination of lower AHG admixture relative to most groups including other South Indian Brahmin groups as well as lower Steppe admixture relative to North Indian Brahmin groups. Also, Iyers have an internal structure, with Vadama Iyers maintaining an oral history of migration post 1000 AD from Gujarat & UP, other Iyer groups having regular intermarriage with the local Deccan Nobility, and some supposedly drawn from local populations. While these internal divisions were documented by anthropologists, with different subsets refusing to intermarry and even break bread together, anthropologists have a tendency to present relatively recent political accommodations as indicative of some primeval structure. So I’ve always been interested to see if each group actually had that history show in the genetics. One complicating factor is that intermarriage between different groups of Iyers & Iyengars increased over the past 3 generations, especially among the cosmopolitan types likely to show up in small samples. Still, interesting to see the results and to speculate.
>lower AHG
There are different kinds of indus periphery and they have different amounts of AHG so IDK how much of the different kinds of indus periphery they have. The InPe composition in Kalash would have a different ratio of Shahr BA1 vs Shahr BA2 than the InPe in Iyers.
Furthermore I suspect that the AHG in Rakhigarhi is somehow underestimated in Shinde’s paper, since on Narasimhan’s PCA Rakhigarhi is closer to AASI type sources than what one would expect with a mere 27% AASI input.
I would expect the AASI ancestry in the models (when completely separated from Iran ancestry) to go up in the IVC samples, InPe samples and in modern south Asians as well when a proper group AASI sample is published.
yeah steppe and iran HG related over and AASI under
Shouldn’t Tam Iyer 3 be Tam Iyer 2?
Hey, Razib
you think you can model me too? I took a dna-test too and i would like to see how i turn out as well.
What outgroups did you use? I hope it included geoksyur_en.
Also hope that allsnps was set to “YES”.
what p value cutoff was used?
Looking through the project members spreadsheet, it’s odd how West_East_Bengal_Brahmin_1 scores more Lithuanian than Bihari Babhan and the other Bengal Brahmins but scores much lower steppe on your qpAdm. Is this a reflection of a failure in the old model or some mislabeling on the qpAdm?
Lithuanian??? They did not exist at the time of the Aryan invasion/migration. When was the contact between them and people in today’s India?
Not to single you out, Kann_Kodava_1, but it is fascinating how high your Indus Periphery is versus how low your steppe ancestry is, and your AASI is on the lower end as well. Kodavas in Kannada were always thought of as different from surrounding populations, and it turns out they are the least steppe-influenced and the most Indus-influenced, if your sample is representative of other Kodavas. Razib, I wonder if they would make a good model of the IVC inhabitants, or at least the first IVC immigrants into south India?
Might also be a good opportunity to retire the mythos around them being related to greeks, or some other exogenous element. There may be an east to west AASI cline in peninsular india, with certain isolated populations in the western ghats representing the least admixture.