History & Archaeology

The Script Nobody Can Read: Why the World’s Most Intriguing Ancient Mystery Remains Unsolved After So Many Years

Some 4,000 inscriptions. More than a century of scholarship. AI, machine learning, a million-dollar prize, and an international conference with the Prime Minister and Home Minister in attendance. The Harappan script still has not been deciphered. Here is why, and why the question matters so much more than a history puzzle should.

Sometime around 2600 BCE, in a city of 40,000 people with flush toilets and standardised brick sizes, someone pressed a small soapstone seal into wet clay and made a mark. The seal was square, about 2.5 centimetres across, and it bore symbols — a row of signs above the image of a humped bull or a unicorn or a tiger seated before what might be an altar. The mark dried. The person who made it lived, worked, traded, and died. Their city was buried. Their civilisation faded. And the symbols on that seal, reproduced across thousands of similar objects from a Bronze Age urban culture that stretched across modern Pakistan and northwest India, waited.

They are still waiting.

The Harappan or Indus script — the writing system of the Indus Valley Civilisation, which thrived between roughly 3300 and 1300 BCE — is one of the last great undeciphered writing systems in the world. More than a century after Sir John Marshall announced the discovery of the civilisation to the world in 1924, and nearly 4,000 years after the last Harappan city fell silent, the symbols on those seals have not yielded their meaning to anyone.

In September 2025, more than 1,100 scholars, researchers, engineers, computer scientists, linguists, and students gathered in New Delhi for an international conference organised by the Union Ministry of Culture and the Archaeological Survey of India, dedicated entirely to the question of the script. Prime Minister Narendra Modi and Home Minister Amit Shah attended — signalling that this is not merely an academic puzzle. The political significance of this undeciphered system of writing runs as deep as its cultural weight.

It has still not been cracked.

First: What Makes a Script Decipherable?

To understand why the Harappan script has resisted every attempt at decipherment, it helps to understand what made other ancient scripts eventually yield. Deciphering an unknown script is not one problem. It is a sequence of at least five problems that must be solved in order, with each step depending on the previous one.

The Five Steps to Decipherment

Step 1. Establish that the symbols actually constitute a writing system — not decorative marks or an inventory system.

Step 2. Identify and separate individual signs from the symbol stream — work out where one sign ends and another begins.

Step 3. Reduce the full set of observed symbols to a minimal core inventory by identifying allographs — variant forms of the same sign, the way printed “a” and cursive “a” are the same letter.

Step 4. Assign phonetic or semantic values to each sign.

Step 5. Match those values to a known or reconstructable language.

Source: Fabio Tamburini, ‘Decipherment of Lost Ancient Scripts as Combinatorial Optimisation,’ 2023

For the Harappan script, scholars are still arguing about Step 1.

The Three Reasons It Is So Hard

1. There Is No Rosetta Stone

The single most powerful tool in the history of decipherment is a multilingual inscription: the same text written in a known and an unknown script, side by side. The Rosetta Stone gave scholars a Greek text alongside Egyptian hieroglyphic and demotic script; Jean-François Champollion used it to crack hieroglyphs in 1822. The Behistun Inscription in Iran gave scholars a trilingual cuneiform text that helped unlock Old Persian, Elamite, and Babylonian writing over the course of the 19th century.

The Harappan civilisation had robust trade links with Mesopotamia — Indus seals and goods have been found in ancient Mesopotamian sites, and Mesopotamian records mention a land called “Meluhha” which most scholars identify as the Indus Valley. But despite decades of excavation, not a single bilingual inscription connecting the Harappan script to any known writing system has been found. The civilisation left no Rosetta Stone. Without one, every attempt at decipherment is essentially guesswork elevated by method.

2. An Unknown Script Writing an Unknown Language

Scholar Andrew Robinson, in his influential book Lost Languages (2008), divides undeciphered scripts into three categories: an unknown script writing a known language; a known script writing an unknown language; and an unknown script writing an unknown language. The Harappan script falls into the third — the hardest category, with the fewest reference points.

What Is Linguistic Decipherment? (Simply Explained)

Why “Reading” a Script Is Not the Same as “Understanding” It

Decipherment has two separate components that are often confused. The first is reading: working out the sound or phonetic value that each symbol represents. The second is understanding: working out what the language those sounds constitute actually means. Being able to read a script does not by itself make it understood. Scholars can pronounce Linear B symbols (the script of Mycenaean Greek) and also understand the texts, but only because Michael Ventris recognised in 1952 that the underlying language was an archaic form of Greek, a language that was already known. If the Harappan language turns out to be a form of proto-Dravidian, or Sanskrit, or something entirely unrelated to any surviving language, the same symbols that can be read phonetically might remain semantically opaque for decades more.

3. The Inscriptions Are Extremely Short

Of the approximately 3,500 to 4,000 seals and inscribed objects that have been identified, the average inscription contains just five signs. The longest known Harappan inscription has 26 characters. This is not a lot to work with. Compare this to ancient Egyptian hieroglyphs, which covered entire temple walls, or Mesopotamian cuneiform, which filled clay tablets with economic records, astronomical observations, and literature including the Epic of Gilgamesh. The Harappan inscriptions are brief, contextually ambiguous, and mostly found on objects that appear to be commercial or ritual but whose exact purpose is uncertain, so they provide the bare minimum of material for analytical work. With more than 400 distinct signs and no bilingual key, such short texts leave very little against which to test any proposed reading.

What We Know: The Key Facts About the Script

The Harappan Script — What Scholars Agree On

  • Civilisation dates: approximately 3300–1300 BCE, at its urban peak around 2600–1900 BCE
  • Geographic spread: over 800,000 sq km across modern Pakistan and northwest India — the world’s largest Bronze Age urban culture by area
  • Number of inscribed objects found: approximately 3,500–4,000 seals plus pottery, tablets, and other artefacts
  • Number of distinct signs: most estimates range from 400 to 425 (Asko Parpola identified 425), although S.R. Rao argued for as few as 62
  • Average inscription length: 5 signs
  • Longest known inscription: 26 signs
  • Writing direction: most likely right to left
  • No bilingual inscription has ever been found
  • The underlying language remains unknown

The Competing Theories

The question of what language underlies the Harappan script is not merely academic. It is entangled with some of the most contested questions in South Asian history: where Sanskrit came from, whether the Aryan migration theory is correct, and who can claim the deepest roots in the Indian subcontinent. As one conference document put it, decipherment debates often reflect present-day cultural politics as much as ancient realities.

Theory 1: Sanskrit / Vedic [S.R. Rao]

The earliest notable Indian attempt was by archaeologist S.R. Rao, who in 1982 postulated that the script contained 62 signs and linked the Indus language to Sanskrit and the Vedic civilisation. As Andrew Robinson wrote, Rao seemed “determined to prove that the Indus language was the ancestor of Sanskrit, the root language of most modern languages of North India, and that Sanskrit was therefore not the product of Indo-Aryan invasions from the west via Central Asia but was instead the expression of indigenous Indian genius.”

At the September 2025 conference, some researchers went further, claiming the script contained Rig Vedic mantras and identifying references to the Puranas — texts that historians note were composed over a thousand years after the Harappan civilisation ended.

If Sanskrit were proven to be the underlying language, it would support the argument that the Vedic and Harappan civilisations were continuous — a claim with enormous implications for the Aryan migration debate and for the political case that Vedic culture is entirely indigenous to the subcontinent.

Theory 2: Proto-Dravidian [Asko Parpola]

The most developed and widely cited scholarly hypothesis is that the underlying language is a form of proto-Dravidian — an ancestor of the Dravidian language family that today includes Tamil, Telugu, Kannada, and Malayalam. Its most prominent proponent is Finnish Indologist Asko Parpola, who has spent decades on the script and identified 425 distinct signs.

What Is the “Rebus Principle”?

How Pictograms Can Represent Sound, Not Meaning

The rebus principle is a writing technique in which a pictogram represents a word that sounds like the depicted object — not the object itself. The clearest modern example is the way a bee and a leaf might together represent “belief” — not because anyone is writing about bees and leaves, but because the sounds match. Ancient writing systems widely used this technique to extend a limited set of pictures into a system that could represent abstract words and grammatical elements.

Parpola used this principle to interpret the fish sign — one of the most common symbols on Indus seals. He argued it is unlikely to represent actual fish. In Dravidian languages, the word for fish (min or meen) is a homophone of the word for star. So the fish sign, in Parpola’s reading, represents “star” — and building on this, he claimed to have found the Old Tamil names of all planets written into the Indus inscriptions. This interpretation, while ingenious, requires acceptance of the Dravidian language hypothesis as a prior — which is exactly what remains disputed.

Support for the Dravidian hypothesis comes from an unexpected quarter: Brahui, a Dravidian language spoken today by roughly three million people in Balochistan, Pakistan — geographically at the heart of the Indus Valley Civilisation’s territory.

The existence of a Dravidian language in this region, isolated from the main Dravidian-speaking areas of south India, suggests that Dravidian languages may once have been far more widespread across the subcontinent. India’s leading Indus script researcher, the late Iravatham Mahadevan, supported the Dravidian hypothesis, as have several Western scholars.

Theory 3: Tribal and Austro-Asiatic Languages [Prakash N. Salame]

Scholars such as Prakash N. Salame claim up to 90 percent decipherment through Gondi, a Dravidian language, while Prabhunath Hembrom explores Santali connections. Both proposals face scepticism due to methodological gaps. Others have linked the script to Ho, a language of the Jharkhand region. These claims, often made with passionate certainty, have not persuaded the mainstream of the field.

Theory 4: Not a Script At All [Steve Farmer]

The most disruptive hypothesis came in a 2004 paper by historian Steve Farmer, computational linguist Richard Sproat, and Indologist Michael Witzel. They argued that the Harappan symbols are not a script in any linguistic sense.

Their evidence: the inscriptions are too short to encode a language, there is too much repetition of the same short sequences, and the signs may function more like religious or political emblems — heraldic or ritual markers rather than phonetic notation.

Parpola and others criticised the paper sharply at the time. But its conclusions have since found additional support. Linguist Peggy Mohan, author of Wanderers, Kings, Merchants: The Story of India Through Its Languages, has said that the signs resemble a hallmarking system, like the personalised marks that dhobis in India still use today to identify their customers’ laundry.

“Even today dhobis in India have their own signs which are useful for them but they are not what you would call language,” she said.

A software engineer named Bahata Mukhopadhyay has suggested the script encoded rules for taxation and commerce, rather than spoken language — aligning with the Farmer-Sproat-Witzel view.

Enter the Machines: AI and the Million-Dollar Prize

In early 2025, Tamil Nadu Chief Minister M.K. Stalin announced a prize of one million dollars for anyone who could credibly decipher the Harappan script. The announcement was partly political — the Dravidian hypothesis, if confirmed, would provide enormous cultural validation for the Tamil-speaking south — but it also reflected the genuine global excitement around the possibility that AI-powered analysis might finally break the deadlock.

A March 2025 study using a hybrid CNN-Transformer model explored visual patterns in Harappan inscriptions and found symbol-frequency and co-occurrence patterns that resemble those of known scripts, but the researchers concluded that further linguistic context is needed.

Computer scientist Rajesh P.N. Rao at the University of Washington has used statistical analysis to argue that the script shows the conditional entropy patterns characteristic of linguistic systems rather than those of non-linguistic symbol systems. Conditional entropy here measures how predictable the next sign is once the current sign is known. The result is consistent with the symbols being a script, and his work counters the Farmer-Sproat-Witzel hypothesis, though it does not identify the language.
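To make the idea of conditional entropy concrete, here is a minimal Python sketch. It is not Rao's published method, only a simplified estimator of how uncertain the next sign is given the current one, computed from adjacent sign pairs. The function name, the sign labels A to D, and both toy corpora are invented for illustration and have no connection to the real Harappan corpus.

```python
import math
from collections import Counter


def conditional_entropy(sequences):
    """Estimate H(next sign | current sign) in bits from adjacent pairs.

    `sequences` is a list of inscriptions, each a list of sign labels.
    This is a plain maximum-likelihood estimate with no smoothing.
    """
    pair_counts = Counter()     # how often each (current, next) pair occurs
    current_counts = Counter()  # how often each sign appears as "current"
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            pair_counts[(cur, nxt)] += 1
            current_counts[cur] += 1

    total_pairs = sum(pair_counts.values())
    if total_pairs == 0:
        return 0.0

    entropy = 0.0
    for (cur, nxt), n in pair_counts.items():
        p_pair = n / total_pairs                  # P(current, next)
        p_next_given_cur = n / current_counts[cur]  # P(next | current)
        entropy -= p_pair * math.log2(p_next_given_cur)
    return entropy


# Invented toy data: a rigid system where every sign has one fixed successor...
rigid = [["A", "B", "C", "D"]] * 4
# ...and a freer system where each sign is followed equally often by three others.
free = [
    ["A", "B", "C", "D"],
    ["B", "D", "A", "C"],
    ["C", "A", "D", "B"],
    ["D", "C", "B", "A"],
]

print(conditional_entropy(rigid))
print(conditional_entropy(free))
```

In the rigid corpus the next sign is fully determined, so the estimate is zero bits; in the freer corpus it rises. The argument summarised above is that natural-language scripts tend to fall between rigid and random extremes, but a figure like this says nothing about what any sign means.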

India has also turned to young entrepreneurs in AI and machine learning, with a pan-India competition identifying 40 participants and 10 researchers to contribute to decipherment efforts — part of the broader “Gyan Bharatam Mission” announced at the September 2025 conference to preserve and study manuscript heritage.

The difficulty is that AI tools are powerful at pattern recognition but are still dependent on the same fundamental limitation: without an anchor — a known language, a bilingual text, a confirmed phonetic value for even one sign — there is no way to validate any reading.

A machine that finds statistical patterns in the Harappan corpus can tell you which signs cluster together, which sequences are most common, and how the entropy of the sign distribution compares to known languages. It cannot tell you what the signs mean.
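The kind of pattern-finding described here can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not any research group's pipeline: the function name is hypothetical, and the labels S1 to S5 and the four-inscription corpus are invented placeholders rather than real Harappan signs.

```python
from collections import Counter


def sign_statistics(inscriptions):
    """Count individual signs and adjacent sign pairs across a corpus.

    `inscriptions` is a list of inscriptions, each a list of sign labels.
    Returns (sign_counts, pair_counts) as Counter objects.
    """
    sign_counts = Counter()
    pair_counts = Counter()
    for signs in inscriptions:
        sign_counts.update(signs)
        # zip(signs, signs[1:]) yields each sign with its right-hand neighbour
        pair_counts.update(zip(signs, signs[1:]))
    return sign_counts, pair_counts


# Invented toy corpus; labels are placeholders, not real sign identifiers.
toy_corpus = [
    ["S1", "S2", "S3"],
    ["S1", "S2"],
    ["S4", "S1", "S2", "S5"],
    ["S3", "S5"],
]

sign_counts, pair_counts = sign_statistics(toy_corpus)
print(sign_counts.most_common(2))  # the two most frequent signs
print(pair_counts.most_common(1))  # the most frequent adjacent pair
```

The output shows that S1 and S2 are frequent and usually appear together, which is exactly the sort of result a statistical model can deliver. Nothing in the counts indicates whether the pair is a word, a title, a quantity, or an emblem.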

Why It Matters Beyond History

The stakes of decipherment go far beyond academic curiosity. If the Harappan script is ever genuinely cracked, it would answer questions that lie at the heart of India’s self-understanding as a civilisation.

Was the language Sanskrit? Then the Vedic and Harappan traditions were not separate civilisations but continuous ones, and Sanskrit is indigenous to the subcontinent in a way that the Indo-Aryan migration theory denies.

Was it proto-Dravidian? Then the Harappan people were the ancestors of south India’s linguistic communities, and the narrative of Dravidian culture being peripheral to “mainstream” Indian civilisation is historically backwards.

Was it something else entirely — a language unrelated to anything that survived — or was it not a language at all, but a system of marks? Then the Harappan civilisation, the world’s largest Bronze Age urban culture, is in some sense permanently opaque: we can see its cities and measure its drainage systems, but we cannot read its mind.

That is the condition we are currently in. A century of scholarship, four thousand seals, 400-odd symbols, AI models, international conferences attended by prime ministers, a million-dollar prize, and still — silence.

The seal is still waiting.


Source: Indian Express
