WHETHER "data" is singular or plural is one of those hardy perennials of usage debate in which both sides have impossibly entrenched positions. Or so I had thought, but the Wall Street Journal has, as of today, taken an unusually fence-sitting position:
Most style guides and dictionaries have come to accept the use of the noun data with either singular or plural verbs, and we hereby join the majority.
As usage has evolved from the word’s origin as the Latin plural of datum, singular verbs now are often used to refer to collections of information: Little data is available to support the conclusions.
Otherwise, generally continue to use the plural: Data are still being collected.
(As a singular/plural test, try to substitute statistics for data: It doesn’t work in the first case, "little statistics is available", so the singular "is" fails to pass muster. The substitution does work in the second case, "statistics are still being collected", so the plural "are" passes muster.)
I admire the attempt to satisfy both tradition and change, but it does leave some leeway that I can imagine many writers having a hard time handling. People crave hard-and-fast rules: they don't have time to make a judgment call in every case, as the suggested test of substituting "statistics" requires. (This is the first time I've heard of this remedy, for what it's worth.)
But hard-and-fast doesn't always work, as I noted in my last submission on "data". We don't use the foreign morphology of every word brought from a foreign language. But we do sometimes. Since that last post, I have found this excellent one supplying some new counterarguments against always-plural "data". Among them: we certainly don't use "agenda" and "stamina" in the plural, though they have come to us the same way "data" has. (If your boss ever does say "moving on to the next agendum", let us know.) The "media" question remains mixed: some have it singular, others have it plural.
We have a strong urge to just have language behave, but regular readers of this column know that, as the original Johnson knew, it just won't. He wrote that "to enchain syllables, and to lash the wind, are equally the undertakings of pride." Less well known, but perhaps more to the point, he pointed to the unruliness of language as the sign of a healthy culture constantly enriching itself:
The language most likely to continue long without alteration, would be that of a nation raised a little, and but a little, above barbarity, secluded from strangers, and totally employed in procuring the conveniencies of life; either without books, or, like some of the Mahometan countries, with very few: men thus busied and unlearned, having only such words as common use requires, would perhaps long continue to express the same notions by the same signs. But no such constancy can be expected in a people polished by arts, and classed by subordination, where one part of the community is sustained and accommodated by the labour of the other. Those who have much leisure to think, will always be enlarging the stock of ideas, and every increase of knowledge, whether real or fancied, will produce new words, or combinations of words. When the mind is unchained from necessity, it will range after convenience; when it is left at large in the fields of speculation, it will shift opinions; as any custom is disused, the words that expressed it must perish with it; as any opinion grows popular, it will innovate speech in the same proportion as it alters practice.
Aside from the casual slur of the "Mahometan countries", this remains realistic good sense, as does the rest of the essay, much recommended.



Readers' comments
It would seem that TE is wavering in its pluralization of "data". 9th Feb 2013, in "The Voucher Business" I find "... the data itself." (singular - hooray) but then it slips back in the next paragraph with "... data are ..."
Good prose should be transparent, i.e. the meaning should not be obscured by solecisms, unusual constructs etc. When "data" is used as a plural my mind trips over it and I say to myself "dear old Oxbridge pedants". I lose continuity.
A strong suggestion to the sub-editors: replace every use of "data" with either "information" (sing.) or "numbers" (pl.).
Oh, btw a Google search gives 150 million "data is"s, only 50 million "data are"s. I did the same search some years ago and the ratio was about 60/40. So the battle for data-as-a-plural is being lost.
Latin is dead, so let's just move on, can we? If you look at it conceptually, data is one of the most obvious examples of a mass noun. It has nothing in common with countable nouns. Does anyone really use the word datum? A piece of data or a data point sounds great to me.
British: My family are going on a holiday.
American: My family is going on a vacation.
None is available. None are available. I probably should have used a semi-colon instead of a period/full stop.
Your first example is a legitimate variation between British and American English. Your second example is just right vs. wrong.
Foreign plurals were adopted by scholars who were familiar with the languages the terms came from. Data, for example, goes back to about 1640, a time when European scholars used Latin. I suppose that some of the terms were frequent and familiar enough that they kept their irregular forms even after scholars began to write in English.
And why, first of all, that absurd use of the plural forms of the words of some (not all) neologisms? When it was adopted by English, datum should have got the "datums" plural; anything else is, in the end, absolutely stupid. If you take "data", then you should only use it in a nominative use, but what about genitive, accusative, etc? You'd actually have to use declensions in English.
What is the opinion about forum/fora/forums? Why are alumni (plural of alumnus) and alumnae (from alumna) not questioned?
Garner's Modern American Usage (Oxford, 2003) holds that "data ... in more or less formal context ... is preferably treated as a plural," but that "it's pedantic and prissy to say ... fora, ... auditoria, ... rostra."
The error is in attempting to find a rigorous rule to cover all situations. Language, especially English, doesn't work like that.
On the issue of usage, my blog posted on Friday, July 6, 2012, at clearwriting4u.com, discusses use of faulty idioms as another obstacle on the road to clear writing. AGRegardie.
This article is absolutely right... we should start saying "agendum".
Jonathan Owen got it just right. English speakers have reanalyzed data as a mass/non-count noun. So the debate about singular/plural is not going to help. Data has no singular or plural outside of specialist uses. In scientific circles you will read "these data" and maybe even "this datum," but how useful is a datum? Likewise, any useful amount of data would be an uncountable, undifferentiated mass.
The discussion following the article is lively, but the new SMS/texting language that is evolving will probably render all this discussion totally irrelevant. All languages, and their words, phrases, grammar rules, syntax etc., have a single-point agendum, namely, to communicate and convey. The niceties and elegance of the process of communicating have lost their relevance today.
NAFAIC
ASDASIG
"Hacker: We only have one item on the agenda.
Sir Humphrey: Then it is agendum, Prime Minister.
Bernard: I don't think that the Prime Minister got to the second declension..."
'Yes, Prime Minister' BBC Television, 1986 to 88, possibly the best comedy series ever made for television.
I always think of data as a group or set and use the singular. This spreadsheet of data shows that ...; this graph of data indicates ...
Really? You could just omit "of data" in both phrases and do as well with fewer words.
You could also omit the word "spreadsheet" or "graph" and the use of data with the singular makes sense, in this case because one more or less understands that there is some unspoken category or set to which the singular verb relates: the data shows that; the data indicates...
As for missing words that are thoroughly understood by both speaker and listener, compare rhyming slang. Articulate speakers of English know why "rabbiting" means to talk excessively -- "pork" is unsaid but understood as the rhyme. So spreadsheet (or some other vessel) can be unsaid but understood in "the data is...."
An article in this week's issue contains this beauty: "there were little real data to use as a basis when submitting LIBOR".
Surely "there was little data" or, at a stretch, "there were few data".
The sentence in the article is correct as written. i'm pleased to see that The Economist is keeping its standards up.
Truly? If someone had written "There are little real persons who believe that data is always plural", then I would assume you were one of these dwarfish, fairy folk. If data is truly a countable plural noun, then applying "little" to it means each datum is physically tiny.
Persons are countable, but "data" as used in the sentence in question is equivalent to a singular of indefinite amount, so an exception recognized by Fowler (2nd edition). It's a fine point and the precise example is not discussed in Fowler, but the meaning is perfectly clear and the form does not seem odd to anyone accustomed to reading scholarly articles (which this is). No one would think the writer meant that the individual datums were small; such an assertion is disingenuous.
English is as she is spoken. Whatever the rules, they have changed from yesterday and will change tomorrow. The correct answer is not to debate rationally about examples, but to take a poll. Even then, the alternative is not "wrong", just a minority opinion.
This is incorrect for at least three reasons:
1. Spoken English is usually informal. Formal language, especially when written, is held to higher standards.
2. Professionals maintain their language usage rules whether laymen understand and follow them or not, because they write for each other (and for laymen conversant in their professional language), not for the masses, and because they expect their writing to survive in archives and be read and understood long after being written.
3. Educated persons do not dumb down their language for the masses, because doing so sacrifices accuracy and expressive power.
At last! At last! Thank you for spelling it out. Thank you! Thank you! Thank you!
Never understood the perverse pride taken in voluntary decrepitude. So you don't brush your teeth or shampoo your hair and flaunt a pair of smelly armpits. Go ahead. Dumb and dumber.
I bought a philosophy course on CDs, consisting of a number of audio lectures. One of the speakers insisted on using the word "datum", and repeatedly. I don't think he ever used "data", and hearing this word datum over and over again was irritating, especially since, when one is talking about philosophy, it doesn't seem often there is a datum that is clearly a single bit of information.
As someone in the US in medicine, I have never heard anyone use the term "datum", perhaps talk about a "data point", but never "datum".
It is impossible to judge the usage without reading it, but presumably you were listening to lectures in another field in order to broaden your education. People express themselves differently. How, specifically, was this usage wrong?
A datum is not usually a single bit of information in any field, as a "bit" is a tiny unit.
I don't think that the usage was wrong in a meaning sense, but its usage was rather overdone. Of all the other lecturers in this series, this was the only one who used the term.
It was as if this person was on a crusade to make sure the word is recognized and doesn't disappear.
If we are talking about information, a bit is precisely one binary digit, no more, no less.
"Professionals maintain their language usage rules whether laymen understand and follow them or not" :-)
No, that is not true. "Bit" is commonly used to mean "binary digit," but formally the latter is properly termed a "binit." That term is not used much outside of information theory, but in any case "bit" in its formal sense is a unit of information, not data storage. The problem here is not the usage of "bit" but of "information." Information is not the same thing as data. One "bit" is the unit of information which resolves the uncertainty between two equiprobable events. One bit of data storage only contains one bit of information when the two states are equiprobable; it may contain zero bits of information.
All that is rather academic, but to bring the point back to everyday usage: "data" relates to symbols or to some means of storing the equivalent of symbols; "information" relates to knowledge.
You may think so, but I couldn't possibly comment...
On the other hand, Mr Shannon, grumbling from the grave, can.
I refer you to "A Mathematical Theory of Communication".
As to your invention, I suggest you do exactly that with it.
"Professionals maintain their language usage rules whether laymen understand and follow them or not".
I find "data are" to be very pedantic and not even helpful in doing so. Time to move on. On a slight parallel - spaghetti can be pluralised in German: "would you like some more spaghettis?" Sure, it might raise a smile to any polyglot but why not?
I used to think I could pride myself on my Latin knowledge until I came across "status" of which the plural is, apparently, "status". Fourth declension and all that nonsense that was entirely synthetic syntax with no semantics! It is totally unrealistic to continue this Quixotic folly. Quixote did at least revel in the unfeasible*.
* Yes, another jab at the recent post on "un" versus "in" for the negative prefix of latinate words. Johnson, the n-gram chart was horribly flawed. It should have been an aggregate of the forms of *all* the glorious "un" candidates such as unimportant, unremarkable, unexpected and unrealistic against the pitiful "ins"; impossible ingrates the lot of them. And the least said about those devious "nons", well, the better. Bit of a non-event, wholly unspectacular!
Uno spaghetto, due spaghetti
The reason why English-speakers say "The spaghetti is ..." rather than "The spaghettis are ..." is that it has taken a while for us to grasp the concept of 'al dente'. I grew up with pasta that had been boiled to an undifferentiated mush.
I don't understand why it's so complicated. "Data" seems to me to be a word very similar to "collection." Nobody would ever say, for example, "the collection are nearly complete." Data is a set (another good example!) of individual things, so it makes sense to refer to data as a singular thing, not plural things. To say "data are" just sounds clunky and fussy.
No, "data" does not refer to a set, which is why computer scientists use the word "dataset." "Data are" may seem fussy in ordinary usage, but in scientific usage it is normal.
Scientists say "dataset" for the same reason chefs say "remoulade sauce."
I'd put it this way: if "data" is the plural of "datum", then "data are"; if "data" is data (i.e. somewhat organized information, typically, but not necessarily, numerical), then "data is".
A similar thing is going on with fungi, which is the plural of "fungus" (fungi are), and, at the same time, Fungi *is* a kingdom in the Eukaryota domain.
Context matters here. Whatever the usage in common speech, "data" has a plural sense in scientific research, where "datum" is still used to mean a single data point. Nobody uses "agendum" in English.
As for "statistics," the problem is one of meaning, not merely form. One does not collect statistics. Statistics are computed from available data, which are collected.
Well, OK, let us say that I have an employee directory with, I don't know, name, hire/rehire date, SIN, date of birth and salary. Is one record in that database a datum - or a single field in the database the datum?
I.e. if I know all of the above I have data on my employee, but I also have data on my employees; at the same time, knowing only date of birth without anything else attached is not a datum - it is simply a meaningless date.
Your example is common speech, not science. Nobody would use "datum" in your example. Your database contains data on each employee, which include(s) date of birth, etc. It doesn't really matter whether you say "include" or "includes." Anyone who would call you out on it (outside of a Johnson forum) is being pedantic.
Here's a hard-and-fast rule: L/Cdr Data is singular, in at least two senses of the word. Data in every other instance are plural.
I know what you mean, but that should be: "'Data' in every other instance is plural," because you are referring to the word, not the data.
I just had a thought: Maybe GH1618 IS Lt.Cdr. Data... :P