On the fallibility of the Google corpus
Today's xkcd provides a clever illustration of why Google results are not necessarily the most reliable measure of frequency:

I'm glad to see that gardening, knitting, and blogging made the list -- I didn't realize my hobbies were so death-defying (and therefore hip and edgy).
It's unlikely that Mr. Munroe actually based this comic on real Google results. But it relates nicely to something Chris Davis pointed out to me several months ago: the Internet frenzy over the Miss Teen USA contestant's ungrammatical rambling posed a significant problem for the use of Google as a tool for corpus linguistics. The sheer frequency with which the text of her answer was reproduced and discussed might lead to the mistaken impression that "like such as and" is a grammatical string of English.
Quick, let's all scramble to revise our theory of syntax to accommodate the incessant repetition of this unfortunate young woman's disfluency! And while we're at it, don't forget to warn your children of the dangers of horticulture and fiber arts.
(This is not to suggest that Google is entirely unhelpful or uninformative as a corpus; it can in fact be useful for certain purposes. But please, let's not take ourselves too seriously.)

3 Comments:
i love this blog. i love it. please write on it more. all of it makes me happy, posts both individual and collective.
that is all. thank you.
amt.
Just for you, I will try.