Computational Complexity & Linguistics
Friday, August 27th, 2010One of the books that I am working through this summer in preparation for the Fall’s classes is Michael Sipser’s Introduction to the Theory of Computing. I chose to read this book after reading chapters 1 and 2 of the excellent (but hard) Computational Complexity: A Modern Approach by Sanjeev Arora and Boaz Barak. Thanks to David Malec at UW-Madison for recommending the second book.
The Arora and Barak text gives very detailed proofs of computational complexity topics (complexity classes, P, NP, turing machines, etc). They don’t spend much time on languages, alphabets, or basic concepts. For these topics the Sipser book is phenomenal. Part 1 of the Sipser book feels like a linguistics textbook.
In his book he proves Theroem 1.25:
The class of regular languages is closed under the union operation.
A regular language is one that is accepted by a finite automata, a finite state machine. Think of it like this: you are a finite automata. When you hear someone speak, you construct meaning out of the arbitrary utterances. Take the word “automobile” for example. When someone says that word, you hear:
aw-tuh-moh-beel
So, we define an alphabet to be the set of all possible arbitrary sounds (there’s probably a lot!). Then we define a regular language to be a collection of combinations of these sounds that we “accept” — that have some meaning to us. For some people the word “automata” might not even be in their language, technically speaking, I argue that no two humans have the same language. I can converse with most english speakers, so it is likely that the intersection of my language and others is large.
Back to his proof. His claim is that if A and B are regular languages, then the union of the two is also a regular language. Let’s take two different languages: english and spanish. I won’t go over the details of his proof in this blog post (it’s good), the basic details from the proof are:
- Construct a machine M to recognize both english and spanish (i.e. a bi-lingual person)
- We can construct their set of “states” as a combination (cartesian product) of both languages. We assume the alphabet of both languages is the same (in this case, they’re very similar)
- Our machine (bilingual person) can accept a word construct in whole as an english word or a spanish word but not a combination of the two (no spanglish!)
To prove the claim, we prove existence of a bi-lingual person. This is such a fun application!
I am not a linguist, so maybe my musing are way off mark. Please drop me a comment if they are; I see lots of connections between what we do in computational complexity and what linguists do!
