This dissertation studies two estimation problems in the areas of natural language and speech. In the first, we revisit the classical problem of estimating the number of unseen elements, which we study in a regime characterized by a large number of rare events; natural language is one example of such a regime. We propose an estimator of the vocabulary size of the underlying population that generates an observation, and show that it has theoretical guarantees of optimal performance.