To understand human language, models have to perform various reasoning skills, e.g., logical reasoning, commonsense reasoning, and temporal reasoning. There are multiple datasets for directly evaluating each of these reasoning skills. However, these skills are mostly required within downstream applications rather than as standalone abilities. For instance, a model may need to perform arithmetic reasoning to answer a question or to correctly summarize a table. Yet it is not clear whether a model that performs well on a dataset designed to evaluate arithmetic reasoning in isolation would also improve results on a QA dataset that requires arithmetic reasoning. As a result, we should pay special attention to developing end-to-end models for downstream applications that are also capable of performing the required reasoning skills. In this presentation, I will talk about our work on the challenges of end-to-end reasoning in downstream applications, namely coreference reasoning in question answering, and arithmetic reasoning in data-to-text generation and question answering.