Large Language Models
Evaluating the Long Tail: Assessing LLM Performance Across Downstream Tasks