NVIDIA Unveils Nemotron-CC: A Trillion-Token Dataset for Enhanced LLM Training