We have been witnessing a real explosion of information, due in large part to developments in Information and Communication Technologies (ICTs). As information is the raw material for the discovery of knowledge, the study of the Big Data phenomenon has grown rapidly, both in the scientific community and in ICT itself (Kaisler et al., 2014). The concept of Smart Grids (SG) has emerged as a way of rethinking how energy is produced and consumed, driven by economic, political and ecological issues (Lund, 2014). To become a reality, SGs must be supported by intelligent and autonomous IT systems capable of making the right decisions in real time. The knowledge needed for real-time decision-making can only be achieved if SGs are equipped with systems capable of efficiently managing all the surrounding information. Thus, this paper proposes a system for the management of information in the context of SG, enabling the real-time monitoring of the events that occur in the ecosystem and the prediction of upcoming events. The proposed system is based on Apache Spark to provide real-time streaming and distributed processing. This knowledge management system architecture supports the development of enhanced data, information and knowledge analysis and management methodologies. This work proposes a novel data selection methodology that filters big volumes of data, so that only the most relevant and correlated information is used in the decision-making process in each given context. New challenges arise with the upsurge of the Big Data era: correlations found in huge volumes of unstructured data are often spurious when methods depend on the data alone. It becomes more important than ever to know what to ignore and to focus on what is important. It is in this scope that this paper makes its contribution. The proposed methodology searches for correlations in the data, and only the most relevant data is used in each context.
Data use is thus adapted to each situation, improving the forecasting process by reducing data variability. The data filtering process also contributes by reducing the forecasting execution time, since less, but more adequate, data is used in the training process. Using the proposed methodology, training data can be chosen automatically according to its relevance and correlation for each problem, preventing the use of excessive and ambiguous data, while also avoiding the over-filtering that often results from using only small amounts of highly correlated data and discarding information that could be relevant but whose value is not easily perceived. A case study is presented, considering the application of the proposed methodology. Results show that data selection increases the effectiveness of the forecasts, as well as their computational efficiency, by using less yet more adequate data.