We design and discuss the performance of a hierarchy of ML schemes, including Regression, Auto-Encoders, Convolutional Neural Networks, Graph Neural Networks, Neural ODEs and LSTMs. Where needed, we incorporate the physics of power grids into these schemes to obtain models that are both sufficiently accurate and fast to execute. The resulting hierarchy of data-driven learning models targets power system transmission and sub-transmission. Models in the hierarchy are classified according to three features: (a) operational focus, specifically normal operations vs. abnormal operations with high uncertainty (volatility); (b) the amount of physical information used in the model, such as reliance on the underlying power-flow or swing equations; (c) the spatio-temporal scales covered and the type of data used. These features are combined to achieve better results. For example, consider a quasi-static formulation appropriate for the abnormal regime of an incidental line failure. We assume that measurements (samples) are recorded tens of seconds apart and capture the vector of potentials (voltages and angles, or voltages only) across the grid. The aim is to localize the failure and predict its type with sufficient geographical accuracy, as quickly as possible. At the other extreme is dynamic monitoring of the system state, where seconds-scale recordings of voltages, phases and power flows are available at a sparse set of grid-critical locations. Here we aim to predict voltages and power flows at these critical locations over the next period (minutes to an hour), assuming the system continues to evolve at the gradual pace of normal operations. We compare the performance of the models and draw conclusions about their effectiveness and suitability. The main goal of the study is to select models appropriate for a regime-specific set of features. This is joint work with Andrey Afonin, Christopher Koh, Laurent Pagnier and Nikolai Stulov.