### Data collection

In this study, a total of 160 datasets were used to investigate and compare the ability of classical and novel integrated machine learning models to predict Q_{s} during free-flow flushing. The datasets were extracted from the experimental study report of Janssen^{18}, who investigated the effect of reservoir drawdown on flushing channel formation. Experiments were performed in a concrete flume 50 m long, 2.4 m wide, and 1.5 m high. The sediments used in these experiments were non-cohesive, with a mean grain size (D_{50}) of 1.25 mm and a saturated density of 1270 kg/m^{3}. Deposits were set 10 cm above the valve gate threshold and were paved with side slopes of 1%. The water level in the reservoir, bed level in the flushing channel, width of the flushing channel, outflow, and sediment discharge were measured over time to investigate the variations of the flushing channel characteristics along the reservoir.

The water level in the reservoir (h_{w}), bed level in the flushing channel (h_{b}), outflow (Q_{out}), inflow (Q_{in}), and elapsed time of flushing (T) are considered as input parameters to predict the sediment discharge (Q_{s}). Of these, 112 datasets (70%) were used for training the developed models, 24 datasets (15%) were used as validation data to prevent overtraining, and 24 datasets (15%) were used to evaluate the accuracy of the implemented models. A summary of the statistical characteristics of the datasets is presented in Table 1.

**Table 1 Statistical characteristics of data.**

Moreover, before the training process, all variables (inputs and output) were normalized as follows:

$${X}_{i}^{\prime}=\frac{{X}_{i}-{X}_{min}}{{X}_{max}-{X}_{min}},$$

(1)

where the minimum and maximum values of the variable are denoted by \({X}_{min}\) and \({X}_{max}\), respectively.
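The min–max normalization of Eq. (1) can be sketched as follows (a minimal NumPy example with made-up values, not the study's data):

```python
import numpy as np

def min_max_normalize(x):
    """Scale a 1-D array to [0, 1] using Eq. (1)."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    return (x - x_min) / (x_max - x_min)

# Example: normalize a small sample of hypothetical flushing times
t = [0.0, 5.0, 10.0, 20.0]
print(min_max_normalize(t))  # 0, 0.25, 0.5, 1
```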

### Conventional machine learning models

#### Group method of data handling (GMDH)

The GMDH neural network is a basic technique of self-organizing data mining that was introduced by Ivakhnenko^{19} as a rival to the stochastic approximation method^{20}. The proposed algorithm is based on a multi-layer structure that provides a framework for simulating and modelling complex phenomena, image processing, and data mining. A complicated discrete function called the Ivakhnenko polynomial is used to connect the input and output variables in the GMDH.

In the current study, a polynomial function was used as a transfer function in the neurons of the middle and output layers as follows:

$$Y={W}_{0}+{W}_{1}{X}_{1}+{W}_{2}{X}_{2}+{W}_{3}{X}_{1}^{2}+{W}_{4}{X}_{2}^{2}+{W}_{5}{X}_{1}{X}_{2},$$

(2)

where *W* is the coefficient vector (network weights), *X* is the input vector, and *Y* is the output. In the conventional GMDH, the coefficients are determined by least square estimation (LSE)^{21}.
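The LSE fit of the quadratic polynomial in Eq. (2) for a single GMDH neuron reduces to an ordinary least-squares problem. A sketch on synthetic, noiseless data (not the study's data):

```python
import numpy as np

def fit_gmdh_neuron(x1, x2, y):
    """Fit the quadratic polynomial of Eq. (2) by least squares.

    Returns the weight vector [W0, W1, W2, W3, W4, W5].
    """
    # Design matrix: [1, x1, x2, x1^2, x2^2, x1*x2]
    A = np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1 * x2])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

# Synthetic check: recover known weights from noiseless data
rng = np.random.default_rng(0)
x1, x2 = rng.random(50), rng.random(50)
true_w = np.array([1.0, -2.0, 0.5, 3.0, -1.0, 2.0])
y = (true_w[0] + true_w[1]*x1 + true_w[2]*x2
     + true_w[3]*x1**2 + true_w[4]*x2**2 + true_w[5]*x1*x2)
print(np.round(fit_gmdh_neuron(x1, x2, y), 6))
```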

#### Support vector regression (SVR)

SVR is an extended version of the support vector machine (SVM), developed by Vapnik^{22} to solve regression problems with the SVM framework. The SVR model uses the structural risk minimization technique to find the best regression hyperplane, which is defined by the following equation:

$$Y=\vartheta \varphi \left(z\right)+c.$$

(3)

In this equation, *Y* denotes the non-linear regression function used to predict the target vector, \(\vartheta\) is the weight vector, and *φ(z)* is a non-linear mapping of the input data into a higher-dimensional feature space. The coefficients *c* and \(\vartheta\) are estimated as follows:

$$\mathrm{Minimize}: {R}_{reg}=\left[\frac{1}{2}{\Vert \vartheta \Vert }^{2}+P\sum_{i=1}^{n}({\xi }_{i}+{\xi }_{i}^{*})\right],$$

(4)

$$\mathrm{Subject\ to}: \left\{\begin{array}{c}{Y}_{i}-\left(\vartheta \varphi \left({z}_{i}\right)+{c}_{i}\right)\le \varepsilon +{\xi }_{i}\\ \left(\vartheta \varphi \left({z}_{i}\right)+{c}_{i}\right)-{Y}_{i}\le \varepsilon +{\xi }_{i}^{*}\\ {\xi }_{i},{\xi }_{i}^{*}\ge 0\end{array}\right.,$$

(5)

where *P* is the regularization cost parameter, \(\varepsilon\) is the acceptable error margin, and \({\xi }_{i},{\xi }_{i}^{*}\) are positive constants called slack variables^{23}.
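A minimal illustration of SVR on toy data, assuming the scikit-learn implementation (the study's actual software is not stated); here `C` plays the role of the regularization parameter *P* in Eq. (4) and `epsilon` is the error margin \(\varepsilon\):

```python
import numpy as np
from sklearn.svm import SVR

# Toy regression data (the real inputs would be h_w, h_b, Q_out, Q_in, T)
rng = np.random.default_rng(1)
X = rng.random((80, 2))
y = np.sin(X[:, 0]) + X[:, 1]

# epsilon is the width of the error tube; C penalizes slack outside it
model = SVR(kernel="rbf", C=10.0, epsilon=0.01)
model.fit(X, y)
print(model.predict(X[:3]))
```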

#### Adaptive neuro-fuzzy inference system (ANFIS)

ANFIS is a hybrid model developed based on the Takagi–Sugeno (TS) fuzzy inference system^{24}. ANFIS combines the advantages of neural networks (e.g., trainability, parallel processing, and connectionist structures) and fuzzy logic (e.g., simplicity, flexibility, fast training, and easy combination with other methods) in a single framework. This structure makes ANFIS a robust model for formulating non-linear problems and forecasting time series phenomena. ANFIS uses a fuzzy inference system (FIS) to calibrate the membership structure or parameters with a combination of methods such as the least-squares method and the back-propagation algorithm. The ANFIS structure comprises five layers, each containing different nodes: the fuzzy layer, production layer, normalization layer, de-fuzzy layer, and total output layer. The first layer consists of several membership functions that convert input variables into fuzzy inputs. The second layer is formed when fuzzy rules are determined for the nodes. In the third layer, the firing strengths produced by the second layer are normalized by the rules. The fourth layer receives the normalized values from the third layer and performs the defuzzification process. The fifth (last) layer uses the defuzzified values returned by the fourth layer to produce the system output as a numerical variable^{25}.
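The five layers can be sketched for a single input with two rules, assuming Gaussian membership functions and first-order TS consequents (the study's actual ANFIS configuration is not given):

```python
import numpy as np

def gauss_mf(x, c, sigma):
    """Layer 1: Gaussian membership function."""
    return np.exp(-((x - c) ** 2) / (2 * sigma ** 2))

def anfis_forward(x, mf_params, consequents):
    """Single-input, two-rule sketch of the five ANFIS layers."""
    # Layer 1: fuzzification
    mu = np.array([gauss_mf(x, c, s) for c, s in mf_params])
    # Layer 2: rule firing strengths (product of memberships; trivial here)
    w = mu
    # Layer 3: normalization of firing strengths
    w_bar = w / w.sum()
    # Layer 4: defuzzification with first-order consequents p*x + r
    f = np.array([p * x + r for p, r in consequents])
    # Layer 5: weighted sum gives the crisp system output
    return float(np.dot(w_bar, f))

out = anfis_forward(0.5, mf_params=[(0.0, 0.5), (1.0, 0.5)],
                    consequents=[(1.0, 0.0), (2.0, 1.0)])
print(out)  # 1.25 (both rules fire equally at x = 0.5)
```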

#### Artificial neural networks (ANNs)

ANNs are computational intelligence systems composed of many units, or nodes, called neurons. Neurons are connected by weights and aggregated into separate layers. The multi-layer perceptron neural network (MLPNN) is a class of feed-forward ANNs consisting of three kinds of layers: the input layer (introduction of data to the network), one or more hidden layers (data processing), and the output layer (results of data processing). Except for the input neurons, each neuron applies a non-linear activation function to the weighted sum of the outputs of the previous layer. The structure of this model is highly influenced by the problem variables, and the interconnection strategy is used in the training step^{14}.
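A forward pass through a one-hidden-layer MLP, i.e. a non-linear activation of the weighted sum of the previous layer's outputs, can be sketched as follows (the weights are random placeholders, not trained values):

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """One hidden layer with tanh activation, linear output layer."""
    h = np.tanh(W1 @ x + b1)  # hidden layer: non-linear activation
    return W2 @ h + b2        # output layer: linear combination

rng = np.random.default_rng(2)
x = rng.random(5)                          # five inputs: h_w, h_b, Q_out, Q_in, T
W1, b1 = rng.normal(size=(8, 5)), np.zeros(8)   # input -> 8 hidden neurons
W2, b2 = rng.normal(size=(1, 8)), np.zeros(1)   # hidden -> single output (Q_s)
print(mlp_forward(x, W1, b1, W2, b2))
```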

#### Multiple linear regression (MLR)

Multiple linear regression is the extension of simple linear regression to cases with more than one explanatory variable. The basic equation of this model is:

$$Y={\beta }_{0}+{\beta }_{1}{X}_{1}+{\beta }_{2}{X}_{2}+\dots +{\beta }_{k}{X}_{k}+\varepsilon ,$$

(6)

where *Y* is the output, *X* is the input vector, *β* represents the regression coefficient vector, and *ε* is the standard estimation error^{26}.
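Eq. (6) can be fitted by ordinary least squares; a sketch on synthetic, noiseless data with assumed coefficients:

```python
import numpy as np

# Fit Eq. (6) by ordinary least squares on toy data
rng = np.random.default_rng(3)
X = rng.random((40, 3))
beta_true = np.array([2.0, 1.0, -0.5, 3.0])  # [beta0, beta1, beta2, beta3]
y = beta_true[0] + X @ beta_true[1:]

A = np.column_stack([np.ones(len(X)), X])    # prepend intercept column
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.round(beta, 6))
```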

### Meta-heuristic algorithms

#### Henry gas solubility optimization (HGSO)

HGSO is a new meta-heuristic optimization method imitating Henry’s law that was presented by Hashim et al.^{27}. According to the Henry’s law, the amount of dissolved gas in a liquid is proportional to its partial pressure above the liquid. This law was formulated by William Henry as the following equation:

$${C}_{g}={H}_{g}\times {P}_{g},$$

(7)

where *C*_{g} is the solubility of the gas, *P*_{g} denotes the partial pressure of the gas, and *H*_{g} is Henry's constant. Henry's constant depends on the system temperature.
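A numerical illustration of Eq. (7), using textbook values for oxygen in water at 25 °C (not values from the study):

```python
def henry_solubility(H_g, P_g):
    """Eq. (7): dissolved-gas concentration from Henry's law."""
    return H_g * P_g

# Illustrative values: O2 in water at 25 °C,
# H_g ~ 1.3e-3 mol/(L*atm), partial pressure of O2 in air ~ 0.21 atm
print(henry_solubility(1.3e-3, 0.21))  # 2.73e-4 mol/L
```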

The HGSO algorithm consists of the following steps: initialization (the positions of the population are initialized), clustering (the population is divided into equal clusters), evaluation (the clusters are evaluated to determine the gas that reaches the highest equilibrium), updating Henry's coefficient and the gas solubility, updating the positions, and escaping from local optima (selecting and re-initializing a number of the worst agents). A more detailed presentation of the HGSO algorithm can be found in Hashim et al.^{27}.

#### Equilibrium optimizer (EO)

EO is a new swarm-based meta-heuristic algorithm proposed by Faramarzi et al.^{28} and inspired by control-volume mass-balance models. The search agents in EO are particles (solutions), and concentration is considered the particle's position. The positions of the search agents are randomly updated based on the best current solutions (the equilibrium candidates) until the equilibrium state (the optimal result) is achieved. Like other stochastic optimization algorithms, EO starts from an initial population whose positions are randomly initialized in the search space. At the beginning of the optimization process, the particles are ranked to specify the equilibrium candidates, which are evaluated to provide a search pattern for the particles. Five particles are nominated as equilibrium candidates: the four with the best fitness in the whole population, plus a fifth formed from their average. The four best particles improve the exploration, while the average one helps to increase the exploitation ability of the algorithm. The population size can be chosen according to the type of problem^{28}.
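The construction of the equilibrium pool described above can be sketched as follows (a minimization setting is assumed):

```python
import numpy as np

def equilibrium_pool(positions, fitness):
    """Build the EO equilibrium pool: the four best particles plus
    their average position (minimization assumed)."""
    order = np.argsort(fitness)          # rank particles by fitness
    best4 = positions[order[:4]]         # four best equilibrium candidates
    c_avg = best4.mean(axis=0)           # fifth candidate: their average
    return np.vstack([best4, c_avg])

rng = np.random.default_rng(4)
pos = rng.random((10, 3))   # 10 particles in a 3-D search space
fit = rng.random(10)        # their fitness values
pool = equilibrium_pool(pos, fit)
print(pool.shape)  # (5, 3)
```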

### Integrated machine learning models

Meta-heuristic algorithms are robust optimization strategies for low- and high-dimensional complex problems. In machine learning models, meta-heuristic algorithms optimize the structure, calibrate the unknown weight coefficients, or both. In this study, HGSO and EO were applied to optimize the weights of GMDH and the unknown coefficients of SVR. A GMDH with two middle layers and a maximum of ten neurons per layer was considered; the weights of the quadratic polynomial transfer functions of the neurons were obtained by EO and HGSO. The regularization parameter (*C*), the insensitive loss coefficient (*ε*), and the kernel constant (*σ*) are the three SVR parameters that affect the output results, and HGSO and EO were used to obtain their optimal values. Figures 1 and 2 illustrate the flowcharts of the integrated SVR and GMDH models, respectively. Like most other meta-heuristic algorithms, HGSO and EO have some parameters that must be initialized before the optimization process; in this study, the HGSO and EO parameter values were initialized based on the original studies^{27,28}. The initial parameter values of the meta-heuristic algorithms are shown in Table 2.
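The hyperparameter search for the integrated SVR can be sketched as below, with simple random search standing in for HGSO/EO and scikit-learn as an assumed SVR implementation (the study's software is not stated); the data are synthetic placeholders:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

# Toy data standing in for the normalized flushing inputs and Q_s
rng = np.random.default_rng(5)
X = rng.random((60, 5))
y = X.sum(axis=1) + 0.1 * rng.normal(size=60)

best_score, best_params = -np.inf, None
for _ in range(30):  # each iteration samples a candidate (C, epsilon, gamma)
    params = dict(C=10 ** rng.uniform(-1, 2),
                  epsilon=10 ** rng.uniform(-3, -1),
                  gamma=10 ** rng.uniform(-2, 1))
    # Cross-validated negative RMSE is the fitness of the candidate
    score = cross_val_score(SVR(kernel="rbf", **params), X, y,
                            scoring="neg_root_mean_squared_error", cv=3).mean()
    if score > best_score:
        best_score, best_params = score, params
print(best_params, -best_score)
```

A metaheuristic such as HGSO or EO replaces the random sampling step with position updates guided by the best candidates found so far; the fitness evaluation is unchanged.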

**Figure 1**

SVR-HGSO and SVR-EO flowchart.

**Figure 2**

GMDH-HGSO and GMDH-EO flowchart.

**Table 2 Initial parameter values of the EO and HGSO algorithms developed for this study.**

### Evaluation criteria

The results of the developed models are evaluated by four standard statistical criteria: the root mean square error (RMSE), the mean absolute error (MAE), the correlation coefficient (R^{2}), and the mean absolute relative error (MARE). RMSE indicates the difference between predicted and observed values. MAE is the average of the absolute differences between predicted and observed values and is indifferent to the direction of errors. The model gives the best predictions when the values of RMSE and MAE are close to zero^{29}. The correlation coefficient measures the degree of similarity between predicted and measured data; R^{2} = 1 corresponds to a perfect match of the model to the observed data. Predicted and actual values are similar if the MARE value is close to 0^{30}. These indices were calculated as follows:

$$MARE=\frac{1}{n}\sum_{i=1}^{n}\left(\frac{\left|{X}_{Oi}-{X}_{Pi}\right|}{{X}_{Oi}}\right),$$

(8)

$$MAE=\frac{1}{n}\sum_{i=1}^{n}\left|{X}_{Pi}-{X}_{Oi}\right|,$$

(9)

$$RMSE=\sqrt{\frac{1}{n}\sum_{i=1}^{n}{({X}_{Pi}-{X}_{Oi})}^{2}},$$

(10)

$${R}^{2}=\frac{\sum_{i=1}^{n}({X}_{Pi}-{\overline{X}}_{P})({X}_{Oi}-{\overline{X}}_{O})}{\sqrt{\sum_{i=1}^{n}{({X}_{Pi}-{\overline{X}}_{P})}^{2}\sum_{i=1}^{n}{({X}_{Oi}-{\overline{X}}_{O})}^{2}}},$$

(11)

where the observed and predicted values are denoted by \({X}_{Oi}\) and \({X}_{Pi}\), respectively, \({\overline{X}}_{O}\) and \({\overline{X}}_{P}\) represent the means of the observed and predicted values, and *n* is the number of observations.
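Eqs. (8)–(11) can be implemented directly; note that Eq. (11) as written is the Pearson correlation formula:

```python
import numpy as np

def evaluate(obs, pred):
    """Compute the four criteria of Eqs. (8)-(11)."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    mare = np.mean(np.abs(obs - pred) / obs)          # Eq. (8)
    mae = np.mean(np.abs(pred - obs))                 # Eq. (9)
    rmse = np.sqrt(np.mean((pred - obs) ** 2))        # Eq. (10)
    r = np.corrcoef(obs, pred)[0, 1]                  # Eq. (11), Pearson r
    return dict(MARE=mare, MAE=mae, RMSE=rmse, R2=r)

print(evaluate([1.0, 2.0, 4.0], [1.1, 1.9, 4.2]))
```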