Stock Prediction for ARGAAM Companies Dataset

Economic forecasting offers attractive profit opportunities and is a major motivator for researchers in this field. In a fast-growing business world, predicting stock behavior is challenging for most stockholders and commercial investors, and reliable predictions allow them to invest more confidently. Machine learning is an emerging technology that enables systems to learn from real-world data. Regression is a fundamental machine learning technique that is useful in real-time applications. This paper experiments with stock price prediction using three machine learning techniques: linear regression, decision tree, and support vector machine. The techniques were applied to the ARAMCO and Saudi Dairy datasets, and performance was evaluated using the R2 value, MAPE, and RMSE. The results substantiated the hypothesis.


Introduction
Stock price forecasting is an important financial topic that has attracted considerable academic interest for several years. It is based on the notion that publicly available historical data contains patterns that help forecast future market returns. The stock market is one of the most closely followed markets in the world, and in the modern technology era a great deal of innovation has been introduced into its different sectors. By analyzing previous stock data, we can predict stock prices and indexes. Stock market prediction is a modeling technique that uses fundamental characteristics of the stock price to predict its potential future values. Access to a large quantity of historical data is essential for such a forecasting system to succeed in its pursuit of profitable markets. The efficacy of these algorithms is severely constrained by the fact that the data used in this kind of investigation are financial time series. Furthermore, the risks inherent in irregular market patterns, volatility, noise, etc., cannot be disregarded when using such approaches [1]. Regression techniques are therefore bound by the efficient market hypothesis, which holds that risk-adjusted returns cannot persistently exceed those of the market as a whole [1] [2]. This hypothesis supposes that the current stock value can be calculated from the stock's previous price records and reasonable forecasts. Any departure from this premise could make the stock price move unexpectedly.
With the expansion of computing capacity, however, a variety of machine learning (ML) methods have been readily applied to stock market prediction. The well-known artificial neural network (ANN) and its derivatives, genetic algorithms, the support vector machine (SVM), support vector regression (SVR), and others have all been employed by researchers to forecast share prices. The main difficulty for these predictive methods is handling uncertainty while making more accurate stock price predictions, which in turn reduces risk for traders and investors and yields a profitable strategy. The accuracy of these models draws more researchers, who are then inspired to propose new, more accurate forecasting strategies. Recently, there has been a lot of interesting work on stock price prediction using ML techniques. The objective is to analyse market trends and predict stock prices, including index changes. Machine learning is a field of artificial intelligence that allows systems to learn from data and complete tasks without being given explicit step-by-step instructions, depending solely on the data they have been fed [3][4][5]. ML is used in a variety of areas where devising a traditional technique to solve a problem is difficult. The stock market is a gathering of stock buyers and sellers. A stock is a fraction of a company's ownership held by a single entity or group of people [6][7][8][9].
One of the techniques for stock price prediction using machine learning is linear regression. It is the method in which historical data is fit to a straight line using the equation y = mx + c. Mathematically, linear regression fits the values of the data set to the straight line for which the sum of the squared distances between each point and the line is smallest. This fitting procedure is known as the least squares method, and its hypothesis function is:

$$ h(x) = mx + c, \qquad (m, c) = \arg\min_{m,\,c} \sum_{i=1}^{n} \bigl(y_i - (m x_i + c)\bigr)^2 $$

In addition, various other machine learning techniques can also be used. A decision tree is a technique based on a tree-like structure in which a feature is tested at each node; based on the outcomes of these tests, a value is predicted at a leaf of the tree. The support vector machine relies on the kernel technique, in which computations in a higher-dimensional feature space are carried out in the original lower-dimensional space using kernel functions. Other techniques include Naïve Bayes, random forest, logistic regression, and deep learning techniques. This study reviews publications and explains how these techniques provide precise and reliable forecasts of stock prices in the financial market. We employed several methods in this research and found that not all of them were able to forecast the data we required. There is a fundamental need for automated techniques that make effective use of large amounts of financial data to assist businesses and individuals in crucial budgeting and planning decisions.
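The least-squares fit of the line y = mx + c described earlier in this section can be illustrated with a minimal sketch. The function name `fit_line` and the five sample prices are hypothetical and chosen only for illustration; they are not taken from the ARAMCO or Saudi Dairy data.

```python
def fit_line(xs, ys):
    """Return slope m and intercept c that minimise the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form least-squares solution for a single predictor.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c


# Hypothetical closing prices on five consecutive trading days.
days = [1, 2, 3, 4, 5]
prices = [10.0, 12.0, 14.0, 16.0, 18.0]

m, c = fit_line(days, prices)
next_day_price = m * 6 + c  # extrapolate the fitted line to day 6
```

Because the sample prices lie exactly on a line, the fit recovers that line; real price series are noisy, so the fitted line only approximates the trend.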
In this paper, various machine learning techniques are applied to predict stock prices from past data. The results are analyzed using the root mean square error, the R2 value, and the mean absolute percentage error. The rest of the paper is organized as follows. The next section presents the literature review. This is followed by the methodology and results sections. The paper concludes with a discussion of future work.
The main contributions of the paper are as follows. The paper presents machine learning as an emerging technology that provides the capability to learn on its own from real-world datasets. Linear regression, decision tree, and support vector machine are used to predict the stock prices of Saudi companies from historical data. These techniques are applied to the ARAMCO and Saudi Dairy datasets. Performance is evaluated using the R2 value, MAPE, and RMSE. The results are presented and analyzed to substantiate the hypothesis.

Literature Review
There has been a significant volume of research on employing machine learning for stock price prediction; examples include [10][11][12][13][14][15][16][17][18]. Selvamuthu et al. [19] presented a comparative study of the performance of several schemes on the tick dataset and the 15-min dataset of an Indian company. The comparison concludes that the tick dataset yields better accuracy than the 15-min dataset.
Since the advent of the digital era, automated reasoning has become one of the most important areas of technology, and according to Zhang et al. [20], the most suitable strategy is to use artificial neural networks and momentary neural networks, which may be considered a form of machine learning. Mobin et al. [21] proposed a workable tool to predict the movement of stocks with modest precision. They first examined a dataset of historical stock prices, which was pre-processed and cleaned for the study, so that the approach concentrates on preparing the raw data. After data preparation, they show how the prepared dataset supports a support vector machine and what output is obtained with random forests. The proposed work also considers the use of the prediction tool in real-world settings and issues related to the sensitivity of its recommendations. The reported accuracy of the stock prediction model is 80.3%. Khattak et al. [22] proposed a model for predicting stock trends with a focus on lowering the sparsity of the collected data through machine learning. The authors apply outlier detection to the input data for dimensionality reduction and then use a k-nearest neighbor technique to classify stock metrics.
In [23], the analysts used an artificial neural network (ANN) and the statistical ARIMA technique on nearly three years of data to forecast the KSE-100 index. The study in [24] provides an in-depth evaluation of the fundamental relationship between macroeconomic factors and the KSE market. Similarly, some researchers showed that a person's feelings play a major role in analyzing a situation and ultimately making a decision [25]. Data about users' moods and their behavior on social media can therefore help predict their investment decisions as well as market performance. In [26], the researchers present a process for collecting data from different international financial markets and then predict stock prices with the help of machine learning algorithms. In [27], the researchers show the impact of financial and commercial news on stock prices, including the effect of certain political events. Correspondingly, various data science techniques have also been used; in [28], the researchers applied variable moving average techniques to data from the Vietnamese stock market. Nowadays, many companies and institutions are working on stock market prediction. These companies aim to find the best method for predicting the daily behavior of stock market indexes and to modify their systems accordingly. Majhi et al. [29] predicted stock market indexes, including the S&P 500, using bacterial foraging optimization techniques for both short-term and long-term horizons. They used a linear combiner model to calculate the weights and compared their model with a multilayer perceptron (MLP). The results show that the method of Majhi et al. is the better fit in that setting because of its lower complexity and higher precision relative to the MLP. Statements by a few influential people can also move prices: for example, tweets by Elon Musk about Bitcoin have raised Bitcoin prices in the cryptocurrency market, and stock prices likewise depend on such statements. In technical terms, analyzing these statements falls under text analytics and text mining. Another type of forecasting system [30] uses a text-mining approach to predict real-time stock prices and market behavior. A few institutions and researchers have used text analytics and text mining approaches, and their findings examine the effects of economic and financial news on stock market prediction.
Examples include a study of the effect of emotions such as hope, fear, and worry on the rise or fall of the Dow Jones index on the following day [31], and studies of the effect of Facebook on the stock market. The behavior and mood of influential people also affect stock markets. There is therefore a connection [32] between the capacity of investors and other stock market activity, as reflected in people's current mood. Past research shows that the mood of 100 million American Facebook users affected the stock market between 10/09/2007 and 10/09/2010.
Another related work is based on the strategy of Bollen et al. [33], who follow a text mining and text analytics approach. This work received worldwide media coverage from a variety of sources. Their main strategy was to measure the behavior and mood of people that affect the stock market. The authors gathered data from Twitter users in 2008 and used an algorithm based on mood states, namely the Google-Profile of Mood States (GPOMS), to detect the public response to the presidential election. The algorithm categorizes mood into six dimensions: calm, happy, vital, sure, alert, and kind. The reported results were striking.
Based on this review, there is still a gap in the use of machine learning techniques to predict stock market prices. This research addresses that gap and applies three popular machine learning algorithms to predict stock prices.

Methodology
The overall implementation pipeline is shown in Figure 1a along with the dataset (Figure 1b). In this research, we first collected the stock price data for two companies, Saudi Dairy and Aramco, and then pre-processed it. The complete procedure in this project can be divided into three steps: (1) data collection and cleaning, i.e. removing unwanted columns that are not used in stock prediction and removing null records; (2) exploring the data (exploratory data analysis); and (3) predicting stock prices. The dataset comprises the following columns: the date, the opening price, the closing price, the high and low prices of the stock on that particular day, the adjusted closing price, and the volume of the stock.
Records containing null values were removed using the pandas library. The data was standardized by subtracting the mean and dividing by the standard deviation as follows:

$$ z = \frac{x - \mu}{\sigma} $$

where x is the original value, \mu is the mean, and \sigma is the standard deviation. For outlier removal, values lying more than 3 standard deviations from the mean are treated as outliers.
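The pre-processing steps above (null removal, standardization, and the 3-standard-deviation outlier rule) can be sketched with pandas as follows. This is an illustrative sketch rather than the authors' code: the column name `Close`, the helper name `preprocess`, and the sample values are assumptions.

```python
import pandas as pd


def preprocess(df, column="Close"):
    """Drop null records, z-score the column, and remove 3-sigma outliers."""
    clean = df.dropna().copy()
    mu = clean[column].mean()
    sigma = clean[column].std()  # sample standard deviation (ddof=1)
    # Standardise: subtract the mean and divide by the standard deviation.
    clean["z"] = (clean[column] - mu) / sigma
    # Keep only rows within 3 standard deviations of the mean.
    # The z column reflects the mean and deviation computed before this filter.
    return clean[clean["z"].abs() <= 3].reset_index(drop=True)


# Hypothetical data: 30 ordinary prices, one extreme value, one null record.
raw = pd.DataFrame({"Close": [10.0, 11.0] * 15 + [100.0, None]})
processed = preprocess(raw)
```

In this sample, the null record and the extreme value of 100.0 are dropped, leaving the 30 ordinary prices. Whether the statistics should be recomputed after outlier removal is a design choice that the text does not specify.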

Results
In this section, the results obtained from the experiments are presented together with the corresponding simulation screenshots. The results for basic linear regression are shown in Table 3. The table reports the RMSE, R2, and MAPE values for different values of N, i.e. the number of past data points used for future prediction. All of the techniques provide a good fit for regression, as is evident from Table 4; however, the decision tree provides the best result. A comparison has been made with other approaches from the literature, and similar results have been reported, as shown in Table 5.

Table 5:
Method                                   Source   Value
(method name missing)                    [34]     0.969
Convolutional neural network (CNN)       [34]     0.973
Recurrent neural network (RNN)           [34]     0.975
Long short-term memory (LSTM) model      [34]     0.977
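To make the role of N concrete, the following sketch builds features from the last N values of a series and fits the three model families named in this paper using scikit-learn. It is an assumption-laden illustration, not the experimental setup: the synthetic random-walk series, N = 5, the 80/20 split, and all hyper-parameters are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor


def make_lag_features(series, n_lags):
    """Row i holds series[i:i+n_lags]; the target is series[i+n_lags]."""
    values = np.asarray(series, dtype=float)
    X = np.array([values[i:i + n_lags] for i in range(len(values) - n_lags)])
    y = values[n_lags:]
    return X, y


# Synthetic random walk standing in for a closing-price series.
rng = np.random.default_rng(0)
prices = 30.0 + np.cumsum(rng.normal(0.0, 0.5, size=200))

X, y = make_lag_features(prices, n_lags=5)

# Chronological split: the most recent 20% of rows are held out for testing.
split = int(0.8 * len(X))
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

models = {
    "linear regression": LinearRegression(),
    "decision tree": DecisionTreeRegressor(max_depth=5, random_state=0),
    "support vector machine": SVR(kernel="rbf", C=10.0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # R2 on the held-out rows
```

The scores obtained on this synthetic series say nothing about the ranking reported in the paper; the sketch only shows how the same lagged inputs can be passed to each model.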

Conclusion and future work
In this paper, linear regression, decision tree, and support vector machine were used to predict stock prices for Saudi companies using real historical data. These three methods were used to design the model and obtain optimal results. Simple linear regression models were found to be inadequate: the goodness of fit (R2) is only 86%-96%, with MAPE of 63%-142% and RMSE of 0.313-2.48, which falls well short of the criteria for a good regression. In contrast, the decision tree results are highly acceptable, with R2 of 99%, MAPE always below 5%, and RMSE of around 1.2 to 1.5%. Additional features recommended for improving the model include industry-related news, the impact of government policies, new product launches, and dollar prices.

Figure 1: The Implementation Details

Tables 1, 2, and 3 show the dataset description and the results after pre-processing. For the machine learning techniques evaluated, simple linear regression and regression using the last-values method, reasonable model accuracy is achieved, as discussed in the next sections. The performance evaluation parameters considered include the root mean square error (RMSE), the mean absolute error (MAE), the mean absolute percentage error (MAPE), the mean square error (MSE), and the coefficient of determination (R2). The basic linear regression model from sklearn is used with the following split of the dataset: 60% training, 20% validation, and 20% testing.
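The three metrics reported in the results and the 60/20/20 division can be sketched in plain Python as follows. The function names and sample values are hypothetical, and the chronological (unshuffled) split is an assumption, since the text does not state how the rows were divided.

```python
import math


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def r2(actual, predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def chronological_split(rows, train=0.6, valid=0.2):
    """Split rows in order into training, validation, and test parts."""
    n = len(rows)
    i = round(n * train)
    j = i + round(n * valid)
    return rows[:i], rows[i:j], rows[j:]


# Hypothetical actual and predicted prices.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [11.0, 19.0, 33.0, 38.0]

train_rows, valid_rows, test_rows = chronological_split(list(range(10)))
```

For these sample values the RMSE is about 1.94, the MAPE is 7.5%, and R2 is 0.97. Keeping the split in time order avoids training on prices that come after the test period.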

Table 3: Performance of linear regression for various hyper-parameter values