Modeling and Analysis of Transportation Flows Created by E-commerce Transactions

 

Adam Ho, Erhan Kutanoglu, Michael Cole

Department of Industrial Engineering

University of Arkansas, Fayetteville, AR

 

Michael Bartolacci

Computer Science

Pennsylvania State University, Reading, PA

 

1. INTRODUCTION

1.1       Motivation

            Many studies and surveys show the signs of an exponentially growing Internet-based economy. The recent growth of business and trade realized over the Internet has drawn a lot of attention to electronic business, whether it is business-to-business or business-to-consumer. The increasing availability of e-commerce solutions provides firms with new potential for reaching new customers and business partners. Traditionally, the two most formidable barriers for this type of extended business have been distance and the lack of access to key sales and marketing areas. With the potential removal of such barriers in the new economy, United States electronic sales are projected to be $380 billion in 2002 (U.S. Department of Commerce News, 2001). Ultimately, e-commerce will change and make an impact on the United States economy in terms of sales, jobs, and business opportunities. The goal of this study is to identify the effects of e-commerce-based transactions on transportation freight movements between regions. We believe that e-commerce has diminished and will continue to diminish the barrier effect of distance in the U.S. economy.

The question that arises is "How will growing e-commerce affect the physical transportation network?" Like traditional commerce transactions, an e-commerce transaction may result in a transfer of goods. This physical exchange of goods relies heavily on the traditional transportation network. We hypothesize that the growing number of e-commerce transactions affects the distribution of loads on the transportation network, with potential changes in the usage of different modes such as air, rail, road, and inland water. In e-commerce, instead of shipping 100 computers in one truckload to a local store, 100 boxes, each with one computer, are shipped to a dispersed set of customers. For example, on a single Saturday in July 2000, 100 airplanes and 9,000 trucks delivered more than 250,000 copies of Harry Potter and the Goblet of Fire to Amazon.com customers all over the United States (Environmental News Network, 2000). Also, according to Bette Fishbein, a senior fellow at Inform, an environmental research organization in New York City, "It's unlikely e-commerce will save the planet as some have claimed. There might be some reductions in energy use, but a huge increase in packaging and shipping by air results in much more air pollution" (Environmental News Network, 2000).

A quick analysis of the U.S. Census Bureau's commodity flow surveys (1993 and 1997) indicates an increase in the average distance over which each ton of product is shipped (www.census.gov). This implies that, over the years, an average load has been shipped to a destination that is farther from its origin. Although not all such changes are due to e-commerce, we believe that the growth of e-commerce diminishes the effect of distance on transportation flows between distant regions. Our goal is to model the changes in the distribution of transportation flows given increasing amounts of e-commerce and the corresponding diminishing importance of distance.

For the purposes of planning by governmental agencies and transportation providers, surveys have already been undertaken through partnerships between the Census Bureau, the Department of Commerce and the Department of Transportation to collect data on the movement of goods (not necessarily e-commerce initiated). The data from this survey, referred to as the commodity flow survey (CFS), are used by public analysts and transportation providers to assess the demand for transportation facilities and services, energy use, and environmental concerns. We foresee that public analysts and transportation analysts can use knowledge of the changing pattern of flows due to e-commerce to allocate resources and plan for the future.

            Currently, e-commerce represents approximately 1% of the total U.S. economy (U.S. Department of Commerce News, 2001). E-commerce generated flows are only a fraction of the total flows. However, due to its expected exponential growth, e-commerce will represent a significant part of the economy in the near future. At this point, the Census Bureau claims that producing a separate series of data at micro levels for e-commerce can be very difficult and expensive (Atrostic, Gates and Jarmin, 2000). Therefore, in this research, we seek to model the flow of goods for e-commerce by using the readily available commodity flow survey data from the Census Bureau. We intend to focus on a handful of selected SCTG (Standard Classification of Transported Goods) codes. Some of these products are more e-commerce related in the sense that their flows are likely to be affected by e-commerce much earlier than others. A model developed for an e-commerce-oriented product will give us a better feel for the future directional distribution of other products as e-commerce grows further. We use a gravity model in this research because it is capable of capturing the major components that contribute to the transportation flow system.

 

1.2       Report Outline

The rest of the report is organized as follows. In Chapter 2, we provide a rather extensive review of the relevant literature on different applications of gravity models, which are primarily used to estimate transportation flows between regions. In this research, the gravity model has been further developed to estimate freight movements under the effect of e-commerce. We discuss data sources in Chapter 3. In Chapter 4, we discuss our modeling effort and how we calibrate the model to the base flow data obtained from the 1997 commodity flow survey. The base flow is the historical flow of goods exchanged between regions. We also present the way we determine different parameters to estimate future flows, and the process of assigning the flows to different transportation modes. In Chapter 5, we provide our first preliminary analysis using SCTG code ‘35’ (electronic and electrical products, and office equipment) as our base flow condition. We also show the preliminary output of the gravity model and the way we use flow data for two other products to eliminate the bias toward product code ‘35’. We show that we can use the results to validate the use of gravity modeling for quantifying the directional distribution of transportation flows under the diminishing effect of distance in e-commerce. We also present our findings on the expected usage of different modes in the year 2005. Finally, in Chapter 6, we summarize our main findings and provide some insights for future research.


2. LITERATURE REVIEW

In this chapter, we present a review of literature on gravity modeling, which is the modeling tool that we use in this research. We first give an overview of gravity modeling applications in trip and freight distribution as well as other economic applications. While we review the relevant literature, we also highlight several differences between previous research and our implementation for e-commerce flows. To our knowledge, no previous research effort has used gravity models to capture the effect of e-commerce on the United States transportation network. In that sense, this research can be viewed as the first attempt toward such a goal.

 

2.1       Introduction to Gravity Models

            Gravity modeling was first introduced into transportation modeling in the 1950s. Gravity models belong to a class of models called synthetic models, and they are generally used for the rough approximation of actual movements (Hamburg, Lathrop and Kaiser, 1983). Gravity models are often used for estimating trip distribution in a transportation context. These models have also been modified and used to estimate freight flows between a set of production and consumption regions. The gravity model is particularly useful when there are sizable distances and cost differences between each pair of production and consumption regions. Such characteristics are present in the world of electronic commerce.

                         Gravity models were originally developed from an analogy with Newton’s gravitational law (Ortuzar and Willumsen, 1990). The simplest formulation of the gravity model is

                                               $T_{ij} = a_{ij} \frac{O_i O_j}{d_{ij}^{2}}$                                                                (1)

where  Tij = the number of trips between origin i and destination j,

            Oi = population of origin region i,

            Oj = population of destination region j,

            dij = distance between origin i and destination j,

           aij = proportionality factor

            The initial gravity model, Equation (1), was later modified by modeling the effect of distance with a more generic function f (cij), which represents the disincentive to travel as distance, time or cost increases. The modified model thus became

                                               $T_{ij} = a_{ij} O_i O_j f(c_{ij})$                                                                (2)

            The deterrence function f(cij) (also called the impedance) is usually defined in terms of the distance between regions i and j. A potential problem with any gravity model application is that, in the flow matrix, the total consumption of, say, Regions 1, 2 and 3 may not equal the production of Region A that generated the flows. Since the gravity model is a closed system, in which all flows that are created are consumed within the system, the sum of the consumption capacities should equal the sum of the production capacities. Therefore, an iterative process is employed to adjust Tij until production and consumption are equal (Hamburg, Lathrop and Kaiser, 1983). This procedure is discussed further in Section 4.3.
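The imbalance described above can be seen in a small numerical sketch. The example below evaluates Equation (2) with an inverse-square deterrence function for three hypothetical regions. All numbers, the constant proportionality factor, and the names `gravity_flows` and `P_target` are assumptions made for illustration and do not come from the survey data.

```python
# Hypothetical three-region example; every number below is made up for illustration.
O = [100.0, 200.0, 50.0]        # size of each region (population in Equation (1))
c = [[1.0, 4.0, 6.0],
     [4.0, 1.0, 3.0],
     [6.0, 3.0, 1.0]]           # separation c_ij between regions (assumed units)
a = 1.0                         # one proportionality factor for all pairs (assumption)
P_target = [300.0, 500.0, 200.0]  # production each region is supposed to ship (assumed)


def deterrence(cost):
    """Inverse-square deterrence f(c) = c**-2, the Newtonian special case."""
    return cost ** -2


def gravity_flows(sizes, costs, factor):
    """Equation (2): T_ij = a * O_i * O_j * f(c_ij) for every origin-destination pair."""
    n = len(sizes)
    return [[factor * sizes[i] * sizes[j] * deterrence(costs[i][j])
             for j in range(n)] for i in range(n)]


T = gravity_flows(O, c, a)
row_totals = [sum(row) for row in T]

# Nothing in Equation (2) ties the row totals to the target production.
for i, total in enumerate(row_totals):
    print(f"region {i}: modeled outflow = {total:.1f}, target production = {P_target[i]:.1f}")
```

The modeled outflows differ from the targets, which is why a balancing step is needed before the flows can be interpreted as tonnage.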

 

 

 

2.2       Gravity Modeling in Trip Distribution Problems

                The trip distribution problem deals with the assignment of traffic from given origin zones to given destination zones. This problem is built on the idea of the accessibility of one region from another, which creates the inter-activity between regions. With reference to the traditional form of the gravity model shown in Equation (1), the population of origin region Oi is replaced by Pi, which represents the production capacity, and the population of destination region Oj is replaced by Cj, which represents the consumption capacity. The relative number of opportunities, such as work opportunities, can be used as an accessibility measure for a zone. In this research, this can be viewed as the opportunity for online businesses to reach additional sellers and customers in more distant regions, which then creates additional transportation flows.

            The types of marginal constraints with which we shall be primarily concerned are of the forms

                                  $\sum_{j} T_{ij} = P_i$                                                          (3)

                                  $\sum_{i} T_{ij} = C_j$                                                          (4)

                Tij = Number of trips flowing from region i to region j

                           Pi = Number of trips originated from region i

                           Cj = Number of trips consumed by region j

       These marginal constraints address the gravity model problem discussed in Section 2.1: the flows leaving each region i must sum to its production capacity, and the flows arriving at each region j must sum to its consumption capacity. These constraints can also be represented as shown in Equations (5) and (6).

                                             ()                                                          (5) (6)

            Trip distribution models involving these types of marginal constraints are referred to as doubly-constrained distribution models (Erlander and Stewart, 1990). The gravity model that we develop in this research is also a doubly-constrained gravity model. A doubly-constrained gravity model can take different forms, and these forms are governed by the impedance values. Impedance values are determined from a functional form called the deterrence function, and they set the level of inter-activity between two regions. Erlander and Stewart (1988) present several basic forms, which we review briefly in Section 2.2.1.
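One common way to satisfy both marginal constraints at once is to scale the rows and columns of a seed matrix alternately until the row sums match the production totals and the column sums match the consumption totals. The sketch below illustrates that idea. It is not necessarily the exact procedure used in this report, and the function name `balance`, the seed values and the tolerance are assumptions.

```python
def balance(seed, P, C, rel_tol=1e-9, max_iter=1000):
    """Alternately scale rows and columns of `seed` so that row sums
    approach P and column sums approach C (P_i > 0 and C_j > 0 assumed)."""
    if abs(sum(P) - sum(C)) > 1e-9 * sum(P):
        raise ValueError("closed system: total production must equal total consumption")
    T = [row[:] for row in seed]          # work on a copy of the seed matrix
    n, m = len(P), len(C)
    for _ in range(max_iter):
        # Row step: force each origin's outflow to equal its production.
        for i in range(n):
            row_sum = sum(T[i])
            if row_sum > 0:
                T[i] = [t * P[i] / row_sum for t in T[i]]
        # Column step: force each destination's inflow to equal its consumption.
        for j in range(m):
            col_sum = sum(T[i][j] for i in range(n))
            if col_sum > 0:
                for i in range(n):
                    T[i][j] *= C[j] / col_sum
        # Columns are exact after the column step, so only rows need checking.
        worst = max(abs(sum(T[i]) - P[i]) / P[i] for i in range(n))
        if worst < rel_tol:
            break
    return T


# Hypothetical two-region system: seed values stand in for impedance-based weights.
seed = [[1.0, 0.5],
        [0.5, 1.0]]
P = [60.0, 40.0]   # production by origin (assumed)
C = [30.0, 70.0]   # consumption by destination (assumed)
balanced = balance(seed, P, C)
```

A looser `rel_tol` stops the iteration earlier; a value of 0.10 would correspond to accepting flows within 10 percent of the specified totals.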

            For urban area planning, the Bureau of Public Roads (Connor and Whitton, 1965) suggests that travel time is the most effective representation of the impedance value. The total travel time is usually the minimum total driving time over a path between zones (or regions) plus the terminal times at both ends of the trip. Travel times provide a realistic measure of the actual spatial separation between regions, as this separation is likely to influence automobile drivers in their decisions about where to work, shop, etc. In effect, the travel time factor measures the probability of making a trip during each time unit. Distance, travel cost, and many other measures of spatial separation have also been used in the past to determine the impedance value.

 

 

 

 

           

2.2.1    Different Forms of Gravity Models

a)         Doubly-Constrained Gravity Model with Given Inter-Zonal Weights

            This type of gravity model assigns a set of inter-zonal weights to origin-destination pairs. These weights are usually viewed as constants, which can be interpreted as a priori weights. Erlander and Stewart (1988) define a gravity model with inter-zonal weights as follows: Given Wij ∈ (0,1), (i, j) ∈ L (the set of all possible origin-destination pairs or links), Tij is a solution of the doubly-constrained gravity model with given inter-zonal weights Wij,

                             $T_{ij} = P_i C_j W_{ij}$,    $P_i > 0$, $C_j > 0$, $(i, j) \in L$                            (7)

            where Tij = Number of trips flowing from region i to region  j

                       Pi = Number of trips originated from region i

                       Cj = Number of trips consumed in region j

                      Wij = Inter-zonal weight between region i and region j

                        L = Set of origin-destination pairs

 

b)                  Doubly-Constrained Gravity Model with Exponential Deterrence Function

            According to Erlander and Stewart (1988), the exponential deterrence function is the most widely used deterrence function in trip distribution modeling. The exponential deterrence function specifies the inter-zonal weights in terms of a parameter γ ≥ 0 and constants cij. Given γ ≥ 0 and cij ≥ 0, (i, j) ∈ L, a doubly-constrained gravity model with exponential deterrence function is as follows:

                             $T_{ij} = P_i C_j e^{-\gamma c_{ij}}$,    $P_i > 0$, $C_j > 0$, $(i, j) \in L$                            (8)
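To see how the deterrence parameter in Equation (8) (written gamma below) controls the role of distance, the following sketch computes the share of one origin's flow that goes to each of three equally sized destinations. The distances and gamma values are hypothetical and chosen only to show the trend.

```python
import math


def destination_shares(costs_from_origin, gamma):
    """Share of an origin's flow sent to each destination when the
    inter-zonal weight is exp(-gamma * c_ij) and destinations are equal in size."""
    weights = [math.exp(-gamma * cost) for cost in costs_from_origin]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical distances (miles) from one origin to three destinations.
distances = [100.0, 500.0, 2000.0]

for gamma in (0.002, 0.0005, 0.0):
    shares = destination_shares(distances, gamma)
    print(gamma, [round(s, 3) for s in shares])
```

As gamma approaches zero the shares become equal, which is the limiting case in which distance no longer deters shipments.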

 

c)         Doubly-Constrained Gravity Model with Exponential Deterrence Function and Socio-economic Factor

            This form is a modification of the previous one with additional constants Kij that are interpreted as socio-economic factors. Socio-economic factors are included in trip distribution models in order to account for the trip-making potential of individuals, or the trip production potential of origins and the trip attraction potential of destinations (Kanafani, 1983). Given γ ≥ 0, cij ≥ 0, and Kij ∈ (0,1), (i, j) ∈ L, a doubly-constrained gravity model with exponential deterrence function exp(-γcij) and socio-economic factors Kij is as follows:

                             $T_{ij} = P_i C_j \left[ K_{ij} e^{-\gamma c_{ij}} \right]$,    $P_i > 0$, $C_j > 0$, $(i, j) \in L$                            (9)

               

2.3 Regression and Least Square Analysis

            The third form of the gravity model discussed in Section 2.2 is shown in Equation (10):

                               $T_{ij} = P_i C_j \left[ K_{ij} e^{-\gamma c_{ij}} \right]$                                                                      (10)

This model is nonlinear in γ, but a logarithmic transformation makes it linear, so we can calibrate it using simple linear regression to determine better γ values (Kanafani, 1983). The calibration process helps to better estimate the impedance values that properly set the inter-activity between origin and destination pairs. Note that

                                 $\ln\left[ \frac{T_{ij}}{P_i C_j} \right] = \ln K_{ij} - \gamma c_{ij}$                                                                                (11)
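Equation (11) is a straight line in cij with slope equal to minus the deterrence parameter. The sketch below fits that line by ordinary least squares under the simplifying assumption that the socio-economic factor is a single constant K for all pairs. The function name `fit_gamma` and the synthetic data are hypothetical.

```python
import math


def fit_gamma(T, P, C, c):
    """Regress ln(T_ij / (P_i * C_j)) on c_ij.
    Returns (gamma, K) assuming one constant K for all pairs."""
    xs, ys = [], []
    for i in range(len(P)):
        for j in range(len(C)):
            if T[i][j] > 0:                      # the logarithm is undefined for zero flows
                xs.append(c[i][j])
                ys.append(math.log(T[i][j] / (P[i] * C[j])))
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                            # slope = -gamma
    intercept = mean_y - slope * mean_x          # intercept = ln K
    return -slope, math.exp(intercept)


# Synthetic check: build flows from known parameters and recover them.
P = [10.0, 20.0]
C = [5.0, 15.0]
c = [[1.0, 2.0], [3.0, 4.0]]
true_gamma, true_K = 0.3, 0.5
T = [[P[i] * C[j] * true_K * math.exp(-true_gamma * c[i][j]) for j in range(2)]
     for i in range(2)]
gamma_hat, K_hat = fit_gamma(T, P, C, c)
```

With real survey flows the points will not lie exactly on a line, so the fitted values are estimates rather than exact recoveries.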

             According to Kanafani, in order to avoid any possible distortion in the estimate of γ when there are large cij values, a least squares function can be used. That is, one can minimize the sum of squared errors to fine-tune the value of γ. The sum of squared errors, or least squares function, is defined as

                               $SSE = \sum_{(i,j) \in L} \left( T'_{ij} - T_{ij} \right)^{2}$                                                                      (12)

             where   T’ij = Observed origin-destination flows of the base flow condition

                           Tij = Estimated origin-destination flows

            The values used as base flow conditions are obtained from the 1997 commodity flow survey. They are historical values measured in tons, which represent the flows of products from region i to region j.

            In this research, T’ij is obtained from the U.S. Census Bureau's commodity flow survey, and Tij is estimated using the model that we have developed. The least squares estimation procedure seeks the closest agreement between Tij and T’ij by minimizing the sum of squares. This is a method to improve and evaluate the performance of the newly developed model and to see how well the model is calibrated to the base flow condition (Kanafani, 1983). We employ a similar procedure in our research.
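The least squares idea can be illustrated with a simple grid search: compute the estimated flows for each candidate value of the deterrence parameter and keep the value with the smallest sum of squared errors. This is a minimal sketch under stated assumptions. It uses the unbalanced form Tij = Pi Cj K exp(-gamma cij) with constant K, synthetic "observed" flows, and hypothetical function names, and it is not the calibration code used in this research.

```python
import math


def model_flows(P, C, c, gamma, K=1.0):
    """Estimated flows T_ij = P_i * C_j * K * exp(-gamma * c_ij)."""
    return [[P[i] * C[j] * K * math.exp(-gamma * c[i][j]) for j in range(len(C))]
            for i in range(len(P))]


def sse(observed, estimated):
    """Sum of squared differences between observed and estimated flows."""
    return sum((o - e) ** 2
               for row_o, row_e in zip(observed, estimated)
               for o, e in zip(row_o, row_e))


def calibrate_gamma(observed, P, C, c, candidates):
    """Return the candidate gamma with the smallest sum of squared errors."""
    return min(candidates, key=lambda g: sse(observed, model_flows(P, C, c, g)))


# Synthetic check: the "observed" flows are generated with gamma = 0.25.
P = [10.0, 20.0]
C = [5.0, 15.0]
c = [[1.0, 2.0], [3.0, 4.0]]
observed = model_flows(P, C, c, 0.25)
candidates = [k / 100 for k in range(101)]       # 0.00, 0.01, ..., 1.00
best_gamma = calibrate_gamma(observed, P, C, c, candidates)
```

A finer grid or a one-dimensional optimizer could replace the grid search; the grid keeps the example dependency-free.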

           

2.4       Relevant Applications of Gravity Models

            Carter (1993) states that gravity modeling is an accepted market analysis tool for determining the economic feasibility of retail stores. Retail gravity models were originally used to forecast the number of consumers shopping in a city. Carter (1993) uses them to evaluate the value of retail property depending on the demand for the products sold by stores. His research allocates the consumer dollars that will be spent for a type of product within a trade area based on a reasonable assumption about consumer behavior. The retail model assumes that, within a trade area, the probability that a consumer will shop at a particular store is directly proportional to some power of the size of the store and is inversely proportional to some power of the distance between the consumer and the store. Distance is considered to be a dominating factor when it comes to trading, even if a large trade area is considered. However, in our view, this will change as e-commerce grows over time.

            Retail gravity modeling is also used to quantify the economic viability of a proposed project. Bottum (1989) introduces additional parameters governing the retail gravity model. In the revised model, consumer behavior not only depends on the size of stores and distance, but also is a function of accessibility, physical barriers, driving time and income levels. This approach is feasible when a small trade area is considered.

            Gravity modeling is also used in the travel industry to analyze the foreign tourist market. For example, Webster (1993) uses gravity modeling to predict the flow of tourists between a pair of countries as a direct function of each country’s population and as an inverse function of the distance between them. Here distance serves as the main contributor to impedance for tourism. However, later findings in Webster’s research showed that the distance variable was not significant relative to the number of trips. Travel time turned out to be the best impedance measure.

 

2.5      Gravity Modeling for Freight Flow Distribution

            Freight flow distribution can be defined as the movement of goods from several origins to several destinations. Modeling freight flows can be considered from multiple dimensions, such as volume, weight, and trips. Veras and Thorson (2000) consider the amount of freight measured in tons (or any comparable unit of weight) as a unit of measure for freight demand and supply. This allows commodity-based models such as gravity models to more accurately capture the fundamental economic mechanisms driving freight movements, which largely are determined by the freight attributes such as tonnage.

            In commodity flow surveys, data for both tonnage and dollar freight values are available. However, Veras and Thorson (2000) suggest avoiding shipment dollar values because these values exhibit more variability from one product to another. For example, freight values may be as low as $9/ton for products such as gypsum, and may well exceed $500,000/ton for products such as computer chips. In addition, Veras and Thorson note that using "trips traveled" may give inaccurate results, since empty trips may represent 15 to 50 percent of total trips and the goal is to estimate the actual freight being transported. Based on this, we use tonnage as the unit of measure of flow for our gravity model implementation.

 

2.6       Linear Programming for Freight Flow Distribution

            Hamburg, Lathrop and Kaiser (1983) use linear programming (LP) for estimating freight distribution. Their LP formulation of freight flow distribution can be expressed mathematically as

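                  Minimize       $\sum_{i} \sum_{j} c_{ij} T_{ij}$                              (13)

                  such that       $\sum_{j} T_{ij} = P_i$ for every production area i,

                                        $\sum_{i} T_{ij} = C_j$ for every consumption area j,

                                        $T_{ij} \geq 0$ for all i and j,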

 

  where Tij = Shipment from production area i to consumption area j,

             Pi = Production in Region i,

             Cj = Consumption in Region j,

             cij = Impedance value between Region i and Region j (normally distance or cost).

            There are pros and cons in using LP to solve freight distribution problems. The major attraction of LP is its underlying basis of economic rationality, which is to minimize overall transportation cost. However, there is no rational central authority that could make all flow decisions between regions. In a way, each entity or region acts independently, which undermines the validity of the LP approach. Moreover, its overall attractiveness is also reduced by inherent characteristics of LP that limit its usefulness for freight flow distribution problems (Hamburg, Lathrop and Kaiser, 1983). First, for a system comprising n regions, a normal solution to the LP will produce no more than (2n-1) of the n(n-1) potential inter-regional flows, i.e., the optimal solution of the LP model will have at most 2n-1 positive flows. Second, LP does not allow freight flows in both directions along a link (from i to j and from j to i), which is called cross hauling. Very few commodity movements exist without some cross hauling. Lastly, in many cases, unit transport costs are not linear with distance or shipment size, as is inherently assumed in an LP formulation (Hamburg, Lathrop and Kaiser, 1983). Due to these limitations and the widespread use of gravity modeling for similar freight flow estimation problems, we use gravity modeling in this research.
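The first limitation is easy to quantify. The short sketch below counts the bound of 2n-1 positive flows against the n(n-1) potential inter-regional flows for a given number of regions. The function name `lp_flow_coverage` is hypothetical, and n = 48 is used because the model in this report is built on the 48 contiguous states.

```python
def lp_flow_coverage(n):
    """Return (maximum positive flows in an LP solution,
    potential inter-regional flows, ratio of the two) for n regions."""
    max_positive = 2 * n - 1
    potential = n * (n - 1)
    return max_positive, potential, max_positive / potential


max_positive, potential, ratio = lp_flow_coverage(48)
print(f"{max_positive} of {potential} potential flows ({ratio:.1%})")
```

For 48 regions this gives at most 95 of 2,256 potential flows, so an LP solution would leave the large majority of origin-destination pairs with no flow at all.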


3. DATA COLLECTION

The unavailability of good data is perhaps the greatest challenge we face in this research. Our goal is to model the directional distribution of flows generated by e-commerce, but there is currently no data source that provides a direct measure of such flows. Estimated e-commerce sales volume for the United States as a whole is available, but it is not broken down on a region-to-region basis. The Census Bureau has just begun to collect some survey data on the Internet economy (Atrostic, Gates and Jarmin, 2000).

            Since there is no readily available e-commerce data, we model the e-commerce flows based on the existing commodity flow survey data. Commodity flow surveys capture data on shipments originating from selected types of business establishments located in the fifty states and the District of Columbia. Businesses that participate in this program provide information on the total value of shipments, total weights, major commodity type, modes of transportation used, miles traveled, and the origin and destination of shipments. We estimate the flows due to e-commerce from the existing commodity flow survey.

            Two sets of commodity flow survey data are available, for 1993 and 1997. In 1993, there were virtually no significant e-commerce transactions. Therefore, we initially planned to compare the flows of a selected product code in 1993 to the flows in 1997. However, the 1993 survey data use the detailed STCC (Standard Transportation Commodity Classification) coding system, whereas the 1997 commodity flow survey uses the more aggregate SCTG (Standard Classification of Transported Goods) coding system. That is, goods are grouped within fewer product codes in 1997. Therefore, a direct comparison between the 1993 and 1997 data is not possible. As a result, we used the 1997 commodity flow survey as our main data source.

                Another set of data that we have looked at is the distribution of Internet domains registered in the United States. As of June 2000, there were 13,260,000 active Web sites registered in the United States (U.S. Map New Stat, 2000). The data from this source indicate that more populous states top the list for the largest number of domain name registrations. Though most web sites are inactive and do not conduct business online, surveys indicate that 80% of businesses that have registered a web address have done so to develop an on-line presence for an existing business (Network Solutions, 2000). In other words, these are companies with established business models and real products, the so-called “Click/Brick and Mortar” companies. These companies have nonetheless become the driving force behind the Internet economy, using the efficiencies and reach of the Internet to extend their traditional business models. Also note that many companies have distribution centers that initiate shipment flows throughout the United States even though they have registered their web site in another state. That is, a company's products may not come from the location where its domain is registered. The domain distribution data are not incorporated into the model, but they have given us better insight into the intensity of e-commerce in each state.

 

 

 

 

 

 

 

4. METHODOLOGY

4.1       Introduction

In this chapter, we first provide a brief overview of the formulation of our gravity model, which captures the directional distribution of flows. We explain the formulation procedure in a step-by-step manner. In Section 4.2, we describe the reverse derivation procedure. We use reverse derivation to determine the historical impedance that leads to the flow distribution of the base flow condition. The formulation of the deterrence function is also discussed in that section. In Section 4.3, we describe the iterative procedure we use to adjust the calculated commodity flows to within 10 percent of the originally specified values. In Section 4.4, we present the concept of an ‘extreme case’ in impedance values, and show how the growing e-commerce economy is moving the impedance values toward this extreme. In Section 4.5, we describe how to calculate the average mile statistic. The average mile is the average distance traveled by each ton of product. We project the increase in average mile due to e-commerce so that the average exponent n can be estimated. In Section 4.6, we describe the process of determining the appropriate smoothing constant λ (a value that sets the intermediate condition between the current and the future estimated condition). Finally, in Section 4.7, we describe how the distributed flows are assigned to different modes of transportation.
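As an illustration of the average mile statistic, the sketch below computes a tonnage-weighted average distance from a flow matrix and a distance matrix. Reading "average distance traveled by each ton" as total ton-miles divided by total tons is an assumption, and the function name `average_mile` and the numbers are hypothetical.

```python
def average_mile(tons, miles):
    """Tonnage-weighted average shipping distance:
    total ton-miles divided by total tons (assumed definition)."""
    total_ton_miles = sum(t * d
                          for row_t, row_d in zip(tons, miles)
                          for t, d in zip(row_t, row_d))
    total_tons = sum(t for row_t in tons for t in row_t)
    return total_ton_miles / total_tons


# Hypothetical two-region example: tons shipped and miles between regions.
tons = [[10.0, 5.0],
        [0.0, 5.0]]
miles = [[50.0, 400.0],
         [400.0, 50.0]]
avg = average_mile(tons, miles)   # 2750 ton-miles over 20 tons
```

Shifting tonnage from the short intra-regional cells to the long inter-regional cells raises this statistic, which is the pattern the report associates with growing e-commerce.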

 

 

 

 

We explain the steps of the procedure in more detail below:

1)      Determine the geographic regions for the model. We use the 48 contiguous states.