GFS Data portal
The data portal for the Global Forecast System (GFS) is here.
Scroll down to the Product Types/GFS Forecasts section of the webpage. We are going to use GFS 003 (1°)-Domain. The 003 is the code for the grid used to cover the earth. Finer grids are also available.
The HTTPS and TDS data access links only have the most recent data, so for historical forecast data, we use AIRS.

Submit request to AIRS

GFS appears to issue a forecast 4 times a day, at hours 00, 06, 12, and 18 UTC. I don't think we actually need all 4 model cycles, but I download them all and will probably average the predictions later. A single request has an upper limit of 250GB, and a month's worth of data (all 4 cycles) can be requested at once. After choosing the cycle(s) and data, enter an email address and press Proceed with Order. It will take a while to submit and process the order, so be patient.
If the request is submitted correctly, you can press the Check order status button to see the progress of your order.
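
To get a feel for how many model cycles a one-month request covers, here is a minimal sketch that enumerates them. The function name model_cycles is hypothetical, and the code only counts cycles; it does not talk to AIRS.

```python
from datetime import datetime, timedelta


def model_cycles(year, month, hours=(0, 6, 12, 18)):
    """Return the start time of every model cycle in the given month."""
    cycles = []
    day = datetime(year, month, 1)
    while day.month == month:
        for hour in hours:
            cycles.append(day.replace(hour=hour))
        day += timedelta(days=1)
    return cycles


cycles = model_cycles(2019, 1)
print(len(cycles))            # 31 days x 4 cycles = 124
print(cycles[0], cycles[-1])  # first and last cycle of the month
```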

After the order is done, you can find the download link under Delivery Location. Here is an example of the download page.

Now we can download one tar file, for example gfs_3_2019010100.g2.tar, and save it to a directory. To extract the tar file, in the directory, run
$ tar -xf gfs_3_2019010100.g2.tar
Each tar file contains many files, one for each horizon.
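
If you prefer to inspect the archive from Python before extracting it, the standard tarfile module can list its members. This is only a sketch: the helper name list_forecast_files is hypothetical, and the tar file name is the example used above.

```python
import os
import tarfile


def list_forecast_files(tar_path):
    """Return the sorted names of the regular files inside a tar archive."""
    with tarfile.open(tar_path) as tar:
        return sorted(m.name for m in tar.getmembers() if m.isfile())


tar_path = "gfs_3_2019010100.g2.tar"  # example archive from the text
if os.path.exists(tar_path):
    # Print one line per forecast file (one file per horizon).
    for name in list_forecast_files(tar_path):
        print(name)
```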
From day 0 to day 9, GFS forecasts every 3 hours, and from day 9 to day 15, every 12 hours. The last field of the file name is the forecast horizon in hours.
In the example below, we use the file gfs_3_20190101_0000_384.grb2; download it here if you want to try out the code.
Naming rule of GFS forecast file
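A GFS forecast file is named as follows: gfs_[grid type]_[forecast-on date YYYYMMDD]_[cycle]00_[horizon].grb2, where the forecast-on date is the date on which the forecast was made, cycle takes a value in 00, 06, 12, 18, and horizon is the forecast horizon in hours.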
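
The naming rule can be turned into a small parser. The sketch below is an illustration under assumptions: the function name parse_gfs_name and the regular expression are hypothetical and inferred only from the example file name gfs_3_20190101_0000_384.grb2, so other files may need a different pattern.

```python
import re
from datetime import datetime, timedelta

# Assumed pattern: gfs_[grid]_[YYYYMMDD]_[cycle]00_[horizon].grb2
GFS_NAME = re.compile(
    r"^gfs_(?P<grid>\d+)_(?P<date>\d{8})_(?P<cycle>\d{2})00_(?P<horizon>\d+)\.grb2$"
)


def parse_gfs_name(fname):
    """Split a GFS forecast file name into its fields."""
    m = GFS_NAME.match(fname)
    if m is None:
        raise ValueError(f"unexpected file name: {fname}")
    # Time at which the forecast was made (date plus model cycle).
    init_time = datetime.strptime(m["date"] + m["cycle"], "%Y%m%d%H")
    horizon = int(m["horizon"])
    return {
        "grid": m["grid"],
        "init_time": init_time,
        "horizon_hours": horizon,
        # Time the forecast refers to: forecast-on time plus horizon.
        "valid_time": init_time + timedelta(hours=horizon),
    }


info = parse_gfs_name("gfs_3_20190101_0000_384.grb2")
print(info["init_time"], info["horizon_hours"], info["valid_time"])
```

Here 384 hours is 16 days, so the forecast made on 2019-01-01 at cycle 00 refers to 2019-01-17 00:00.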
The organization of a grb file
The GFS forecast grb file is organized as follows: it records locations and values for many variables. Each record of a grb file is a 3-tuple: (variable description, locations (a 2D matrix), values (a 2D matrix)).
Process grb file with python
We first have to install a package that can process grb files.
$ python3 -m pip install --user pygrib
The PyPi page: https://pypi.org/project/pygrib/
Usage example: https://jswhit.github.io/pygrib/api.html#example-usage
We first load the packages pygrib and pandas. We use pandas to process the metadata of the grb file.
import pandas as pd
import pygrib
Here we show how to save the variable descriptions to a more conventional dataframe.
Note: the columns here are specific to the grb file we are interested in.
fname = 'gfs_3_20190101_0000_384.grb2'
gr = pygrib.open(fname)
# Write one summary line per record; str(g) gives colon-separated fields.
with open('meta.csv', 'w') as handle:
    for g in gr:
        handle.write(str(g) + '\n')
# Read the summaries back, using the record number as the index.
df = pd.read_csv('meta.csv', sep=':', header=None, index_col=0)
df.columns = ['name', 'units', 'gridType', 'typeOfLevel', 'level', 'forecastTime', 'dataDate']
# Overwrite meta.csv with the labelled table (the record number is not saved).
df.to_csv('meta.csv', index=False)
print(df.head())
Output is:
We can get the visibility forecast (record 1) for North America with the following code.
# gr is the file opened above with pygrib.open(fname)
grb = gr.message(1)
print(grb)
data, lats, lons = grb.data(lat1=20, lat2=70, lon1=220, lon2=320)
Output: 1:Visibility:m (instant):regular_ll:surface:level 0:fcst time 384 hrs:from 201901010000
Note: lats and lons are both 2D arrays, like the ones you get by running numpy.meshgrid().
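
To illustrate how such 2D arrays can be used, the sketch below looks up the value at the grid point nearest to a given location. Everything in it is an assumption made for illustration: the grid is synthetic (built with numpy.meshgrid over the same box as above, not read from a grb file), the helper name nearest_value is hypothetical, and the distance is a plain squared difference in degrees rather than a true distance on the sphere.

```python
import numpy as np

# Synthetic 1-degree grid over lat 20..70 and lon 220..320 (degrees east).
lat_axis = np.arange(70, 19, -1)   # 70, 69, ..., 20
lon_axis = np.arange(220, 321)     # 220, 221, ..., 320
lons, lats = np.meshgrid(lon_axis, lat_axis)

# Fake "forecast" values with the same shape as lats and lons.
data = lats + lons / 1000.0


def nearest_value(data, lats, lons, lat, lon):
    """Return (value, grid_lat, grid_lon) at the grid point closest to (lat, lon)."""
    dist2 = (lats - lat) ** 2 + (lons - lon) ** 2
    i, j = np.unravel_index(np.argmin(dist2), dist2.shape)
    return data[i, j], lats[i, j], lons[i, j]


# A longitude given as -74.4 (west) becomes 285.6 on a 0..360 grid.
value, glat, glon = nearest_value(data, lats, lons, 40.3, -74.4 % 360)
print(lats.shape, lons.shape, data.shape)
print(value, glat, glon)
```

With real data, the data, lats and lons arrays returned by grb.data() would take the place of the synthetic ones.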
