Utah FORGE Data Distribution

Access to FORGE's seismic data

About

This section of the website contains a miscellany of information, tricks, and tips we are acquiring as this component of the project moves forward.

What exactly is this website?

This website barely qualifies as a frontend. The heavy lifting happens behind the scenes, where we rely on the CHPC's Pando utility. Effectively, Pando is a large, publicly accessible S3 bucket. Alternatively, you can think of Pando as a Linux file system from which you pull the file defined by a URL. For example, you could pull the following:

            https://pando-rgw01.chpc.utah.edu/slb_2019_MW78-32_001/20190419234038.387.segy
        

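The URL breaks down into a host (pando-rgw01.chpc.utah.edu), a bucket (slb_2019_MW78-32_001), and a file name (20190419234038.387.segy).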

Therefore, if you are feeling ambitious, you will find it pretty straightforward to scrape the archive, provided you have enough disk space on your end to hold the results.
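As a small illustration of that URL layout, here is a minimal Python sketch that splits the example URL into its parts. The function name split_pando_url and the labels "host", "bucket", and "object name" are our own choices for this example, and the sketch assumes every URL follows the same https://<host>/<bucket>/<file> pattern as the one above.

```python
from urllib.parse import urlsplit


def split_pando_url(url):
    """Split a URL of the form https://<host>/<bucket>/<file>.

    The three-part layout is an assumption based on the example URL;
    the returned labels are illustrative, not official terminology.
    """
    parts = urlsplit(url)
    # The path looks like "/<bucket>/<file>". Split on the first "/" only,
    # so a file name that itself contains slashes stays in one piece.
    bucket, _, object_name = parts.path.lstrip("/").partition("/")
    return parts.netloc, bucket, object_name


url = "https://pando-rgw01.chpc.utah.edu/slb_2019_MW78-32_001/20190419234038.387.segy"
host, bucket, object_name = split_pando_url(url)
print(host)         # pando-rgw01.chpc.utah.edu
print(bucket)       # slb_2019_MW78-32_001
print(object_name)  # 20190419234038.387.segy
```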

Why aren't there hyperlinks to the data?

A key difference between this site and the weather data distribution service HRRR is that we tend to have an overwhelming number of files. Since clicking a link for each file is as error-prone as it is tedious, we provide fetch scripts that use wget. Each script is just thousands of lines like:

            wget -q https://pando-rgw01.chpc.utah.edu/slb_2019_MW78-32_001/20190419234038.387.segy
            wget -q https://pando-rgw01.chpc.utah.edu/slb_2019_MW78-32_001/20190419234734.387.segy
            wget -q https://pando-rgw01.chpc.utah.edu/slb_2019_MW78-32_001/20190419235550.387.segy
            .
            .
            .
        

If command lines aren't your thing, or you don't have a system with wget, it is straightforward to lift the URLs out of the scripts and retrieve them with any language or mechanism you like.
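For instance, here is a minimal Python sketch of that refactoring, using only the standard library. It assumes the fetch script consists of lines of the form "wget -q <url>" as shown above; the helper names extract_urls and download are hypothetical, and the sample text simply repeats two of the lines above.

```python
import os
import shutil
import urllib.request
from urllib.parse import urlsplit


def extract_urls(script_text):
    """Return the URLs found on lines of the form `wget -q <url>`."""
    urls = []
    for line in script_text.splitlines():
        tokens = line.split()
        # Skip blank lines and anything that is not a wget command.
        if tokens and tokens[0] == "wget":
            urls.extend(t for t in tokens[1:] if t.startswith("https://"))
    return urls


def download(url, dest_dir="."):
    """Fetch one URL into dest_dir, named after the last path component."""
    filename = os.path.basename(urlsplit(url).path)
    dest = os.path.join(dest_dir, filename)
    # Stream the response to disk rather than holding it all in memory.
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


sample = """\
wget -q https://pando-rgw01.chpc.utah.edu/slb_2019_MW78-32_001/20190419234038.387.segy
wget -q https://pando-rgw01.chpc.utah.edu/slb_2019_MW78-32_001/20190419234734.387.segy
"""
urls = extract_urls(sample)
print(len(urls))  # 2
```

To retrieve the files, read a fetch script from disk, pass its text to extract_urls, and call download on each URL in turn. The example above deliberately stops short of issuing any requests.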

Can I speed up wget?

Understandably, you may be tempted to execute the script in parallel. This is okay provided the number of requesting threads is small, on the order of 10. When too many requests are issued simultaneously, the hardware on our end will time out. These timeouts result in a flurry of behind-the-scenes emails and can cost you a few days of download time while we sort things out.
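One way to keep the request count bounded is a fixed-size thread pool. The Python sketch below is only an illustration: the cap of 8 workers is our assumed reading of "on the order of 10", fetch_all is a hypothetical helper, and fake_fetch is a stand-in that sleeps instead of contacting the server so the example can run offline. In practice you would pass your own download function in its place.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 8  # assumed cap; the guidance above is "on the order of 10"


def fetch_all(urls, fetch_one, max_workers=MAX_WORKERS):
    """Call fetch_one on every URL with at most max_workers in flight.

    Returns (url, result_or_exception) pairs in the same order as urls,
    so one failed download does not abort the rest.
    """
    def safe_fetch(url):
        try:
            return url, fetch_one(url)
        except Exception as exc:
            return url, exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(safe_fetch, urls))


# Stand-in for a real download: records the peak number of concurrent calls.
state = {"active": 0, "peak": 0}
lock = threading.Lock()


def fake_fetch(url):
    with lock:
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
    time.sleep(0.01)  # pretend to transfer data
    with lock:
        state["active"] -= 1
    return len(url)


demo_urls = ["file-%02d.segy" % i for i in range(25)]
results = fetch_all(demo_urls, fake_fetch)
print(state["peak"] <= MAX_WORKERS)  # True
```

The pool size is the only knob that matters here: however long the URL list is, no more than MAX_WORKERS requests are outstanding at once.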

Do you have a catalog?

Nope. This site is more interested in distributing seismic data than derived products. You can try the Geothermal Data Resource to search for accompanying information. The regional background seismicity is available from the UUSS.