Time-Series Data

What is time-series data?

Time-series data is a series of values of some environmental or control variable which is measured repeatedly over time. Data may be measured regularly or irregularly at intervals ranging from one second upwards. To store data in HYDSTRA/TS one dimension of the data must be time, and the other a single-dimensional value like level or temperature. For example you cannot store a well-log in HYDSTRA/TS, as the log's main changing dimension is not time but distance down the hole.

What types of time-series data can I store in HYDSTRA files?

You can store any level or counter type data which varies with time. Examples include water level, flow rate, velocity, turbidity, temperature, wind speed, rainfall, wind direction, flow volume, battery voltage, gate opening, pump status, etc.

How accurately can I register times in HYDSTRA data?

HYDSTRA data is stored to the nearest second.

How many different variables can I store for one site?

There is no limit to the number of different variables you can store. For example, in a single file for a station you can store water level, rainfall, temperature, conductivity, battery voltage and anything else you like.

Can I store daily data and logger data for the same parameter?

Yes, using the concept of subvariables you can store up to 99 recordings of the same parameter at the same site. For example you can store different time samplings (daily, hourly, 15 minute), different locations (upstream level, downstream level), different technologies (logger data, chart data), different depths (wind speed at 5m, wind speed at 20m), etc and distinguish between them using subvariables.

How many stations can I store data for?

There is no inbuilt limit, and we have users with many thousands of stations operating well. Some networks and some CD formats limit the number of files in a directory to 4096, which may limit the number of stations to 2048. We suggest that you contact us before planning to store more than 10,000 stations, as some processes may become slow at that scale without further tuning.

What period of data does HYDSTRA keep on-line?

HYDSTRA is designed to keep all your data online forever. Even with very long archives for thousands of stations HYDSTRA should be able to retrieve and analyse a year of continuous data in a few seconds.

Can I store irregularly spaced data in HYDSTRA?

Yes, the minimum spacing between two points is one second, and there is no practical maximum. HYDSTRA can handle regular and irregularly spaced data (for example 15 minute water levels and event-based rainfall tips). You can mix timings in the one file, and even change styles within a variable (for example changing from 15 minute rainfall totals to event-based rainfall tips when you upgrade a logger).

Can I analyse and report my data at different intervals to what is stored?

Yes. HYDSTRA can report data at any interval you specify, and will interpolate values if necessary. For example you can digitise a water level trace at random spacings, but then extract or report hourly or daily flows.

How does HYDSTRA interpolate between data points?

HYDSTRA associates a data transformation code (datatrans code) with the data to decide how to interpolate between the data points. For datatrans 1 data such as water level HYDSTRA always interpolates in a straight line between time-series data points. Hence you should sample your data frequently enough for this to be accurate. Read up on datatrans to learn more about how HYDSTRA interpolates other datatrans data such as means and totals in a period, and event data.
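
As a rough illustration of straight-line interpolation for datatrans 1 data, the following Python sketch estimates a value at any time between stored points and reports values at a regular interval. The function names and the sample data are hypothetical; this is not HYDSTRA code, and it makes no attempt to cover other datatrans codes.

```python
from bisect import bisect_left

def interpolate(times, values, t):
    """Linearly interpolate a value at time t (seconds) from sorted sample times."""
    if not times or t < times[0] or t > times[-1]:
        raise ValueError("t lies outside the recorded period")
    i = bisect_left(times, t)
    if times[i] == t:                      # exact hit on a stored point
        return values[i]
    t0, t1 = times[i - 1], times[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

def resample(times, values, start, end, step):
    """Report interpolated values every `step` seconds from start to end inclusive."""
    return [(t, interpolate(times, values, t)) for t in range(start, end + 1, step)]

# Irregularly spaced water levels (time in seconds, level in metres)
times = [0, 600, 4500, 7200]
levels = [1.00, 1.10, 1.75, 1.30]
hourly = resample(times, levels, 0, 7200, 3600)
```

The value reported at 3600 seconds lies on the straight line between the points at 600 and 4500 seconds, which is why the sampling has to be dense enough for a straight line to be a fair description of the trace.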

How does HYDSTRA calculate mean values over a period?

HYDSTRA uses trapezoidal integration to calculate mean values. For level-type variables (water level, temperature, etc) points are interpolated if necessary at the start and end of the interval, and they are used in conjunction with all other points within the interval to compute the area under the curve. The area is then divided by the time interval to compute the mean.

If there are n points at times T(1) to T(n) with values D(1) to D(n) then the mean value is computed as:
(Sum(i=1..n-1) 0.5*(D(i+1)+D(i))*(T(i+1)-T(i)))/(T(n)-T(1)).
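
The calculation can be sketched in Python as follows. This is a minimal illustration of the formula, not HYDSTRA's own code; the function names and the sample data are hypothetical, and it assumes strictly increasing times and an interval that lies within the recorded period.

```python
def value_at(times, values, t):
    """Straight-line interpolation between the two points that bracket t."""
    for i in range(1, len(times)):
        if times[i - 1] <= t <= times[i]:
            frac = (t - times[i - 1]) / (times[i] - times[i - 1])
            return values[i - 1] + frac * (values[i] - values[i - 1])
    raise ValueError("t lies outside the recorded period")

def trapezoidal_mean(times, values, start, end):
    """Mean of a level-type trace over [start, end] by trapezoidal integration."""
    # Interpolated end points plus every stored point strictly inside the interval
    ts = [start] + [t for t in times if start < t < end] + [end]
    ds = [value_at(times, values, t) for t in ts]
    area = sum(0.5 * (ds[i + 1] + ds[i]) * (ts[i + 1] - ts[i])
               for i in range(len(ts) - 1))
    return area / (end - start)

times = [0, 100, 300, 400]          # seconds
levels = [2.0, 4.0, 4.0, 0.0]       # metres
mean_level = trapezoidal_mean(times, levels, 50, 350)
```

Here the interpolated end points are 3.0 at 50 seconds and 2.0 at 350 seconds, the area under the curve is 1125, and dividing by the 300 second interval gives a mean of 3.75.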

In what format does HYDSTRA store time-series data?

HYDSTRA uses a proprietary index-sequential file structure which minimises file size while providing fast access to the data. All the data for a site is kept in a single file, so there is one archive file per site. An index file is used to provide rapid access to parts of the data file. Users never need to know the internal details of the file structure as a variety of tools are available for quickly putting the data in and getting it out again.

How does HYDSTRA compress data?

HYDSTRA uses a variety of techniques to reduce the size of the data. The most important one is to specify key information like site and variable only as it changes, rather than on every data point. Other techniques include reducing the number of bytes required to store a number, and removing redundant data points which lie on or near straight lines.

Compression can be turned off for particular stations or variables.

Is the data compression lossy?

Yes, but you control how lossy. For example if you specify that water level is to be preserved to the nearest mm then data may be rounded to the nearest mm, but it will never be more than half a mm from the original value. By choosing sensible limits for each parameter you can exert control over how hard your data is compressed. If you want to be very conservative then choose a very small PRECISION value in the VARIABLE database for each variable. Your files will be larger, but more accuracy will be preserved. However it is not sensible to store water level data to a millionth of a metre when the field recording technologies are incapable of recording much better than a few mm.
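
The half-a-unit bound can be checked with a short Python sketch. How HYDSTRA applies the PRECISION value internally is not described here, so the rounding function and the sample levels below are assumptions made purely for illustration.

```python
def round_to_precision(value, precision):
    """Round value to the nearest multiple of precision (e.g. 0.001 m = 1 mm)."""
    return round(value / precision) * precision

# Hypothetical raw water levels in metres, stored to the nearest mm
raw_levels = [1.23449, 1.2351, 0.9996, 2.00049]
stored = [round_to_precision(v, 0.001) for v in raw_levels]

# The largest difference between a stored and an original value
worst_error = max(abs(s - r) for s, r in zip(stored, raw_levels))
```

With a precision of 0.001 m the worst error in this sample is a little under 0.0005 m, that is, under half a mm.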

Does the data compression discard recorded points?

Yes, for some data. If data points lie on or near straight lines, the compression can discard them, since applications can recover that value at that time using interpolation.
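
To show the general idea, the Python sketch below drops any point that straight-line interpolation between its kept neighbours would recover to within a tolerance. It is a simple greedy approach written for this FAQ and is not HYDSTRA's actual compression algorithm; the function name, tolerance and sample data are all hypothetical.

```python
def thin(points, tolerance):
    """Drop points that linear interpolation between kept neighbours can recover.

    points: list of (time, value) pairs sorted by time. Returns the kept points.
    """
    if len(points) < 3:
        return list(points)
    kept = [points[0]]
    anchor = 0                              # index of the last kept point
    for j in range(2, len(points)):
        t0, v0 = points[anchor]
        t1, v1 = points[j]
        # Would every point between anchor and j be recovered within tolerance?
        ok = all(
            abs(v0 + (v1 - v0) * (t - t0) / (t1 - t0) - v) <= tolerance
            for t, v in points[anchor + 1:j]
        )
        if not ok:
            anchor = j - 1                  # keep the point before the line broke
            kept.append(points[anchor])
    kept.append(points[-1])
    return kept

# A steady water level followed by a rise (time in seconds, level in metres)
levels = [(0, 0.0), (900, 0.0), (1800, 0.0), (2700, 0.0), (3600, 0.5), (4500, 0.5)]
kept = thin(levels, 0.0005)
```

In this sample the two middle points of the steady run are discarded, because interpolating between the points at 0 and 2700 seconds returns exactly the same values.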

For some data, I don’t want any points to be discarded. Can I stop it?

You can, using a setting in HYCONFIG.INI and/or entries in the VARSUB database. These let you control the default compression setting, and then give finer control for certain types of data. For example, you can configure HYDSTRA to store all daily readings, even if they lie on a perfectly straight line. This can be useful if the fact that a daily reading was taken or not taken is important information, in addition to the actual value at that time. You would do this by storing such data with a special subvariable. See Variables and Variable Conversions FAQ for more on subvariables.

How much space will my data occupy in HYDSTRA?

It depends very much on how rapidly the data changes, what precision it is being stored to, how dense the original data is, and how many comments are added to the data. However typical water level data should be in the range of 2 to 3 bytes per sample, and compression will remove points on straight lines completely - this is particularly effective in compressing long runs of zero rainfall.

With larger and larger disk drives available, why does HYDSTRA need to compress time-series data?

If you store time-series data uncompressed in a relational database like Oracle you will need at least 40 bytes per sample, or at least 10 times more space than HYDSTRA, possibly 20. Many hydrological analyses require that you analyse the whole period of record (flow duration, low flow, flood frequency, monthly summaries, etc). This means that you will regularly be retrieving very large files across your network, which will lead to high server and network loads.

As an example, sample water level data is delivered with HYDSTRA for station HYDSYS01, a real station in the ACT. The station has 25 years of continuous water level data, a mixture of digitised charts and logger data. In HYDSTRA the data consists of 84,259 data points, and has a total size of 213,000 bytes, some 2.5 bytes per point. If the data were stored in normalised Oracle tables it would occupy about 8.5 MB, which would have to be retrieved by the server and brought across the network every time it was all analysed. Even worse, if the data were stored at fixed 15 minute intervals it would expand to 876,000 points, which would occupy over 35 MB. No matter how large the disk and how fast the network, compressed data will always be faster to access and use.
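
The figures in this example can be reproduced with a few lines of Python. The sketch uses only the numbers quoted above, ignores leap days, and takes the "at least 40 bytes per sample" estimate as the size of a relational row.

```python
points = 84_259
hydstra_bytes = 213_000
bytes_per_point = hydstra_bytes / points             # about 2.5 bytes per point

years = 25
fixed_points = years * 365 * 24 * 4                   # 15 minute samples, no leap days
bytes_per_row = 40                                    # the "at least 40 bytes" estimate
fixed_mb = fixed_points * bytes_per_row / 1_000_000   # size in MB at fixed intervals
```

This gives 876,000 fixed-interval points and just over 35 MB, against 213,000 bytes for the compressed HYDSTRA file.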

What units does HYDSTRA use to store data?

You specify what units the data is stored in. For example you can choose to store water level in feet, cm, mm, metres or indeed anything else; it is up to you. However we do recommend that once you choose a set of units for a particular variable you stick to it. It is not advisable to store some level data in feet and some in metres. Most import routines provide the facility to rescale data as part of the import, so even if a logger is recording in mm you can store it in metres or feet if you need to.

Can HYDSTRA handle gaps in data?

Yes, gaps are allowed. You can also use the quality code system of HYDSTRA to indicate that data is of bad quality and should not be used in computations. However gaps may prevent many programs from producing meaningful output, so it is usually best to try and estimate values for missing data, and then quality code the data as estimated.

Can HYDSTRA handle overlaps in data?

Data cannot be archived with overlaps. If data legitimately generates overlaps (for example by repeatedly unloading a logger from the time it started, without resetting the logger each time) then archiving tools are available to add only the new data to the archive.
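
The idea of adding only the new data can be sketched in Python as below. This is an assumed, simplified rule (keep only points later than the last archived time) and not a description of how the HYDSTRA archiving tools work; the function name and the data are hypothetical.

```python
def new_points_only(archived, unloaded):
    """Return the unloaded points that are later than the last archived point."""
    if not archived:
        return list(unloaded)
    last_time = archived[-1][0]
    return [(t, v) for t, v in unloaded if t > last_time]

# (time in seconds, level in metres); the logger was unloaded again from its start
archived = [(0, 1.20), (900, 1.22), (1800, 1.25)]
unloaded = [(0, 1.20), (900, 1.22), (1800, 1.25), (2700, 1.31), (3600, 1.40)]
archived += new_points_only(archived, unloaded)
```

After the second unload only the points at 2700 and 3600 seconds are appended, so the archive holds five points with no overlap.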

Can I easily analyse data across year boundaries?

HYDSTRA stores all data for all years in a single file. You can extract, report and analyse the data for any period, including calendar year, water year or fiscal year.

Can I easily edit my time-series data?

Yes, HYDSTRA provides the Data Managers Workbench, a powerful graphical tool for editing and annotating time-series data. The workbench can draw, smooth, quality code, calibrate, adjust timing, adjust for sensor drift, correct rainfall to gauge and perform most of the data management tasks you will routinely need. Changes are quality coded, and you have the option to add comments at any time. While you are working you have a copy of the original data to see what you have changed. When you are happy you can commit the changes permanently.

Isn't it dangerous allowing such easy data editing?

Not if the users are trained properly. Unlike banking, commercial or personnel data, most time-series environmental data is at least partly wrong when it is collected, and patently so. Transducers drift, clocks run slow, orifices silt up, algae grow on sensors, batteries go flat, pressure transducers are affected by temperature, charts distort, ink runs, electronics generate spikes, equipment is set up incorrectly, the list goes on and on. It is vital that what is stored in the archive is as close as possible a representation of what really happened, and not immutable evidence of failure. In many cases the data is improved by appropriate correction and editing, and when it can't be improved it can be flagged so that it is not used for decision making.

Can I control who can edit data?

Yes, you can determine for each registered HYDSTRA user whether they have edit permission or not, and whether they can archive their changes.

Can I indicate how reliable the data is?

Yes, HYDSTRA gives you a range of quality codes (1-254) which you can use to indicate how good the data is. We deliver a suggested set of codes, though you can change them if you wish.

Can I annotate the time-series data with comments?

Yes, each data point may have any number of comments, each of which may be up to 80 characters long. Comments are inserted automatically by most import programs indicating when and where the original data came from, and it is good practice to add comments whenever data has been manually edited, to notify future users of what has been done.

Does HYDSTRA keep the raw data whenever you edit a time-series trace?

We keep only the latest version of the data as a HYDSTRA file. When you commit a set of edits to the archive, the previous version of the data is lost. However we regard the real raw data as the data prior to its being imported into HYDSTRA, as the import process manipulates the data in a variety of ways (including data compression) so that it is really no longer raw data. HYDSTRA offers mechanisms to preserve the raw text files (from loggers or chart digitising), and these are always kept.

How do I store flow?

You can store computed flow in the time-series file, if you wish. However HYDSTRA is fast enough to re-compute flow from the ratings every time you need it, so normally users do not store flow.

How do I store pump status?

Typically we store status variables like pump status, which are either on or off, as a time-series trace which is set to zero for off and 1 for on. You can then use the variable conversion system to compute pump run-time, pumped volume and duty cycle.
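
The kind of calculation involved can be sketched in Python. This is not the HYDSTRA variable conversion system; it assumes the status holds its value until the next recorded point, and the function name, sample events and the constant pump rate are hypothetical.

```python
def pump_summary(events, start, end):
    """events: sorted (time_seconds, status) pairs, status 1 = on, 0 = off.

    Assumes the status holds until the next recorded point (a step trace).
    Returns (run_time_seconds, duty_cycle) over [start, end].
    """
    run_time = 0
    # Pair each point with the time of the next one; the last runs to `end`
    for (t, status), (t_next, _) in zip(events, events[1:] + [(end, None)]):
        lo, hi = max(t, start), min(t_next, end)
        if status == 1 and hi > lo:
            run_time += hi - lo
    return run_time, run_time / (end - start)

events = [(0, 0), (600, 1), (2400, 0), (3000, 1)]
run_time, duty = pump_summary(events, 0, 3600)

rate_l_per_s = 15                        # assumed constant pump rate, litres per second
pumped_volume = run_time * rate_l_per_s  # litres pumped in the hour
```

In this sample hour the pump runs for 2400 seconds, a duty cycle of two thirds, and at the assumed rate it pumps 36,000 litres.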

I have many years of daily flows but no stage or ratings - how do I handle it?

HYDSTRA has a Flow Archive system which will respect the historic flows but recompute new flows as new stage record is added. The flow archive program HYFLARCH recomputes modern flows whenever a rating or the time-series data has changed, but preserves historic flows and estimates.

How do I get time-series data into HYDSTRA?

As part of the installation we will probably have left you with some standard import techniques. Basically a few major programs will be used most of the time:

HYCREATE is good for bulk loading of data, particularly if you have several stations in the one file.

The Generalised Loggers system is recommended for routine logger processing, particularly when a data file has been unloaded.

HYCSVIN is good for bringing in data from spreadsheets.

HYDIGI can be used to digitise charts.

A number of special purpose programs can import data from specific sources, including Western Hydro, ADS loggers, Sigma loggers, Mace loggers, Unidata loggers, Hydrological Services RRDL3 loggers, etc.

How can I reformat my input data files into a form that HYDSTRA can read?

HYDSTRA can read many logger file formats exactly as-is, as they come off the logger. If the files need reformatting we use the programming language Perl to reformat the data as part of the import process. Perl is provided with HYDSTRA.

How do I get data out of HYDSTRA again?

A number of programs can be used to export data in various text formats suitable for linking to models, spreadsheets, etc:

HYEXTR will extract data either as stored or at regular intervals, one point per line.

HYCSV will extract data for up to 10 parameters per line in comma separated form suitable for importing into spreadsheets.

HYDDE provides a real-time link you can call from other programs like Excel, Visual Basic, etc.

How can I reformat output data to suit my models?

We recommend using Perl to format data from HYEXTR or HYCSV to specific output formats. HYDSTRA Systems can develop programs to produce specific outputs for you as a consulting task.

Will HYDSTRA Systems help me load my existing data into HYDSTRA?

Yes, the normal procedure is for HYDSTRA Systems to spend time with you on-site when you purchase HYDSTRA. At that time we will help you convert your existing data and develop procedures to handle all your routine data processing requirements. We can also help you at any time with data conversion and import at our normal consulting rates. Through subcontractors we can even arrange for chart backlogs to be digitised.