replication - Handling Updates of OSM Data

Replication servers provide regular updates of OSM data. This module provides helper functions to access these servers and to download and apply the updates.

Replication Server Class

class osmium.replication.server.ReplicationServer(url, diff_type='osc.gz')

Represents a server that publishes replication data. Replication change files make it possible to keep local OSM data up to date without downloading the full dataset again.

collect_diffs(start_id, max_size=1024)

Create a MergeInputReader and download diffs into it, starting with sequence id start_id. max_size limits how many diffs are downloaded: the download stops as soon as either a diff cannot be downloaded or the unpacked data in memory exceeds max_size kB.

If some data was downloaded, returns a namedtuple with three fields: id contains the sequence id of the last downloaded diff, reader contains the MergeInputReader with the data, and newest contains the sequence id of the most recent diff available.

Returns None if there was an error during download or no new data was available.
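A common use of collect_diffs is a catch-up loop that keeps requesting batches until the server has nothing newer. The sketch below relies only on the documented return value (None, or a namedtuple with id, reader and newest); `catch_up` and the `process` callback are hypothetical names, and `server` is assumed to be a ReplicationServer. Because start_id names the first diff to fetch, the loop asks for the last processed id plus one.

```python
from typing import Any, Callable


def catch_up(server: Any, last_id: int,
             process: Callable[[Any], None],
             max_size: int = 1024) -> int:
    """Download batches of diffs until the server has nothing newer.

    `server` is assumed to behave like a ReplicationServer; `process` is a
    hypothetical callback that receives each MergeInputReader.
    Returns the sequence id of the last diff that was processed.
    """
    while True:
        # start_id is inclusive, so request the diff after the last one seen.
        result = server.collect_diffs(last_id + 1, max_size=max_size)
        if result is None:
            # Download error or no new data: keep the previous position.
            return last_id
        process(result.reader)
        last_id = result.id
        if last_id >= result.newest:
            # Caught up with the most recent diff on the server.
            return last_id
```

Keeping max_size small makes each batch cheap in memory at the cost of more round trips; the loop simply continues until id reaches newest.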

apply_diffs(handler, start_id, max_size=1024, simplify=True)

Download diffs starting with sequence id start_id, merge them and apply the result to handler. max_size limits how many diffs are downloaded: the download stops as soon as either a diff cannot be downloaded or the unpacked data in memory exceeds max_size kB. If simplify is true (the default), only the newest version of each object is applied to the handler; otherwise all versions contained in the diffs are applied.

The function returns the sequence id of the last diff that was downloaded or None if the download failed completely.
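Since apply_diffs returns the id of the last downloaded diff, a periodic updater typically stores that id and resumes from the next one on its following run. A minimal sketch, assuming a plain text file as the state store; `run_update` and `state_file` are hypothetical names, and `server` and `handler` stand for whatever ReplicationServer and handler objects the application already uses.

```python
from pathlib import Path
from typing import Any, Optional


def run_update(server: Any, handler: Any, state_file: Path,
               max_size: int = 1024) -> Optional[int]:
    """Apply all diffs after the sequence id stored in `state_file`.

    Returns the new sequence id, or None if nothing was applied.
    """
    last_id = int(state_file.read_text().strip())
    # Start with the diff that follows the last one already applied.
    new_id = server.apply_diffs(handler, last_id + 1, max_size=max_size)
    if new_id is None:
        # Download failed completely: leave the stored state untouched.
        return None
    state_file.write_text(f"{new_id}\n")
    return new_id
```

Writing the new id only after apply_diffs has returned means a failed run simply retries from the same position next time.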

apply_diffs_to_file(infile, outfile, start_id, max_size=1024, set_replication_header=True)

Download diffs starting with sequence id start_id, merge them with the data from the OSM file named infile and write the result to a new file named outfile. The output file must not exist yet.

max_size limits how many diffs are downloaded: the download stops as soon as either a diff cannot be downloaded or the unpacked data in memory exceeds max_size kB.

If set_replication_header is true, the URL of the replication server and the sequence id and timestamp of the last applied diff are written into the header of the output file. Note that this currently works only for the PBF format.

The function returns a tuple of the last downloaded sequence id and the newest available sequence id if new data has been written, or None if no data was available or the download failed completely.
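Because the output file must not exist yet, an updater that maintains a single local extract can write to a fresh file and then swap it in. A minimal sketch; `update_extract` and the `new-` prefix are hypothetical choices, and `server` is assumed to be a ReplicationServer. The prefix is prepended rather than appended so that the file suffix stays intact, since osmium normally infers the output format from it.

```python
import os
from typing import Any, Optional, Tuple


def update_extract(server: Any, path: str, start_id: int,
                   max_size: int = 1024) -> Optional[Tuple[int, int]]:
    """Apply new diffs to the OSM file at `path`, replacing it in place.

    Returns (last downloaded id, newest available id), or None if the file
    was left unchanged.
    """
    # Hypothetical naming scheme: keep the suffix (e.g. .osm.pbf) unchanged.
    tmp_path = os.path.join(os.path.dirname(path),
                            "new-" + os.path.basename(path))
    if os.path.exists(tmp_path):
        # The output file must not exist yet, so remove leftovers.
        os.remove(tmp_path)
    result = server.apply_diffs_to_file(path, tmp_path, start_id,
                                        max_size=max_size)
    if result is None:
        # No new data or download failure: keep the original file.
        return None
    # Swap the updated file in place of the old one.
    os.replace(tmp_path, path)
    return result
```

The caller can compare the two returned ids to decide whether another run is needed right away.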

timestamp_to_sequence(timestamp, balanced_search=False)

Get the sequence number of the replication file that contains the given timestamp. The search algorithm is optimised for replication servers that publish updates at regular intervals. For servers that publish change files at irregular intervals, balanced_search should be set to true so that a standard binary search is used instead. The default works for all known OSM replication services.
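The difference between the two search strategies can be illustrated with a stand-alone sketch. This is not pyosmium's code: `find_sequence` and `get_timestamp` are hypothetical, timestamps are assumed to increase with the sequence id, and the sketch returns the last sequence whose timestamp lies before the target, which is one plausible boundary convention.

```python
from datetime import datetime
from typing import Callable


def find_sequence(get_timestamp: Callable[[int], datetime],
                  lower: int, upper: int, target: datetime,
                  balanced: bool = False) -> int:
    """Return the last sequence id in [lower, upper) whose timestamp is
    earlier than `target`.

    Assumes timestamps grow with the sequence id and that
    get_timestamp(lower) < target <= get_timestamp(upper).
    """
    lower_ts, upper_ts = get_timestamp(lower), get_timestamp(upper)
    while lower + 1 < upper:
        if balanced:
            # Plain binary search: always split the interval in the middle.
            split = (lower + upper) // 2
        else:
            # Interpolation: guess the position from the elapsed time, which
            # lands close to the answer when files appear at regular intervals.
            fraction = (target - lower_ts) / (upper_ts - lower_ts)
            split = lower + max(1, int(fraction * (upper - lower)))
            split = min(split, upper - 1)
        split_ts = get_timestamp(split)  # one state file lookup
        if split_ts < target:
            lower, lower_ts = split, split_ts
        else:
            upper, upper_ts = split, split_ts
    return lower
```

With evenly spaced timestamps the interpolated guess usually lands next to the answer within one or two lookups, whereas bisection needs roughly log2(upper - lower) lookups. Since each lookup stands for one downloaded state file, this is why the interpolating search is the better default for regular servers, while bisection stays robust when publication times are uneven.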

get_state_info(seq=None)

Downloads and returns the state information for the given sequence; if seq is None, the latest state is fetched. If the download is successful, a namedtuple with sequence and timestamp is returned, otherwise the function returns None.
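On OSM replication servers the state information is a small state.txt file in Java properties format. To illustrate what the returned namedtuple holds, here is a stand-alone parser; it assumes the usual sequenceNumber and timestamp keys and is not the library's own implementation (`parse_state` and `StateInfo` are hypothetical names).

```python
from collections import namedtuple
from datetime import datetime, timezone

# Mirrors the two fields documented for get_state_info().
StateInfo = namedtuple("StateInfo", ["sequence", "timestamp"])


def parse_state(text: str) -> StateInfo:
    """Parse the contents of a replication state.txt file."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the comment header
        key, _, value = line.partition("=")
        # Properties files escape colons, e.g. 00\:01\:02 for 00:01:02.
        values[key.strip()] = value.strip().replace("\\:", ":")
    ts = datetime.strptime(values["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
    return StateInfo(int(values["sequenceNumber"]),
                     ts.replace(tzinfo=timezone.utc))
```

The timestamp is made timezone-aware because replication timestamps are given in UTC (the trailing Z).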

get_diff_block(seq)

Downloads the diff with the given sequence number and returns it as a byte sequence. Raises urllib.error.HTTPError (urllib2.HTTPError under Python 2) if the file cannot be downloaded.
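The raw block can be inspected without any OSM tooling. A minimal sketch, assuming the default diff_type of osc.gz, in which case the bytes are presumably the gzip-compressed osmChange file as served; the sketch therefore checks for the gzip magic number before parsing. `summarize_block` and `fetch_summary` are hypothetical helpers, and `server` is assumed to be a ReplicationServer.

```python
import gzip
import urllib.error
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Any, Optional


def summarize_block(block: bytes) -> Counter:
    """Count created, modified and deleted objects in an osmChange diff."""
    if block[:2] == b"\x1f\x8b":
        # gzip magic number: the block is still compressed.
        block = gzip.decompress(block)
    root = ET.fromstring(block)
    counts = Counter()
    for action in root:  # <create>, <modify> and <delete> elements
        counts[action.tag] += len(action)
    return counts


def fetch_summary(server: Any, seq: int) -> Optional[Counter]:
    """Download one diff and summarize it; None if it is not available."""
    try:
        block = server.get_diff_block(seq)
    except urllib.error.HTTPError:
        # For example, the diff has not been published yet.
        return None
    return summarize_block(block)
```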

get_state_url(seq)

Returns the URL of the state.txt file for a given sequence id.

If seq is None, the URL of the latest state info is returned, i.e. the state file in the root directory of the replication service.
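On the OSM replication services, per-sequence files live in a three-level directory tree in which the sequence number is split into groups of three zero-padded digits. A stand-alone sketch, assuming that layout; `state_url` is a hypothetical helper, not part of pyosmium.

```python
from typing import Optional


def state_url(base_url: str, seq: Optional[int] = None) -> str:
    """Build the state file URL for `seq`, or the root state file if None."""
    base = base_url.rstrip("/")
    if seq is None:
        return f"{base}/state.txt"
    # Split the id into three zero-padded groups: 4883555 -> 004/883/555.
    return (f"{base}/{seq // 1_000_000:03d}/{seq // 1000 % 1000:03d}/"
            f"{seq % 1000:03d}.state.txt")
```

For example, `state_url("https://planet.openstreetmap.org/replication/minute/", 4883555)` yields a URL ending in `/004/883/555.state.txt`.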

get_diff_url(seq)

Returns the URL of the diff file for the given sequence id.
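On the OSM replication services a diff file sits in the same directory as its state file, with the suffix given by the diff_type passed to the constructor. A sketch under that layout assumption; `diff_url` is a hypothetical stand-alone helper.

```python
def diff_url(base_url: str, seq: int, diff_type: str = "osc.gz") -> str:
    """Build the URL of diff `seq` in the AAA/BBB/CCC directory layout."""
    top, rest = divmod(seq, 1_000_000)
    middle, last = divmod(rest, 1000)
    # The suffix comes from the server's diff_type (default "osc.gz").
    return f"{base_url.rstrip('/')}/{top:03d}/{middle:03d}/{last:03d}.{diff_type}"
```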