2019-02-07 - Progress- Future - Tony Finch
I have spent the last week working on DNSSEC key rollover automation in BIND. Or rather, I have been doing some cleanup and prep work. With reference to the work I listed in the previous article...
Done
- Stop BIND from generating SHA-1 DS and CDS records by default, per RFC 8624
- Teach dnssec-checkds about CDS and CDNSKEY
Started
- Teach superglue to use CDS/CDNSKEY records, with similar logic to dnssec-checkds
The "similar logic" is implemented in dnssec-dsfromkey, so I don't actually have to write the code more than once. I hope this will also be useful for other people writing similar tools!
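For readers who want the "similar logic" made concrete: a DS (or CDS) record is a digest over the DNSKEY owner name in canonical wire format followed by the DNSKEY RDATA (RFC 4034 section 5.1.4). The Python below is a minimal sketch of that computation plus a deliberately simplified sync check. It is not code from BIND, and the function names are hypothetical. It only offers SHA-256 and SHA-384 digests, in line with RFC 8624's advice not to generate SHA-1 DS records. Treating "in sync" as plain equality between the parent's DS set and the child's CDS set is an assumption that ignores details such as the RFC 8078 delete signal.

```python
import hashlib
import struct

# Digest types this sketch will generate. Type 1 (SHA-1) is deliberately
# absent, following RFC 8624.
DIGESTS = {2: hashlib.sha256, 4: hashlib.sha384}


def name_to_wire(name: str) -> bytes:
    """Canonical DNS wire format: lower-case labels, no compression."""
    wire = b""
    for label in name.rstrip(".").split("."):
        if label:
            raw = label.lower().encode("ascii")
            wire += bytes([len(raw)]) + raw
    return wire + b"\x00"  # the root label terminates the name


def key_tag(rdata: bytes) -> int:
    """Key tag per RFC 4034 Appendix B (not valid for algorithm 1)."""
    acc = 0
    for i, byte in enumerate(rdata):
        acc += byte if i & 1 else byte << 8
    acc += (acc >> 16) & 0xFFFF
    return acc & 0xFFFF


def make_ds(owner: str, flags: int, algorithm: int, public_key: bytes,
            digest_type: int = 2) -> tuple:
    """Return (key tag, algorithm, digest type, hex digest) for a DNSKEY."""
    # DNSKEY RDATA: flags (16 bits), protocol (always 3), algorithm, key.
    rdata = struct.pack("!HBB", flags, 3, algorithm) + public_key
    digest = DIGESTS[digest_type](name_to_wire(owner) + rdata).hexdigest()
    return (key_tag(rdata), algorithm, digest_type, digest.upper())


def in_sync(parent_ds: set, child_cds: set) -> bool:
    """Simplified: the parent publishes exactly what the child's CDS asks for."""
    return parent_ds == child_cds


if __name__ == "__main__":
    # Flags 257 marks a KSK; the all-zero 64-byte key is only a placeholder.
    ksk = make_ds("example.com.", 257, 13, bytes(64))
    print(ksk[:3])                 # (1038, 13, 2)
    print(in_sync({ksk}, {ksk}))   # True
```

Because the digest table has no entry for type 1, asking for a SHA-1 DS fails loudly with a KeyError instead of silently producing one.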
Some of my small cleanup patches have been merged into BIND. We are currently near the end of the 9.13 development cycle, so this work is going to remain out of tree for a while until after the 9.14 stable branch is created and the 9.15 development cycle starts.
Next
So now I need to get to grips with dnssec-coverage and dnssec-keymgr.
Simple safety interlocks
The purpose of the dnssec-checkds improvements is to make it usable as a safety check.
During a KSK rollover, there are one or two points when the DS records in the parent need to be updated. The rollover must not continue until this update has been confirmed, or the delegation can be broken.
I am using CDS and CDNSKEY records as the signal from the key management and zone signing machinery for when DS records need to change. (There's a shell-style API in dnssec-dsfromkey -p, but that is implemented by just reading these sync records, not by looking into the guts of the key management data.) I am going to call them "sync records" so I don't have to keep writing "CDS/CDNSKEY"; "sync" is also the keyword used by dnssec-settime for controlling these records.
Key timing in BIND
The dnssec-keygen and dnssec-settime commands (which are used by dnssec-keymgr) schedule when changes to a key will happen.
There are parameters related to adding a key: when it is published in the zone, when it becomes actively used for signing, etc. And there are parameters related to removing a key: when it becomes inactive for signing, when it is deleted from the zone.
There are also timing parameters for publishing and deleting sync records. These sync times are the only timing parameters that say when we must update the delegation.
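To illustrate how these parameters fit together, here is a minimal Python model of one key's schedule. The field names are hypothetical; each corresponds roughly to one of the dnssec-settime options -P, -A, -I, -D, -P sync and -D sync, and revocation is left out for brevity. The last function picks out the sync times, which are the only points at which the DS records in the parent have to change.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple

EVENT_NAMES = ("publish", "activate", "inactive", "delete",
               "sync_publish", "sync_delete")
SYNC_EVENTS = {"sync_publish", "sync_delete"}


@dataclass
class KeyTiming:
    """Hypothetical model of one key's timing metadata (None = unset)."""
    tag: int
    publish: Optional[datetime] = None       # like dnssec-settime -P
    activate: Optional[datetime] = None      # like -A
    inactive: Optional[datetime] = None      # like -I
    delete: Optional[datetime] = None        # like -D
    sync_publish: Optional[datetime] = None  # like -P sync: CDS/CDNSKEY appear
    sync_delete: Optional[datetime] = None   # like -D sync: CDS/CDNSKEY removed

    def events(self) -> List[Tuple[datetime, str]]:
        """All scheduled events for this key, in time order."""
        return sorted((getattr(self, name), name) for name in EVENT_NAMES
                      if getattr(self, name) is not None)


def delegation_updates(keys: List[KeyTiming]) -> List[datetime]:
    """Distinct times at which the DS records in the parent must change."""
    return sorted({when for key in keys for when, name in key.events()
                   if name in SYNC_EVENTS})
```

In a double-KSK swap, the old key's sync_delete and the new key's sync_publish fall at the same instant, so they collapse into a single delegation update.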
What can break?
The point of the safety interlock is to prevent any breaking key changes from being scheduled until after a delegation change has been confirmed. So what key timing events need to be forbidden from being scheduled after a sync timing event?
Events related to removing a key are particularly dangerous. There are some cases where it is OK to remove a key prematurely, if the DS record change is also about removing that key, and there is another working key and DS record throughout. But it seems simpler and safer to forbid all removal-related events from being scheduled after a sync event.
However, events related to adding a key can also lead to nonsense. If we blindly schedule creation of new keys in advance, without verifying that they are also being properly removed, then the zone can accumulate a ridiculous number of DNSKEY records. This has been observed in the wild surprisingly frequently.
A simple rule
There must be no KSK changes of any kind scheduled after the next sync event.
This rule applies regardless of the flavour of rollover (double DS, double KSK, algorithm rollover, etc.).
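The rule is simple enough to check mechanically. In this hypothetical sketch, each KSK's schedule is a dict mapping an illustrative event name (such as "delete" or "sync_publish", not BIND's own names) to a time. The check finds the next sync event after now and returns every KSK event scheduled strictly after it, so an empty result means the schedule obeys the rule. Events at the same instant as the sync event are allowed, and a second, later sync event counts as a violation, because it is itself a KSK change.

```python
from datetime import datetime
from typing import Dict, List, Optional, Tuple

Schedule = Dict[str, datetime]  # hypothetical: event name -> scheduled time
SYNC_EVENTS = ("sync_publish", "sync_delete")


def next_sync_event(ksks: List[Schedule], now: datetime) -> Optional[datetime]:
    """Earliest sync event (CDS/CDNSKEY change) strictly after now, if any."""
    times = [when for sched in ksks for name, when in sched.items()
             if name in SYNC_EVENTS and when > now]
    return min(times, default=None)


def rule_violations(ksks: List[Schedule],
                    now: datetime) -> List[Tuple[datetime, str]]:
    """KSK events scheduled after the next sync event; empty means OK."""
    boundary = next_sync_event(ksks, now)
    if boundary is None:
        return []  # no delegation change pending, nothing to interlock on
    return sorted((when, name) for sched in ksks for name, when in sched.items()
                  if when > boundary)
```

With this check, deleting the old KSK some days after a DS swap is flagged: the deletion can only be scheduled once the new DS has been confirmed in the parent.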
Applying this rule to BIND
Whereas for ZSKs dnssec-coverage ensures rollovers are planned for some fixed period into the future, for KSKs it must check correctness up to the next sync event, then ensure nothing will occur after that point.
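A minimal sketch of the coverage half of that check, assuming each KSK's active period is given as an (activate, inactive) pair, with None meaning no inactive time is set. The names are hypothetical and this is not how dnssec-coverage is structured internally. The only KSK-specific part is the window passed in: it would run from now to the next sync event, rather than from now to now plus a fixed period as for ZSKs. The function returns the stretches of that window during which no KSK is active.

```python
from datetime import datetime
from typing import List, Optional, Tuple

ActivePeriod = Tuple[datetime, Optional[datetime]]  # (activate, inactive/None)


def coverage_gaps(periods: List[ActivePeriod], start: datetime,
                  end: datetime) -> List[Tuple[datetime, datetime]]:
    """Sub-ranges of [start, end) during which no key is active."""
    gaps = []
    cursor = start  # everything before cursor is covered or already reported
    for activate, inactive in sorted(periods, key=lambda p: p[0]):
        if cursor >= end:
            break
        if activate > cursor:
            # Nothing active between cursor and this key's activation.
            gaps.append((cursor, min(activate, end)))
            cursor = activate
        if inactive is None:
            return gaps  # this key stays active indefinitely
        cursor = max(cursor, inactive)
    if cursor < end:
        gaps.append((cursor, end))
    return gaps
```

For KSKs the call would be coverage_gaps(periods, now, next_sync); the other half, that nothing is scheduled after next_sync, is the simple rule itself.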
In dnssec-keymgr, the logic should be:
- If the current time is before the next sync event, ensure there is key coverage until that time and no further.
- If the current time is after all KSK events, use dnssec-checkds to verify the delegation is in sync.
  - If dnssec-checkds reports an inconsistency and we are within some sync interval dictated by the rollover policy, do nothing while we wait for the delegation update automation to work.
  - If dnssec-checkds reports an inconsistency and the sync interval has passed, report an error because operator intervention is required to fix the failed automation.
  - If dnssec-checkds reports everything is in sync, schedule keys up to the next sync event. The timing needs to be relative to this point in time, since any delegation update delays can make it unsafe to schedule relative to the last sync event.
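The steps above could be sketched as a single decision function. Everything here is hypothetical: checkds stands for a wrapper that runs dnssec-checkds and reports whether the parent's DS records match the sync records, and measuring the sync interval from the last KSK event (which, under the simple rule, is the most recent sync event) is an assumption. The plan does not say what to do when KSK events are still pending but no sync event lies ahead, so the sketch raises an error there rather than guessing.

```python
from datetime import datetime, timedelta
from enum import Enum
from typing import Callable, Optional


class Action(Enum):
    ENSURE_COVERAGE_UNTIL_SYNC = 1  # before the next sync event
    WAIT_FOR_PARENT = 2             # out of sync, still within the interval
    REPORT_ERROR = 3                # out of sync, interval has passed
    SCHEDULE_NEXT_STEP = 4          # in sync: schedule up to next sync event


def decide(now: datetime,
           next_sync: Optional[datetime],
           last_ksk_event: datetime,
           checkds: Callable[[], bool],
           sync_interval: timedelta) -> Action:
    """One run of a hypothetical key manager's KSK logic."""
    if next_sync is not None and now < next_sync:
        # Keep coverage up to next_sync and schedule nothing beyond it.
        return Action.ENSURE_COVERAGE_UNTIL_SYNC
    if now < last_ksk_event:
        raise ValueError("KSK events pending but no sync event ahead")
    # All KSK events are in the past, so it is time to ask the parent.
    if checkds():
        # New times must be relative to now, not to the last sync event,
        # because the delegation update may have been delayed.
        return Action.SCHEDULE_NEXT_STEP
    if now - last_ksk_event <= sync_interval:
        return Action.WAIT_FOR_PARENT
    return Action.REPORT_ERROR
```

Passing checkds as a callable means dnssec-checkds is only consulted once all KSK events are in the past, which matches the order of the steps.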
Caveat
At the moment I am still not familiar with the internals of dnssec-coverage and dnssec-keymgr, so there's a risk that I might have to re-think these plans. But I expect this simple safety rule will be a solid anchor that can be applied to most DNSSEC key management scenarios. (However, I have not thought hard enough about recovery from breakage or compromise.)