2017-12-19 - Future - Tony Finch
Here are some criticisms of the IP Register database schema and some thoughts on how we might change it.
There is a lot of infrastructure work to do before I am in a position to make changes - principally, porting from Oracle to PostgreSQL, and developing a test suite so I can make changes with confidence.
Still, it's worth writing down my thoughts so far, so colleagues can see what I have in mind, and so we have some concrete ideas to discuss.
I expect to add to this list as thoughts arise.
- Existing context on Jackdaw
- Coupling between IP Register and User-Admin
- DNS backend vs asset register
- LANs and subnets
- CNAMEs and
aname
s - TXT records
- IP addresses and subnets
- Anything else?
Existing context on Jackdaw
It's worth unpacking "Jackdaw" a bit.
There is Jackdaw the service, consisting of a number of Linux servers
that host an Oracle DBMS and Apache mod_perl
application framework.
In the Oracle DBMS there are a number of databases: a production
database that is also called jackdaw
, which is occasionally cloned
for operational reasons to make jdawdev
, jdawtest
, etc.
The original user of the Jackdaw database was the user administration system; the IP Register database is a separate schema within the same database.
Future: Better use of SQL schemas outside Jackdaw
The plan is to make the IP Register database its own self-contained system hosted on PostgreSQL.
Instead of multiple databases within a DBMS, there will be just one. The dev and test databases will be on separate Linux systems isolated from the production servers. (I'm currently working on a dev system using a cluster of VMs hosted on my workstation.)
Breaking away from the user-admin database makes it possible to use multiple schemas within the IP Register database. There's an obvious place where this can make a big improvement.
At the moment, the Perl IP Register front-ends pass around a $prefix
which privileged users can set to choose whether they are working at
their normal privilege level (my_
) or in read-all mode (ra_
) or in
access-all-areas mode (all_
). This means that most of the SQL
queries are full of string interpolations, which means the code is not
trivially obviously free of SQL injection bugs.
If each set of access control views is moved to its own schema, then
the database front end can can set PostgreSQL's schema search path to
choose between my
, ra
, and all
schemas, and use static SQL
instead of interpolating $prefix
.
Further future: row-level security
Dean Rasheed tells me I should look at row-level security as a better access control mechanism than IP Register's existing updatable views. In particular, it would allow us to give users read access to objects which they can't update.
I like the sound of this a lot, but it will require a lot more thought to work out how to make good use of it.
Coupling between IP Register and User-Admin
There are a few cases where the IP Register database makes use of the fact that it is just a schema within the wider Jackdaw database.
Firstly, the mzone_co
, areader
, and registrar
tables refer to
people by their CRSID; there are constraints tying these tables to the
pers
table in the user-admin database. I will have more to say about
this below.
Secondly, the Gossamer system which handles the provisioning of Falcon web sites (and used to handle MWS sites) has feet in both camps: the authoritative details of the web sites are held in the user-admin database, and the DNS-specific information is copied to the IP Register database. The current provisioning scripts take advantage of the fact that these are actually the same database, and use transactions that span both user-admin and IP Register databases when provisioning sites.
Those are the only cases I am currently aware of...
Future: ACLs from Lookup
There's a lot of unnecessary bureaucracy maintaining the IP Register access control lists.
Instead of manual maintenance by UIS staff, the access control lists should come from Lookup groups via LDAP, so each institution can maintain their own access list in the same way as they maintain other groups.
Currently I think that it should require a quick manual process for an
existing mzone_co
to mirror a new ACL from Lookup to the IP Register
database. This is because control over the DNS gives you huge amounts
of privilege, and I don't want a compromise of Lookup to automatically
lead to compromise of the DNS.
Future: two factor authentication
We currently have a rough prototype implementation of TOTP for registrar access to the IP Register database.
This should be made available as an option for all mzone_co
s.
The TOTP secrets need to be added to the database, with a mechanism allowing IT staff within an institution to reset each others TOTP secrets wihtout having to talk to the UIS.
Future: Gossamer, MWS, etc.
There's no real need or benefit from Gossamer using transactions that span the user-admin and IP Register databases - all the other provisioning systems that hang off Jackdaw are loosely coupled and eventually consistent, and this is true for the other web provisioning side of Gossamer.
We have a number of special interfaces to IP Register that (like Gossamer) need special privilege: the MWS3 API; the IP filter interlock; the RPZ pruner; ... These should all be refactored to follow a common set of design principles to be determined.
DNS backend vs asset register
In every-day usage the IP Register database falls awkwardly between two stools: it isn't just a DNS backend, because each entry in the database requires certain metadata which isn't exported to the DNS; but the schema for this metadata is frequently too impoverished to be helpful for much of the information we want to record about DNS registrations.
As far as I can tell, this aspect of the IP Register database came from the manual / semi-automated practices of the IP Register team in the 1990s, and it has not been re-worked since then.
Future: arbitrary metadata
The PostgreSQL jsonb
data type allows us to attach schemaless data
to a row. I would like to replace all the IP Register metadata fields
that aren't used by the database with a jsonb
column.
All tables have a remarks
column, which is the basic metadata to be
replaced by jsonb
; other fields to fold into it include purpose
,
equipment
, location
, sysadmin
, etc. usw.
Questions of what metadata to ask for then become a matter for the user interface.
Future: box
vs vbox
The differences between boxes and vboxes are:
boxes and vboxes have different un-constrained metadata fields;
vboxes are hosted on some other box or vbox, as recorded in the
vbox_box
table.
After the jsonb
metadata change, the first difference goes away.
The second difference also becomes moot, because a box is equivalent
to a vbox without an entry in the vbox_box
table.
So this distinction should be eliminated.
LANs and subnets
Another hangover from the 1990s is the IP Register representation of LANs and subnets. It seems to be designed to make it convenient, when registering a box, to provide manual configuration details (router, subnet mask, name servers) to the admin of the box.
In practice an IP Register LAN typically corresponds to a VLAN as configured into the relevant network equipment - but the database does not record VLAN numbers. It seems to be a half-finished feature.
Future: DHCP support
The IP Register database currently has bare minimum support for exporting static registrations to our DHCP servers: you can tie an IP address to a MAC address and a DHCP group, and that's it.
I would like to be able to represent the rest of our DHCP configuration in the database too: DHCP-enabled subnets, dynamic pools, etc.
I would like an IP Register LAN to correspond to a VLAN from the
network equipment point of view and a shared-network
from the DHCP
point of view, with a flag determining whether it is exported to the
DHCP servers. We should also be able to match up the contents of the
IP Register database to the router configurations more automatically.
The LAN rubric stuff then becomes useful as DHCP options.
I would also like to be able to represent DHCP pools in the database. In this area we need to be thinking about replacing old ISC DHCP with ISC Kea - the latter is database-backed which might or might not mean we will benefit from tying it directly to the IP Register database.
IPv6
There are a couple of annoying limitations with the way the IP Register database models IPv6 subnets.
It requires the subnet prefix length to be 64 bits. We generally use long prefixes for link subnets, so we want to be able to represent them as such in the database.
We have a convention for IPv6 service identifiers, using addresses of the form
prefix::service:instance
where prefix is 64 bits, and service and instance are 16 bits. We currently use a wiki page for keeping track of service identifiers; this should be in the database.In IPv4, LAN names are associated with
v4_address
objects, so a subnet can be split between multiple logical LANs, and LANs can be updated bymzone_co
s. In IPv6, the LAN is associated with thev6_prefix
and they require privileged access to update. This should be delegated.Ideally it should be possible to associate a LAN with a pair of a
v6_prefix
and a service identifier, so that static IPv6 addresses can be allocated automatically in a sensible way.
CNAMEs and aname
s
The only delegated mechanism we have for making DNS aliases is with CNAMEs.
Any interesting alias setups have to be done with aname
objects,
which can only be created by privileged IP Register users. This
includes things like service aliases that combine multiple vboxes
(which we use for the ppsw
and recdns
services), bare www-less web
site names, and aliases for off-site services.
We also use aname
s as interlocks, to prevent records from being
deleted when there is some network configuration that refers to them.
(This is the ipfilter
mechanism.)
Future: CNAME and aname
refinement
CNAME quotas should be abolished.
We should have an alias mechanism that is published in the DNS as the
addresses of the target, to avoid the restrictions on CNAME records.
We have something like this for domains hosted on the MZS; in the
future it should be based on
standard ANAME records
- which are not the same thing as IP Register aname
s.
The IP Register aname
table should be renamed to avoid confusion
with standard ANAMEs. (Renamed to what, though?)
I would like to delegate the ability to create IP Register aname
s,
though this may hit a snag with the current mutable view acess control
mechanism.
Future: unrestricted aliases
Our process for setting up off-site servers is currently very bureaucratic. To a large extent this is because it requires awkward work-arounds for limitations in the IP Register database.
To fix this, we should allow staff to set up arbitrary CNAMEs, arbitrary standard ANAMEs, and arbitrary address records.
Future: cross-mzone objects
This is a knotty area, and I'm not entirely sure what the right approach is.
At the moment, staff who have access to multiple mzone
s can create
objects that span mzone
s without restriction - e.g. a name from one
mzone
pointing at an address in another mzone
.
I'm inclined to think this is a bad idea, because it has awkward edge
cases where not all staff in a group have access to the same mzone
s,
and when mzone
s split or merge.
Instead, ordinary users should only be able to create objects that are
entirely within an mzone
. If they need to create mzone-spanning
links they can use unrestricted aliases.
Privileged cross-mzone objects might still be used for ipfilter
interlocks, and maybe for Falcon / MWS aliases.
TXT records
A large proportion of the non-database records in the cam.ac.uk
zone
are TXT records. These fall into roughly three areas:
SPF records, which are managed by a special-purpose script;
DKIM / DMARC records;
Microsoft / Google / etc. domain authentication records.
Future: SPF
SPF records are closely tied to mail domains. They should be moved
into the database, hooked into the maildom
table similarly to the
mx
table, to replace the current ad-hoc configuration file.
Future: other TXT
Staff should be able to create arbitrary TXT records, which would cover the other cases listed above. I'm not sure if it's worth implementing special cases for DMARC and DKIM - maybe syntax checks in the user interface will be enough.
IP addresses and subnets
The current IP Register database has a number of subroutines for converting IP addresses between packed octet strings (for storage in Oracle) and textual form (for presentation to users and export to the DNS).
Future: ip4r
The ip4r
PostgreSQL extension provides native data types for IP
addresses and subnets. Using this would allow us to eliminate a lot
of awkward custom code.
Anything else?
So far this is just one person's thoughts, and I have a fairly limited and unusual perspective on the database. Because of this I'm very keen to hear other people's ideas and feedback: Do you like what I wrote above? Did I miss anything out? Do you have any suggestions?
Please send email to ip-register@uis.cam.ac.uk!