Smugglers isomorphism

A colourful colletion of tablets of the drug popularly known as Ecstacy. Photo:

A colourful collection of tablets of the drug popularly known as Ecstacy. Photo:

Some years ago, Daniel Reidpath and I were experimenting with the idea of using a self-guided learning algorithm to develop rules for discriminating the profiles of drug-smugglers from those of non-smugglers in a database of visitors to Australia. The criterion variable in which we were interested was dichotomous: coded 0 if a visitor was not a drug-smuggler, and coded 1 if they were. Similarly, the predictor variables we examined were also coded dichotomously. Had the visitor had a sojourn in the Golden Triangle on their way to Australia? Were they under 25 years old? Were they male? — and so forth. And we wanted to check our learning algorithm by testing it on a database of simulated cases in which the functional relationship between the predictor variables and the criterion variable was known, and systematically changed by us.

The problem with the simulations was that the number of functions that we might need to look at grows as a double-exponential function of the number of predictor variables. Specifically, if there is but one predictor variable, then there are four possible functions relating the predictor variable to the criterion variable. If there are 2 predictor variables, then there are 16 different functions; with 3 predictor variables the number of possible functional relationships has grown to 256; and, in general, for n predictor variables there are 22^n. However, many of those functions do not need to be tested separately because there is a great deal of redundancy amongst them, at least from the point of view of the learning algorithm. That is, one or more of the functions are essentially isomorphic. Consider, for example, the fact that whether we code the answer to the question, “Is the visitor male?” as 0 for Yes and 1 for No, or vice versa, is essentially irrelevant. Either coding system would work and we knew that if our learning algorithm were capable of detecting a relationship that was coded in one direction then it would be equally capable of detecting it if it had been coded in the opposite direction. The same logic applies to the criterion variable: the criterion could be coded as 1 for drug-smugglers and 0 for non-smugglers, or vice versa.

Two or more predictor variables

When one considers two or more predictor variables, rather than just a single one, as in the case of the sex of the visitor, the redundancy is even greater. It was irrelevant to us whether, for example, we treated the sex of the visitor as the first predictor variable and their (dichotomized) age as the second variable, or the other way around. When all the redundancies are taken into account, the number of functional relationships against which we needed to test our learning algorithm droped considerably. Instead of 256 functional relationships with three predictor variables, only 14 relationships need to be examined. Similarly, in the case of four predictor variables, the number of functions that need examining is reduced from 65536 to 247. It appears that the number of functions to be examined still grows as a double-exponential of the number of predictor variables, but nonetheless for any n, the number is much less than what it would otherwise be.

If you’re interested in more details of the Smugglers isomorphism, you will find them here.

How many non-isomorphic functions are there under the Smugglers isomorphism

It would be nice to have an analytical solution (i.e., an explicit formula) for the number of non-isomorphic functions on n predictor variables.

Contributors: Mark R. Diamond