Restricted cubic splines in regression

Earlier this year, Angela O’Brien-Malone and I were working on some research that involved quantile regression using restricted cubic splines. Almost without exception, the papers that I read on cubic splines cited a paper by Stone and Koo published in 1985 in the Statistical Computing Section of the Proceedings of the American Statistical Association. Clearly, the authors of the papers that I read had better library resources than I, or perhaps they did not actually read the original paper and merely cited secondary sources! Despite contacting several libraries, I found myself completely unable to obtain a copy of the paper.

Eventually I had the idea of looking to see whether I could contact the authors and by good fortune found the email address for Charles J. Stone, Professor Emeritus of Statistics at the University of California (Berkeley). By even greater fortune, Professor Stone had a copy of the paper, which he kindly scanned and emailed to me. Now I, like many other readers of mathematics, like to see it beautifully typeset, but in 1985, when Stone and Koo’s paper was originally published, the Statistical Computing Section of the ASA was using fonts that did the mathematics little justice. So, with Professor Stone’s permission, and as a way of saying “thank you”, I have reprinted the paper using LaTeX. The images are taken directly from a scanned copy of the original.

Two versions of the paper are available for download here. For North American readers, there is a copy fitted to letter-size paper; for others, there is an A4-sized copy. The paper should be cited as:
Stone, C. J., & Koo, C.-Y. (Cha-Yong) (1985). Additive splines in statistics. Proceedings of the Statistical Computing Section, American Statistical Association 27, 45-48.

I should add that I have been very remiss in taking so long to make the paper available. I shall post an excuse at a later date …

Scoring the Values in Action Inventory of Strengths for Youth (VIA-Youth)

The Values in Action Inventory of Strengths for Youth (VIA-Youth) is a 198-item self-report questionnaire that is very similar to the better-known VIA Survey of Character. It was developed by Nansoon Park and Christopher Peterson who first described it, I think, in a paper [3] in the Journal of Adolescence. The VIA-Youth measures 24 so-called ‘character strengths’ [4] organized under six broad ‘virtues’ and is intended for use with young people aged 10–17 years.

The journal paper [3] explains that the authors ‘[experimented] with different item formats and phrasings before arriving at the current inventory, which contains 198 items (7–9 items for each of the 24 strengths, placed in a nonsystematic order), about one-third of which are reverse-scored. … Respondents use a 5-point scale to indicate whether the item is “very much like me” (=5) or “not like me at all” (=1). Subscale scores are formed by averaging the relevant items.’ Unfortunately, the explanation of the scoring ends there.

I had not heard of the VIA-Youth until yesterday when I was asked, by someone who had seen my earlier blog posting [2] on the VIA Survey of Character, whether I knew anything about the VIA-Youth scale. I didn’t, but found, amongst other copies, a Master of Science dissertation [1] that contains a copy of the VIA-Youth. With a knowledge of the 24 character strengths, it is not difficult to infer the scoring key.

There are some differences from the way that the adult scale is scored. Items 1–168 are in seven repeating blocks of 24 questions. Within the 24 item blocks, the Character Strengths are in the same order. However, the items from 169–198 are different. The next block would normally be from 169–192, but in fact it is from 169–191 because the item for the character-strength of “Forgiveness” has been omitted. The remaining 7 questions (192–198) are for the character strengths called Fairness, Humour, Perseverance, Kindness, Love, Humility and Self Regulation.
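The block structure just described can be turned into an item-to-strength key mechanically. What follows is a minimal sketch in Python, not the published key. The function name `build_item_key` is hypothetical, and the order of the 24 strengths within a block has to be read off the questionnaire itself: `PLACEHOLDER_ORDER` below is invented purely so that the code runs. The sketch also assumes that in items 169–191 the remaining 23 strengths keep their usual relative order.

```python
from collections import Counter

# Strengths measured by items 192-198, in the order given in the text.
TAIL = ["Fairness", "Humour", "Perseverance", "Kindness",
        "Love", "Humility", "Self Regulation"]


def build_item_key(strength_order):
    """Map item numbers 1..198 to strength names.

    strength_order: the 24 strength names in the order in which they
    appear within each 24-item block (supplied by the caller).
    """
    if len(strength_order) != 24 or len(set(strength_order)) != 24:
        raise ValueError("expected 24 distinct strength names")
    missing = set(TAIL + ["Forgiveness"]) - set(strength_order)
    if missing:
        raise ValueError(f"names not found in strength_order: {missing}")

    key = {}
    item = 1
    # Items 1-168: seven repeating blocks of 24.
    for _ in range(7):
        for strength in strength_order:
            key[item] = strength
            item += 1
    # Items 169-191: one more block with the Forgiveness item omitted
    # (assumption: the other 23 strengths keep their relative order).
    for strength in strength_order:
        if strength != "Forgiveness":
            key[item] = strength
            item += 1
    # Items 192-198: the seven extra items.
    for strength in TAIL:
        key[item] = strength
        item += 1
    return key


# Invented order, for illustration only; NOT the real VIA-Youth order.
PLACEHOLDER_ORDER = (TAIL + ["Forgiveness"]
                     + [f"Strength {i}" for i in range(9, 25)])

key = build_item_key(PLACEHOLDER_ORDER)
items_per_strength = Counter(key.values())
print(len(key), sorted(set(items_per_strength.values())))  # 198 [7, 8, 9]
```

Whatever the true order turns out to be, this layout gives Forgiveness 7 items, each of the seven strengths in items 192–198 9 items, and every other strength 8 items, which agrees with the ‘7–9 items for each of the 24 strengths’ in the quotation from the paper.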

The other difference between the Youth scale and the adult scale is that all the adult items are scored the same way. However, as Park and Peterson commented in their paper, some of the youth items are scored in the reverse direction. I have provided links to two spreadsheets that describe the scoring key completely. The first spreadsheet is in Open Document format [ODS-link], the other in Microsoft Excel format [XLS-link]. I have indicated in the spreadsheet whether the item is scored in the + direction or the – direction. To be more explicit, I have also indicated whether the responses should be scored 5..1 (meaning that “Very Much Like Me” is 5, and “Not Like Me At All” is 1), or 1..5 (meaning that “Very Much Like Me” is 1, and “Not Like Me At All” is 5).
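As an illustration of the two scoring directions, here is a minimal Python sketch. It assumes that every response has been recorded on the 5..1 coding (“Very Much Like Me” = 5), so that a reverse-scored (1..5) item is simply 6 minus the recorded value, and it averages the items of a subscale as the paper describes. The function names, the item numbers and the responses are all hypothetical; the real item lists and directions are the ones in the spreadsheets.

```python
from statistics import mean


def score_item(response, reverse):
    """Score one response recorded on the 5..1 coding.

    reverse=True means the item is keyed 1..5, so the value is flipped.
    """
    if response not in (1, 2, 3, 4, 5):
        raise ValueError("responses must be integers from 1 to 5")
    return 6 - response if reverse else response


def subscale_score(responses, items, reversed_items):
    """Average the scored items belonging to one character strength."""
    return mean(score_item(responses[i], i in reversed_items) for i in items)


# Hypothetical data: three items of one strength, item 27 reverse-scored.
responses = {3: 5, 27: 2, 51: 4}
example = subscale_score(responses, items=[3, 27, 51], reversed_items={27})
print(round(example, 3))  # 4.333
```

Here the recorded 2 on the reverse-scored item becomes 4, so the subscale score is the mean of 5, 4 and 4.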

I expect that having the scoring key readily available will promote greater empirical examination of the scale.

References

[1] Dieckman, D. (2009). Locker Room To Life: Do Sports Build Character? Dissertation for the degree of Master of Science in Guidance and Counseling. University of Wisconsin-Stout. [link]

[2] Diamond, M., O’Brien-Malone, A., & Woodworth, R. J. (2010). Scoring the VIA Survey of Character. Psychological Reports, 107(4), 833-836. DOI: 10.2466/02.07.09.PR0.107.6.833-836

[3] Park, N., & Peterson, C. (2006). Moral competence and character strengths among adolescents: The development and validation of the Values in Action Inventory of Strengths for Youth. Journal of Adolescence, 29(6), 891–909. DOI: 10.1016/j.adolescence.2006.04.011

[4] Peterson, C., & Seligman, M. (2004). Character Strengths and Virtues: A Handbook and Classification. Oxford: Oxford University Press.

Targeting Skills Needs in Regions

Targeting Skills Needs in Regions was the name given by the Australian Government to a program that was intended to alleviate a critical shortage of skilled labour in some regions of Australia.

A report on the evaluation of the program was recently published by the Australian Government Department of Education, Employment and Workplace Relations. A copy of the report can be downloaded from here. The copy was obtained from http://www.deewr.gov.au/Skills/Resources/Publications/Pages/RegionalSkills.aspx on 21 December 2011 at 02:31 hrs.

Multidose vials of influenza vaccine: views on risk

Australian Doctor recently published a discussion article on the use of multidose vials for influenza vaccine. The discussion follows on from the publication in the Medical Journal of Australia of an article by Angela O’Brien-Malone and me regarding some of the medico-legal aspects of the use of the vials. The Australian Doctor article is particularly interesting because it presents several competing points of view, with some practitioners arguing for the use of the vials, and others against.

A PDF copy of the article is available here.

Preventing a line hang-up: how often must you speak?

Consider the following problem. You have made a connection to a machine to which you want to transfer some data. You know that if you are silent for some finite, but unknown, time, then the machine will hang up on you. To keep the line open, you need to ‘ping’, or speak to, the machine at the other end. How often do you need to speak?

How can I force my ISP to give me a new IP address?

An alternative way of thinking about the problem is in terms of the lease time for an IP address that has been assigned by a DHCP server. Unless you have been assigned a permanent IP address by your internet service provider (ISP), then it is possible that each time you connect to the network, the DHCP server will assign you a new IP address. Typically, however, short disconnections from the network followed by reconnection will not result in a change of the IP address. In particular, if you reconnect to the network within the so-called ‘lease time’ of the IP address, then the address will be unchanged. If you reconnect after the expiry of the lease then the IP address that you are assigned from the pool of available addresses is likely to be (but is not guaranteed to be) different from the previously assigned address. How do you discover the lease time that the DHCP server uses? It’s a question that clearly interests a great many users although their question is usually phrased as ‘How do I change my IP address?’, or ‘How do I force my ISP to give me a new IP address?’

Phrased that way, the answer is, of course, that you can’t force the ISP. What you can do is to disconnect from your ISP for a duration that exceeds the DHCP lease time of the IP address. But that brings us back to the original question: how long is the lease time?

Analysis

I have not seen any literature related to the problem and I was surprised to learn, from my own analysis, just how time-expensive it is to pinpoint the lease time with any accuracy. I considered the following two mutually exclusive possibilities: (a) I have a certain (as in sure) upper bound for the lease time, or (b) I have no idea of the lease time. If one were only reasonably sure, for example, that the lease time is a couple of hours but were nonetheless absolutely certain that it could not exceed a year, then the situation would still meet the criterion for possibility (a). I also make an important assumption that might, in fact, not be strictly true of the DHCP protocol. Specifically, I assume that by reconnecting to your ISP, the timer for the line will be reset so that the amount of lease time that has expired on reconnection is zero.

Binary search strategy

The strategy that I examined for each of the two possible scenarios was essentially one of binary search. However, it is a kind of binary search quite unlike those that are normally described for solving problems such as finding an entry in a database. What I used was an approach that seems intuitively obvious although I have no proof of its optimality.

Imagine that you know for certain that the lease time is less than a week. You might then try disconnecting from your ISP for half a week and, if your IP number remains unchanged, try disconnecting for three-quarters of a week. If that then proves to be successful in changing the IP address, you might try disconnecting for 5/8 of a week, and so forth. The difference between this strategy and a normal binary search on a random-access machine is that ‘time’ does not have a random-access feature! Even worse, it does not have a rewind feature as a tape drive would. To improve on your upper bound for the lease time of three-quarters of a week, you cannot simply rewind by 1/8 of a week to test the effectiveness of a 5/8-week disconnection. Instead you have to restart from time zero.

Experiments

For any given lease time that is less than the lease-time’s upper bound, there is a specific total time (the discovery time) that it will take you to discover that that time is indeed the lease time. For example, assume that you know that 1024 time-units provides an upper bound on the lease time; that is, you know that the actual lease time is strictly less than 1024 time-units. Imagine also that the actual (but unknown) lease time is 73 time-units; in other words, 73 is the number that you are wanting to discover. The discovery process would proceed as an iteration of: (a) disconnection, (b) waiting, (c) reconnection, (d) checking whether a new IP number has been assigned, and (e) increasing or decreasing the waiting time appropriately. The waiting time (discovery) sequence would be: 512 units (half the upper bound), 256, 128, 64, 64 + 32 = 96, 64 + 16 = 80, 64 + 8 = 72, 64 + 8 + 4 = 76, 64 + 8 + 2 = 74, 64 + 8 + 1 = 73. The total time spent on the discovery process would then be 1431 time units … far greater than I would have guessed prior to doing the analysis.
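The worked example above can be reproduced with a short simulation. This is only a sketch under stated assumptions: the function name `discover_with_upper_bound` is hypothetical, the upper bound is taken to be a power of two, and a disconnection is assumed to produce a new IP address exactly when its duration is at least the lease time (a rule inferred from the example, in which waiting 72 units fails and waiting 73 succeeds).

```python
def discover_with_upper_bound(lease, upper_bound=1024):
    """Simulate the halving search for a lease time below upper_bound.

    Returns (waits, estimate, total), where waits is the sequence of
    disconnection durations tried, estimate is the inferred lease time,
    and total is the overall discovery time (the sum of the waits).
    """
    if upper_bound < 2 or upper_bound & (upper_bound - 1):
        raise ValueError("upper_bound must be a power of two (assumption)")
    if not 1 <= lease < upper_bound:
        raise ValueError("lease must satisfy 1 <= lease < upper_bound")

    step = upper_bound // 2
    wait = step
    waits = []
    while True:
        waits.append(wait)
        # Simulated check: did the disconnection outlast the lease?
        changed = wait >= lease
        if step == 1:
            break
        step //= 2
        # Every probe restarts from time zero, so each wait is paid in full.
        wait = wait - step if changed else wait + step

    # The last probe separates two neighbouring integers.
    estimate = wait if changed else wait + 1
    return waits, estimate, sum(waits)


waits, estimate, total = discover_with_upper_bound(73)
print(waits)            # [512, 256, 128, 64, 96, 80, 72, 76, 74, 73]
print(estimate, total)  # 73 1431
```

The cost is the sum of all ten waits rather than just the last one, which is why the total is so much larger than the lease time itself.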

I determined what the total discovery time would be for each (integer) lease time between 1 time-unit and 1023 time-units on the assumption that either (Scenario 1) I knew that 1024 was an upper bound for the lease time, or (Scenario 2) I had no information regarding an upper bound. These correspond to possibilities (a) and (b) above.

The red line in the graph above shows the relationship between lease-time and discovery-time for Scenario 1. There are several things to note. First, the discovery time increases monotonically, though not strictly monotonically, with lease-time. For example, it takes as long (1577 time units) to pinpoint a lease-time of 103 units as a lease-time of 104 units. Second, one can see that the red line does not pass through the origin, indicating that there is some fixed (time) cost for the discovery process, even if the lease-time proves to be very short. In fact, using the strategy that I’ve described it will cost 1023 time units to discover that the lease time is 1 time unit when the only available prior information is that the lease-time upper bound is 1024 and, in general, costs 2^n − 1 time-units to discover that the lease time is of duration 1 when the only known upper bound is 2^n.

The violet line in the graph above shows the relationship between lease-time and discovery-time when the upper bound on the lease-time is unknown. The strategy that is assumed here is to begin with a disconnection of 1 time-unit and, if that is unsuccessful in producing a change in IP address, to disconnect for 2 time-units, and then 4 time-units, and so forth, until a new IP address is obtained. Once that has been achieved, one has succeeded in obtaining a previously unknown upper bound on the lease-time and one proceeds with a binary search following the strategy described for the scenario where the upper bound is known. As one can see from the graph, the times are short at the low end, but grow fast. In fact, the time-complexity for the case where there is no known upper bound is O(n log n), where n is the lease time, as shown by the plot of the function n log_2 n (blue curve).
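The doubling strategy can be sketched in the same way. The function name `discover_without_upper_bound` is hypothetical, and the sketch again assumes that a disconnection changes the IP address exactly when it lasts at least as long as the lease. It also assumes that the binary-search phase makes use of the last unsuccessful wait as a lower bound, a detail that the description above leaves open.

```python
def discover_without_upper_bound(lease):
    """Simulate doubling followed by binary search, with no prior bound.

    Returns (waits, estimate, total) as in the bounded case.
    """
    if lease < 1:
        raise ValueError("lease must be a positive integer")

    waits = []
    wait = 1
    # Phase 1: double the wait until the IP address changes.
    while True:
        waits.append(wait)
        if wait >= lease:  # simulated check for a new IP address
            break
        wait *= 2

    hi = wait        # shortest wait known to change the address
    lo = wait // 2   # longest wait known not to (0 if nothing failed)
    # Phase 2: bisect between the two bounds; each probe costs its full wait.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        waits.append(mid)
        if mid >= lease:
            hi = mid
        else:
            lo = mid
    return waits, hi, sum(waits)


waits, estimate, total = discover_without_upper_bound(73)
print(waits)            # [1, 2, 4, 8, 16, 32, 64, 128, 96, 80, 72, 76, 74, 73]
print(estimate, total)  # 73 726
```

Under these assumptions the doubling phase costs less than four times the lease time, and the bisection phase adds roughly log_2 n probes, each no longer than twice the lease time, which is consistent with the O(n log n) growth noted above.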

Keywords

DHCP, lease time, binary search, IP address

Contributors

Mark R. Diamond