In recent decades, people's experience of welfare has undergone a dramatic transformation, with the responsibility for managing risk increasingly being shifted from state institutions to non-governmental agents, individuals and agencies.
Some commentators see this shift as heralding a fundamental transformation of society, while others have pointed to the resilience of the welfare state. In the transformation of the welfare state, moral and ethical questions about collective responsibility for social and economic risks abound. In Risk, Welfare and Work, editors Greg Marston, Jeremy Moss and John Quiggin bring together contributors from diverse disciplines to explore these questions and examine shifting risk in historical and contemporary Australia, including implications for groups such as young people and Aboriginal Australians, as well as views of Britain and the United States.
Paperback. Published July 15th by Melbourne University Publishing.
All but three studies collected data from only one focal child per family. Two of these reported adjusting standard errors to take account of shared variance between siblings. A number of studies included more than one intervention group and did not report aggregate data for these.
In addition, a number of studies reported data for subgroups of recipients or by child age subgroups. Where studies included more than one intervention group but only one control group, we combined experimental groups for the primary analysis (Higgins b), ensuring that control group data were entered only once to avoid duplication. Where studies included subgroups defined by location or respondent characteristics, we combined experimental subgroups and, separately, control subgroups as appropriate.
In the case of dichotomous outcomes, we achieved this simply by summing the appropriate statistics (Higgins b). Some studies reported only significance levels rather than exact estimates; we reported these in the text and in the 'other data' tables. We derived a number of outcomes from reported data where appropriate.
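For dichotomous outcomes, pooling subgroups by summation is simple arithmetic; a minimal sketch follows. The subgroup counts here are invented for illustration and are not taken from any included study.

```python
# Minimal sketch of pooling dichotomous-outcome subgroups by summing
# events and sample sizes, as per the Cochrane Handbook guidance.
# All counts below are hypothetical.

def combine_dichotomous(subgroups):
    """Pool a list of (events, total) pairs by simple summation."""
    events = sum(e for e, _ in subgroups)
    total = sum(n for _, n in subgroups)
    return events, total

# Two hypothetical intervention subgroups defined by location:
print(combine_dichotomous([(12, 100), (30, 150)]))  # (42, 250)
```

The key constraint noted above is that when several intervention arms share one control group, the control data must enter the pooled comparison only once.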
Two studies reported P values for all outcomes, and two further studies reported P values for some outcomes. We contacted the authors of all other studies to request measures of variance and received them from MDRC (formerly the Manpower Demonstration Research Corporation) for all studies conducted by that organisation. The author of one study provided pooled standard deviations, and the Social Research and Demonstration Corporation (SRDC) provided standard errors for two further studies.
We were unable to obtain measures of variance for the remaining three studies. We used standard errors and P values to calculate standard deviations in the Cochrane standard deviation calculator tool. Where measures of variance were not available, we reported effects narratively in the text and in 'other data' tables. Included studies were relatively homogeneous in terms of design, population and outcome measures, although the interventions varied in terms of approach and components provided.
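The arithmetic behind such back-calculations can be sketched as follows. This is a simplified illustration with hypothetical numbers: the Cochrane calculator uses the t distribution, whereas this sketch uses a normal approximation, which is adequate only for large samples.

```python
from math import sqrt
from statistics import NormalDist

def sd_from_se(se, n):
    """Back-calculate a standard deviation from a standard error."""
    return se * sqrt(n)

def sd_from_p(p, mean_diff, n1, n2):
    """Approximate the pooled SD from a two-sided P value for a
    difference in means. Uses a normal approximation; the Cochrane
    calculator uses the t distribution instead."""
    z = NormalDist().inv_cdf(1 - p / 2)
    se_diff = abs(mean_diff) / z
    return se_diff / sqrt(1 / n1 + 1 / n2)

print(sd_from_se(0.5, 100))  # 5.0
print(round(sd_from_p(0.05, 1.0, 100, 100), 2))
```

With an exact two-sided P value, a group difference, and both group sizes, the implied pooled SD is fully determined, which is why P values were an acceptable substitute for missing variance data.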
We performed Chi² tests and used the I² statistic to test for statistical heterogeneity. We used the Cochrane 'Risk of bias' tool to investigate selective outcome reporting and incomplete outcome data (Higgins a). We collected data at all available time points and classified them for analysis purposes in terms of the time elapsed between randomisation and data collection. We created three categories: time point 1 (T1), at 12 to 24 months since randomisation; time point 2 (T2), at 25 to 48 months; and time point 3 (T3), at 49 to 72 months.
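The heterogeneity statistics mentioned above can be sketched as follows. This is a minimal illustration with invented study effects; Review Manager computes these quantities internally.

```python
def cochran_q(effects, variances):
    """Cochran's Q: inverse-variance weighted squared deviations of
    study effects around the fixed-effect pooled estimate."""
    w = [1.0 / v for v in variances]
    pooled = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    return sum(wi * (e - pooled) ** 2 for wi, e in zip(w, effects))

def i_squared(q, df):
    """I^2: percentage of variability attributable to heterogeneity
    rather than chance, truncated at zero."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0

# Two hypothetical study effects with equal variances:
q = cochran_q([0.1, 0.3], [0.01, 0.01])
print(round(q, 6), round(i_squared(q, df=1), 6))  # 2.0 50.0
```

The Chi² test assesses whether Q exceeds its degrees of freedom by more than chance would predict, while I² expresses the excess as a percentage, which is easier to compare across meta-analyses.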
One study reported partial data at 96 months.
We did not include these in the main analysis but summarised them narratively and reported them as 'other data'. Two later publications analysed linked mortality data from two other studies at 15 and 17 to 19 years. We report these narratively in the text. In a few cases it was not possible to calculate an effect size. Where no measure of variance was available, we entered data into Review Manager 5 (Review Manager) as 'other data'.
If sufficient studies reported standard deviations for an identical continuous outcome, we imputed them for outcomes with no measure of variance. In such cases, we conducted sensitivity analyses to investigate the effects of using different methods to impute the standard deviation. We grouped outcomes into child and adult outcomes and then by type of outcome; that is, we synthesised and analysed adult physical health and adult mental health separately. Employing the approach to summary assessment of risk of bias suggested in Higgins a, we judged all studies to be at high risk of bias; therefore, for each time point and category, we entered into the primary analyses all studies for which the necessary data were available.
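One common imputation strategy, pooling the standard deviations reported by other studies of the same outcome, can be sketched as follows. This is an illustrative assumption: the review does not specify which imputation methods were compared, and all numbers here are hypothetical.

```python
from math import sqrt

def pooled_sd(sds, ns):
    """Pooled SD across studies reporting the same continuous outcome."""
    num = sum((n - 1) * s ** 2 for s, n in zip(sds, ns))
    den = sum(n - 1 for n in ns)
    return sqrt(num / den)

# Hypothetical reported SDs; a sensitivity analysis would rerun the
# meta-analysis once per candidate imputed value:
reported_sds, reported_ns = [4.0, 6.0], [50, 50]
candidates = {
    "pooled": pooled_sd(reported_sds, reported_ns),
    "largest": max(reported_sds),  # a more conservative choice
}
print(candidates)
```

If the meta-analytic conclusion is stable across candidate imputed values, the imputation is unlikely to be driving the result, which is the point of the sensitivity analysis described above.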
We used standardised mean differences (SMDs) to calculate combined effect sizes for continuous outcomes using Review Manager 5 (Review Manager). Where data were not available for individual outcomes, we reported these in the text within the appropriate outcome category and also presented them in 'other data' tables. In particular, we planned to conduct subgroup analyses of studies grouped in terms of the typology of interventions identified in the early stages of the review.
However, this was not possible since the number of studies in each category within each time point was insufficient to permit further statistical analysis.
In addition, we found that interventions defined by approach or ethos were more similar in practice than expected. We were also unable to conduct other planned subgroup analyses because they lacked either data or sufficient studies; these included studies that differed according to economic contexts, implementation, level of bias, age of child, level of participant disadvantage, ethnicity and whether or not participants became employed.
The largest source of variation in the interventions was in terms of the components provided. It was not possible to investigate the effects of this variation systematically, as there were again insufficient studies providing similar combinations of components. We were therefore limited to our planned primary analysis including all studies at each time point.
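The primary analysis pooling all studies at each time point rests on inverse-variance weighting of SMDs, which can be sketched as follows. This is a minimal fixed-effect illustration with invented effect sizes; Review Manager performs this calculation internally and also supports random-effects models.

```python
from math import sqrt

def pool_smd(smds, ses):
    """Fixed-effect inverse-variance pooling of standardised mean
    differences; returns the pooled SMD and its standard error."""
    w = [1.0 / se ** 2 for se in ses]
    pooled = sum(wi * d for wi, d in zip(w, smds)) / sum(w)
    return pooled, sqrt(1.0 / sum(w))

# Three hypothetical studies at one time point:
smd, se = pool_smd([0.10, 0.25, 0.40], [0.08, 0.10, 0.12])
print(round(smd, 3), round(se, 3))
```

More precise studies (smaller standard errors) receive proportionally larger weights, so a single large trial can dominate the pooled estimate, which is one reason outlying studies were examined separately when heterogeneity was high.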
Where heterogeneity was high and we could identify a plausible hypothesis, we presented impacts from outlying studies separately and discussed the potential role of the identified characteristic. As described above, we used sensitivity analysis for post hoc investigation of heterogeneity. After importing all analyses from Review Manager 5 (Review Manager) to GRADEpro GDT, we assessed each outcome for threats to quality from risk of bias, inconsistency, indirectness, imprecision and publication bias.
Where it was not possible to calculate an effect estimate, we judged the quality of the evidence to be 'unclear'. Each outcome domain included outcomes measuring the same construct in different ways. For instance, studies reported parental mental health as both a continuous and a dichotomous variable. We graded evidence from each of these separately, but analyses within each domain could vary in quality, hampering the GRADE objective of reaching a judgement on the overall quality of the evidence for any single outcome domain.
The domain-level judgement was based on the assessment of quality for the analyses including the largest sample size. For instance, at T1 five studies reported a measure of parental mental health. The combined sample size for the remaining three studies was . The evidence from two of the single studies was low quality, and from the remaining study the evidence was very low quality.
We included the analyses on which the domain-level judgement was based in Table 1 and Table 2. Where more than one analysis in a given domain contributed to the domain-level assessment, we included the analysis with the largest sample size in the 'Summary of findings' tables. Where a study was deemed to be at very high risk of bias, we downgraded the evidence twice, for instance where severe or systematic attrition was present.
We did not downgrade for risk of bias caused by contamination: since contamination leads to underestimation of impacts, it is deemed to be of less concern than risk of bias in domains likely to cause overestimated impacts (Higgins a). To assess indirectness, we considered the extent to which the population and setting of the included studies were similar to those of interest for the review, and whether any outcome measures used were indirect or proxy measures. When assessing imprecision, we downgraded continuous outcomes reported as SMDs once if the confidence intervals included 0.
If the confidence interval crossed the line of no effect but did not include appreciable benefit or harm, according to the above criteria, we did not downgrade for imprecision. However, where the CI crossed null and the effect was very small, we noted that this was unlikely to be an important effect (Ryan). Where there was reason to suspect publication bias, we downgraded once on this criterion. We assigned all health outcomes a 'critical' rating and all economic outcomes an 'important' rating. Our synthesis is structured in terms of intervention, then population, then time point.
In reaching conclusions regarding the applicability of the evidence, we considered variations in context and culture. We extracted data on implementation and on national and local intervention contexts. We were unable to investigate the role of such factors statistically because few studies shared given characteristics. In addition, we considered the broader context in which most interventions were implemented: the USA, during a period of economic expansion, in a country lacking universal healthcare coverage.
We discuss these issues in the section Overall completeness and applicability of evidence. See: Characteristics of included studies; Characteristics of excluded studies. We conducted database searches in , and . These yielded a total of references. We identified a further 12, references through an extensive stage of contacting authors, searching websites with searchable interfaces, and handsearching bibliographies (see Appendix 3).
Because it was not possible to download the website search results to EndNote, we screened the titles for relevance and identified potentially eligible records, which we added to the results of the database searches in EndNote, for a total of records. We removed duplicates from the combined results of the handsearches and the database searches. This left a total of references, of which we excluded on the basis of title or abstract. Figure 1 details the progress of citations through the screening process. Note that this figure does not include publications found on websites without searchable databases.
Of the 94 identified publications associated with the 12 included studies, many did not report outcomes relevant to this review. In some cases authors reported the same outcomes in two or more publications. Where discrepancies in data were identified, we contacted study authors to confirm the correct values.
Following this process, we identified 23 publications reporting unique outcome data for the 12 included studies. We reference these 23 publications in the Included studies section and include all other publications in the Additional references section. Nine of the included records came from the database searches, and we identified the remaining fourteen by handsearching only. Twelve studies met all of the inclusion criteria for this review. Three independent groups evaluated one intervention, Connecticut Jobs First; they selected mutually exclusive samples on the basis of the focal child's age, as shown in Figure 2.
North American research organisations led or were closely involved in most included studies. An academic team was responsible for only one study, conducted in Ontario (Ontario). Only New Hope differed in this respect, as it was initiated by a community organisation with a very clear aim of ensuring participants were better off in work. Many interventions had supplementary objectives of either reducing welfare rolls or making work pay. We discuss these in further detail below.
Eleven of the 12 included studies included a logic model or a textual description of hypothesised pathways linking the intervention to child outcomes. Only California GAIN did not report a theory of change in the publication extracted for this review. The models typically hypothesised that intervention components might influence intermediate outcomes such as material resources, parental stress and mental health, parenting, and use of formal or informal child care.
Each of these may affect children's outcomes either through direct material changes or via changes in parental stress levels. Increased attendance at informal or formal child care could lead to increased exposure to educational experiences and to infectious illnesses. At each stage in the model, from targeted outcomes to effects on children, there is the potential for effects to be either positive or negative. There may also be positive effects on some outcomes and negative effects for others.
Effects may also vary depending on level of exposure or interactions between intervention components. An example of a logic model used by study authors is provided in Figure 3. Many of the included studies were large and complex. However, in most larger studies, only administrative data were collected for all participants, with a subsample (usually defined by age of the focal child) surveyed to assess health outcomes.
Where this was the case, we extracted economic data only for the relevant subsample. All sample sizes are provided in Characteristics of included studies. All included studies were randomised controlled trials, with randomisation at the level of the individual. Most evaluations began between and . FTP reported assigning each client a case manager and an employment and training worker who worked on premises kept apart from the control group.
New Hope reported that 'project representatives' delivered the intervention but did not specify the place of delivery. All but one of the included studies took place during periods of increasing public and political opposition to welfare payments as well as reductions in the value of and entitlements to benefits. Participants were lone mothers and their children. Some studies included small percentages of lone fathers but used feminine terminology throughout due to the overwhelming majority of participants being women. Adult ages ranged from 18 to 54, and child ages ranged from 18 months to 18 years.
Since the interventions were aimed at lone parents in receipt of welfare, participants in all studies had low socioeconomic status. All studies included both existing welfare recipients and new applicants. Most of the study samples comprised unemployed lone parents, as identified by the study authors. We present full population characteristics in the Characteristics of included studies tables. These are described below and summarised in Table . MFIP was particularly complex, having a total of 10 intervention subgroups defined by intervention type, location and recipient status.
We combined experimental and control groups as appropriate. A number of outcomes were not reported for every subgroup. Where this was the case, we appended the relevant forest plot with an explanatory footnote.
One intervention group received a labour force attachment (LFA) intervention, intended to place participants in employment of any kind as rapidly as possible, while the other received a human capital development (HCD) intervention, aimed at increasing respondents' employability by enhancing their skills. However, one group (Riverside HCD) differed systematically from the rest of the sample, since the HCD intervention was only available to respondents who lacked basic skills.
All studies collected data on differing age groups of children, with ages ranging from 18 months to 18 years. Figure 2 shows the age groups and subgroups reported by each study at each time point. In some cases, trials reported child outcomes only by subgroups. Ontario included children aged 2 to 18 years. SSP Recipients reported data on children aged 5. All of the voluntary studies described a process of obtaining informed consent from participants prior to randomisation.
The data we report were collected between 18 months and 18 years after randomisation. An independent team of researchers linked data from two studies to mortality data at 15 to 18 years (CJF; FTP). At T1 and T2, all data reported were from samples that were still exposed to the intervention. In CJF and FTP, a proportion of the sample would have reached lifetime limits for welfare receipt and ceased to receive earnings disregards.
They would still have been exposed to sanctions, training and case management. At T3, a number of interventions had ended, and sample members were no longer exposed to intervention conditions. There was an expectation that impacts would continue after the interventions had ended because early labour market entry would allow respondents to accrue labour market advantage in terms of job quality and earnings, and that this could contribute to a better environment for children, with lasting health benefits.
Although the overarching aim of all included interventions was to promote employment among lone parents in receipt of welfare benefits, the motivation or ethos underlying this objective differed, as did the approach to achieving it. We describe these differences in detail in Description of the intervention.
Briefly, interventions had one of the following motivations. Human capital development (HCD) aimed to promote skills development in order to secure better quality employment. Figure 4 provides information about all studies' ethos and approach. Ontario did not fall into any of these categories. However, in practice this typology did not prove as useful as anticipated.
Even where study authors stated that the intervention explicitly adopted one of the above approaches, in practice there often seemed to be little variation between interventions of differing types. For instance, a number of LFA interventions offered training, and this did not necessarily differ in level or scope from that offered by HCD interventions. Study authors often reported that implementation of interventions varied widely within studies.
This variation occurred both at the level of intervention ethos and approach, and at the level of individual components, as might be expected in complex interventions with multiple components delivered in different sites and settings. We identified 10 individual components in the interventions see Figure 4.
Except in UK ERA, control group respondents were also subject to many of these components, such as employment requirements and earnings disregards, to varying degrees. Thus, we describe only those intervention components that represent an incentive, sanction or service over and above what the control group received. Three studies tested variants of the main intervention with two or more intervention arms. Ontario tested the impact of five different approaches to delivering support to single parents.
Two groups within the study received employment training and are included in the review. One of these groups also received child care and support from health visitors. Failure to comply with programme requirements could result in financial sanctions involving partial or total cessation of welfare benefits for a specified period of time. Supplements were limited to a period of three years. While supplements were being paid, respondents' total income could increase even if their earned income was low.
Methods of calculating and levels of generosity varied across studies. Where earned income was disregarded, respondents could claim welfare while earning at much higher levels than previously. While respondents received earnings disregards, total welfare receipt and numbers on welfare were higher. As with supplements, disregards could increase total income even if earned income was low. Financial contributions toward the cost of child care were made either directly to childcare providers or to parents for a period of one to two years following uptake of employment.
Ontario provided a childcare programme to one arm of the intervention only. This differs from requirements to work or to take steps towards work (component 1) in that participants were assigned a specific placement (in the public, private or voluntary sector), which they had to attend for a set number of hours per week in order to continue receiving benefits, and they were not paid at a normal market rate.
New Hope assigned participants who were unsuccessful in finding work to community service jobs, but these were seen as proper employment and paid at the market rate. The package of welfare reforms passed in the USA included a federal lifetime limit of 60 months of welfare receipt, with individual states retaining the freedom to apply shorter limits.
IFIP did not include time limits over and above those applying to the whole sample under a federal waiver granted in . MFIP was able to maintain this under the intervention conditions, but New Hope participants were not protected from lifetime limits after the implementation of Wisconsin Works in . The CJF time limit was 21 months. For recipients who found employment, the period in which they received earnings disregards and other programme benefits counted towards their welfare 'clock'.
Thus, there was a transition point where they went from working and receiving many other benefits to relying solely on earned income. Advisors had some discretion in the application of time limits and could grant extensions where they judged recipients to have made a good faith effort or to have been incapacitated through ill health. Sanctions varied in severity across interventions. Rates of sanctioning also varied within and between interventions.
Most of the interventions included some form of education, training or both, whether they were explicitly described as HCD or LFA. FTP developed an extensive set of services around training and development, including assigning specific staff to each participant, funding ongoing training for those who found employment, and developing training work placements in conjunction with local employers. UK ERA also provided information, but in addition paid for training and provided bonuses of up to GBP on completion of training. CJF provided transitional Medicaid for two years after participants found employment, and IWRE subsidised health insurance while participants' incomes remained below the federal poverty level.
MFIP participants were eligible for Minnesota's subsidised health insurance scheme, but this was not an intervention component. In practice, case management differed in terms of levels of contact, flexibility, enforcement and monitoring. Based on each of these dimensions, we categorised the interventions as having high or low case management. Following the passage of PRWORA, the intervention condition was in fact 'usual care' as the interventions were rolled out statewide while they were being evaluated.
IFIP was terminated after 3.5 years. Wisconsin Works was introduced in and affected all respondents in New Hope. Under AFDC, conditions varied to some degree from state to state, and receipt of welfare benefits was not subject to time limits. Usual care in Canada varied across provinces and also changed during the course of the interventions. Initially, work requirements in both provinces were minimal. By contrast, in New Brunswick earnings disregards increased. There was no time limit on benefit receipt. Studies used a range of measures and formats to report primary and secondary outcomes within and between studies and across different time points.
The following provides a summary of which outcomes were reported by each intervention. Appendix 5 includes further details including the time points at which each outcome was reported. Although we searched for parental health outcomes, the vast majority of the sample in all included studies was female. Therefore, we describe adult health outcomes as 'maternal' for the remainder of the review.
All 12 studies reported maternal mental health outcomes. These are both well validated and widely used measures of risk of depression in adults. They were reported both as a continuous measure mean total score , and as a dichotomous measure proportion scoring above a cutpoint defined as 'at risk of depression'. These score each item from 1 to 3 or 1 to 5 depending on the age of the child and calculate the mean of the score for each item in the scale.
Investigators collected all of these measures via parent report. Nine studies reported a measure of child physical health. IFIP reported the percentage of children with fair or poor health, and MFIP reported the percentage with good or excellent health. All of these outcomes were collected via parent report. All employment measures were dichotomous, reporting the percentage of the sample employed or not employed for a given measure.
Nine studies reported measures of income. IWRE reported income for the month prior to the survey annualised to represent the previous year's income. Eleven studies reported a measure of earnings. IWRE reported annualised earnings in the month prior to the survey. NEWWS reported total earnings for years 1 to 5. Many of the interventions included either an earned income disregard or a financial supplement in order to make work pay and ease the transition from welfare to work. Most of these were time limited, with limits ranging from 21 to 36 months although extensions were often available for people with particular difficulties.
However, the periods while working and claiming welfare counted towards the respondent's lifetime limit on welfare receipt. While supplements or disregards were being paid, respondents' total income could increase even if their earned income was low. When time limits were reached, this effect obviously ceased. In all cases, time limits were reached during the period defined as T2 (25 to 48 months). A number of studies also reported total earnings. We extracted both measures in order to investigate the relationship between earned and total income.
IWRE reported the average amount received in the month prior to the survey, annualised, and NEWWS reported the total amount of benefit received between years 1 and 5. UK ERA reported the average amount of benefits received per week. New Hope and Ontario reported the proportion of the sample receiving benefits in the year prior to the survey. Since lower levels of total welfare paid and of numbers claiming welfare are the desirable outcomes from policy makers' perspectives, we defined these as positive in the analyses.
Therefore these studies did not report data on health insurance. Effect sizes were calculated for all reported measures. See Results of the search; Characteristics of excluded studies. All studies had at least one item at high risk of bias, with two studies having four domains at high risk (NEWWS; Ontario). All but two studies were at low risk of bias for allocation concealment and sequence generation, and it is very likely that these two studies conducted these procedures but did not report them (IFIP; IWRE). Blinding of outcome assessment was rare, and only one study reported baseline outcome measurements (Ontario). All risk of bias judgements are presented in the Characteristics of included studies tables and summarised in Figure 5 and Figure 6.
Since all studies were at high risk in at least one domain, the summary judgement was that all the included studies were at high risk of bias. Risk of bias summary: review authors' judgements about each risk of bias item for each included study. Risk of bias graph: review authors' judgements about each risk of bias item presented as percentages across all included studies.
As such, they adopt robust procedures for sequence generation; communication with study authors confirmed this. Where reports explicitly described allocation concealment, it was clearly conducted correctly. One study took place in an academic setting (Ontario). While authors clearly described adequate methods of sequence generation for this study, they provided no information about allocation concealment, leading to a judgement of unclear risk of bias. Since the trial reports provided no information, we judged the studies to be at unclear risk for both sequence generation and allocation concealment.
However, again these are large and very reputable companies, and it is highly likely that they followed correct procedures. We assessed baseline measures at the level of individual outcomes. We assessed outcomes that were not reported at baseline to be at unclear risk of bias. Where investigators collected and adjusted for baseline measures, or reported them by intervention status with few significant differences, we assessed them to be at low risk.
Where studies did not report baseline outcomes by intervention status, or where there were differences between groups at baseline and authors reported no adjustment, we judged them to be at high risk. Ontario reported all baseline outcome measures, but these differed across intervention groups and authors did not describe any adjustment, so we assessed it as being at high risk of bias. NEWWS reported and adjusted for maternal mental health at baseline but did not collect any other health outcomes at baseline, and we deemed it to be at unclear risk of bias.
We assessed risk of bias in the domain of baseline characteristics at study level. Where studies reported baseline characteristics by intervention group and showed them to have no statistically significant differences, or where they used regression to adjust for baseline differences, we assigned a judgement of low risk of bias. Blinding of outcome assessment was conducted at the level of individual outcomes. Although it is very unlikely that assessors were blinded, we judged studies to be at unclear risk in the absence of further information. We conducted risk of bias assessment for missing outcome data at study level and at outcome level.
Compared to the larger sample that used administrative data, data from the survey overestimated impacts on earnings, and the authors urge caution in interpreting the findings. Thus, we consider that the risk of bias from missing outcome data is particularly high for this study. We could describe contamination in these studies as either indirect, that is, where the control group were likely to have been influenced by changes in social attitudes towards welfare and by awareness of changing rules affecting the majority of the population, or direct, where there was evidence that the control group were actually subject to the treatment condition at some point during the study.
In Canada, restrictions to welfare benefits for lone parents were also implemented in the late s, and in the UK requirements to seek employment were placed on lone parents of successively younger children. As a result, the control group were directly affected by the new policies in a number of studies. In most cases it is difficult to be sure how much these changes affected controls. We judged all of these studies to be at high risk of bias from direct contamination. All were successful in this except IFIP, since the intervention was terminated and the control group moved to the new state-level policy three and a half years after randomisation.
We judged IFIP to be at high risk of bias from direct contamination and the remainder to be at low risk. Media coverage and publicity, as well as changed attitudes to welfare, accompanied the new policies, and there is evidence that some control group respondents in the CWIE studies believed themselves to be subject to the new rules (Moffitt). We judged all of the CWIE studies to be at high risk of bias from indirect contamination. We deemed these to be at high risk of indirect contamination.
It is likely that contamination bias would lead to an underestimation of impacts on economic outcomes among the intervention group, as control group members endeavoured to find employment in the mistaken belief that this was now required of them. Underestimation of impacts is not deemed to be as serious as overestimation (Higgins a); however, it is difficult to be sure what effect this type of contamination would have had on health outcomes. We assessed selective outcome reporting at study level. Protocols were not available for any of the included studies, and studies that reported data for more than one time point or subgroup rarely reported outcomes consistently across groups or times.
Government bodies, which arguably had a vested interest in the success of the interventions, funded and participated in all included studies except New Hope. Sources of funding are recognised as potential sources of bias. However, as stated, the evaluations involved highly reputable research organisations that have made major contributions in their own right to the development of methods for conducting social experiments. As such, there is no suggestion that the findings were in any way influenced by the source of funding. See Table 1; Table 2.
All included studies were at high risk of bias in at least one domain, therefore we downgraded all evidence once for this criterion.
As a result, no evidence could attain a quality rating higher than moderate. However, exclusion of this study had only marginal effects on the estimates. We considered few effects to be at serious risk of inconsistency. Where heterogeneity was high but there was a plausible explanatory hypothesis, we did not downgrade; instead, we present a post hoc sensitivity analysis and discuss these instances in Effects of interventions. Since the populations of all included studies met these criteria, we did not downgrade for indirectness of populations. None of the outcomes included in the review were indirect measures, so we did not downgrade for indirectness in relation to outcomes.
We did downgrade a number of health outcomes for imprecision due to low event rates. Since we had no reason to suspect that other studies have been conducted but remained unpublished, we did not downgrade any outcomes for publication bias. We assessed outcomes for which an effect size could not be calculated as being of unclear quality. Within each domain, there was often a range of quality assessments for different measures.
We based an overall assessment for the domain as a whole on the grade assigned to the analysis or analyses with the largest total sample size. On this basis, of the 12 health domains, we assessed all as moderate quality except T1 maternal mental health (low quality), T3 maternal physical health (low quality) and T3 child mental health (unclear quality).
We assessed all T1 and T2 economic domains as moderate quality and all T3 ones as low quality. We report these domain-level assessments in the domain summaries in Effects of interventions. See Table 1; Table 2. The comparison in all cases was with usual care (see Description of the intervention). T1 maternal mental health, as this was how studies reported results. We reported these outcomes narratively in the text, and where it was possible to calculate an effect size, we presented it in forest plots. These are designed to summarise the direction and strength of effects, as well as the quality of evidence available, in a way that readers can apprehend visually.
Upward and downward pointing arrows indicate positive and negative directions of effect, respectively, defined in terms of the desirability of the outcome. A single arrow represents a 'very small' effect, two arrows a 'small' effect, and three a 'modest' effect, as defined in Table 32. An 'o' indicates that there is evidence of no effect. The colour of the arrow denotes the quality of the evidence: green indicates moderate quality; amber, low quality; and red, very low quality.
Where we could not assess quality, we used black. For dichotomous outcomes, we defined the 'event' as reported by study authors, whether it was considered a 'good' or a 'bad' outcome. For instance, when calculating employment, we defined the good outcome (being employed) as the event, although traditionally the bad outcome is considered the event (Alderson). However, we reported the RRs in this way because this is how the original studies reported them. We identify instances where the 'good' outcome is defined as the event in the 'Summary of findings' tables (Table 1; Table 2).
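The arrow-and-colour scheme described above is deterministic enough to sketch in code. The function below is an illustration of the mapping, not the review's actual plotting code; the numeric thresholds behind 'very small', 'small' and 'modest' are defined in the review's table and are not reproduced here:

```python
# Illustrative encoding of the summary-figure scheme: arrow direction for
# direction of effect, arrow count for magnitude, colour for GRADE quality.
ARROWS = {"positive": "\u2191", "negative": "\u2193"}  # up / down arrows
COLOURS = {"moderate": "green", "low": "amber", "very low": "red",
           None: "black"}  # black where quality could not be assessed

def summary_symbol(direction, magnitude, quality=None):
    """Return (symbol, colour) for one domain-summary entry.

    direction: 'positive', 'negative', or 'none' (evidence of no effect)
    magnitude: 'very small' (1 arrow), 'small' (2) or 'modest' (3);
               ignored when direction is 'none'
    quality:   GRADE level, or None if quality could not be assessed
    """
    colour = COLOURS.get(quality, "black")
    if direction == "none":
        return "o", colour
    n_arrows = {"very small": 1, "small": 2, "modest": 3}[magnitude]
    return ARROWS[direction] * n_arrows, colour
```

For example, a small positive effect supported by moderate-quality evidence would render as two green upward arrows.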
Effect sizes across virtually all outcomes were small (i.e. SMD 0. ). However, there is debate regarding the utility of these rules for interpreting the effects of population-level interventions, since an effect that appears small or even tiny when considered at the level of the individual may be important if replicated across a large population (Kunzli; Siontis). Cohen has stated that effect sizes observed outside laboratory conditions are likely to be small, and that use of his definitions of effect magnitude warrants caution (Cohen). Other authors have also argued that in interventions which affect large populations, even an SMD of 0. may be important.
We present our definitions in Table 32 alongside those recommended by Cohen. The effect magnitude for RRs below 1 is calculated by subtracting 1 from the RR and then multiplying, such that RR 0. These are defined as small and very small effects, respectively. All five studies reporting at T1 reported a measure of maternal mental health. Although the evidence was of moderate quality, the effect was very small (SMD 0. ). The evidence from Ontario was of very low quality for the same reason and due to high attrition. Forest plot of comparison: 1 Time point 1 Maternal mental health, outcome: 1.
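The RR-to-magnitude conversion just described can be written out directly. Note that the multiplier is elided in the text above; the factor of 100 below is our assumption, giving the familiar percentage-change reading of a risk ratio:

```python
def rr_effect_percent(rr):
    """Express a risk ratio as a percentage change in risk.

    Follows the conversion described in the text: subtract 1 from the RR,
    then multiply (assumed here to be by 100, since the multiplier is
    elided in the source). Negative values indicate a reduction in risk.
    """
    return (rr - 1) * 100

reduction = rr_effect_percent(0.9)   # about -10, i.e. a 10% reduction
increase = rr_effect_percent(1.25)   # 25, i.e. a 25% increase
```

On this reading, an RR close to 1 corresponds to a very small percentage effect, which is how the review grades the magnitude of dichotomous outcomes.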
Comparison 1 Time point 1 Maternal mental health, Outcome 1 Maternal mental health continuous. Comparison 1 Time point 1 Maternal mental health, Outcome 2 Maternal mental health dichotomous. All of the six included studies that reported at T2 reported maternal mental health. Forest plot of comparison: 2 Time point 2 Maternal mental health, outcome: 2.
Comparison 2 Time point 2 Maternal mental health, Outcome 1 Maternal mental health continuous. Evidence from both studies was of moderate quality, although the result from California GAIN was unlikely to be important as the effect was very small and the CI crossed the line of null effect. Comparison 2 Time point 2 Maternal mental health, Outcome 3 Maternal mental health dichotomous. We calculated effect sizes for the two dichotomous outcomes; there was a very small effect in favour of the intervention for high risk of depression in IFIP (RR 0. ).
Forest plot of comparison: 3 Time point 3 Maternal mental health, outcome: 3. Comparison 3 Time point 3 Maternal mental health, Outcome 1 Maternal mental health continuous. Comparison 3 Time point 3 Maternal mental health, Outcome 2 Maternal mental health dichotomous. At T1 and T3 there were individual studies that reported larger negative effects on maternal mental health, but the evidence was of low or very low quality. One study that reported a very small negative impact at T1 did not report maternal mental health at T3.
At all time points, evidence of moderate quality predominated, therefore the overall quality assessment for maternal mental health at each time point was moderate. One study reported the percentage of the sample in fair or poor health at T1, providing low-quality evidence that the intervention group reported better health than control (RR 0. ).
We downgraded this evidence due to imprecision. Forest plot of comparison: 4 Time point 1 Maternal physical health, outcome: 4. Although the evidence was of moderate quality, the effect is unlikely to be important, as the effect size is very small and the CI crosses the line of null effect. Forest plot of comparison: 5 Time point 2 Maternal physical health, outcome: 5. Event defined as 'in good or excellent health'. This showed a very small effect in favour of control (RR 0. ). However, the evidence was of low quality due to high risk of bias from attrition, and the effect was unlikely to be important as it was very small and the CI crossed the line of null effect.
Forest plot of comparison: 6 Time point 3 Maternal physical health, outcome: 6. Only four studies reported measures of maternal physical health, and all but one reported small to very small positive effects. UK ERA reported a very small negative effect on maternal physical health at T3, but the evidence was of low quality. The evidence on maternal physical health at T1 and T3 was predominantly of low quality; therefore we assessed evidence at both time points to be low quality overall. At T2, the evidence was of moderate quality.
Four studies reported a measure of child behaviour problems at T1. Ontario reported the proportion of the sample with three or fewer behaviour disorders as a categorical variable. We dichotomised the latter variable to create an outcome for the proportion of the sample with two or three behaviour disorders. Evidence from each study was of moderate quality. Forest plot of comparison: 7 Time point 1 Child mental health, outcome: 7. Comparison 7 Time point 1 Child mental health, Outcome 1 Child behaviour problems continuous.
Individual effect sizes for the dichotomous outcomes showed modest negative effects on behaviour problems in the intervention groups in both Ontario (RR 1. ) and CJF Yale. However, evidence from these outcomes was of low quality in CJF Yale and very low quality in Ontario, due to wide confidence intervals (including no effect and appreciable harm) and, in Ontario, very high risk of bias. Comparison 7 Time point 1 Child mental health, Outcome 2 Child behaviour problems dichotomous.
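The dichotomisation step described above (collapsing Ontario's categorical behaviour-disorder variable into a two-or-three-disorders outcome) can be sketched as follows; the counts used are hypothetical, purely to illustrate the calculation:

```python
def dichotomise(counts_by_category, event_categories):
    """Collapse a categorical outcome into the proportion experiencing
    the 'event'. Here the event is having two or three behaviour
    disorders, mirroring the review's handling of the Ontario data.
    """
    total = sum(counts_by_category.values())
    events = sum(n for cat, n in counts_by_category.items()
                 if cat in event_categories)
    return events / total

# Hypothetical counts by number of behaviour disorders (0 to 3)
counts = {0: 40, 1: 30, 2: 20, 3: 10}
proportion = dichotomise(counts, event_categories={2, 3})  # 0.3
```

The resulting proportions in each arm can then be compared as a dichotomous outcome (e.g. as a risk ratio), as the review does.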
This effect was very small and the CI crossed the line of null effect, so it is unlikely to be important. Forest plot of comparison: 8 Time point 2 Child mental health, outcome: 8. Comparison 8 Time point 2 Child mental health, Outcome 1 Child behaviour problems continuous. Comparison 8 Time point 2 Child mental health, Outcome 2 Adolescent mental health dichotomous. We could identify no plausible hypothesis to explain this heterogeneity. The evidence was of low quality due to this unexplained heterogeneity.
Forest plot of comparison: 9 Time point 3 Child mental health, outcome: 9. Comparison 9 Time point 3 Child mental health, Outcome 1 Child behaviour problems continuous. The intervention had a small positive effect on externalising behaviour, a very small positive effect on internalising behaviour and a very small negative effect on hyperactivity.
Behaviour problems were very slightly higher among the IFIP applicant intervention group. This difference in effect was possibly related to study characteristics. Two further studies reported a modest negative effect, but the evidence was of low and very low quality. Since the evidence was primarily of moderate quality at T1 and T2, this was the overall assessment for both time points. Most evidence at T3 was of unclear quality, so this was the overall domain assessment. Only one study reported a measure of child physical health at T1. As this effect was very small and the CI crossed zero, it is unlikely to be important.
Forest plot of comparison: 10 Time point 1 Child physical health. One study reported the percentage of the sample in good or excellent health (MFIP); this showed a very small effect in favour of control (RR 0. ). As this effect was very small and the CI crossed the line of null effect, it is unlikely to be important. Evidence for all outcomes was of moderate quality. Forest plot of comparison: 11 Time point 2 Child physical health. Comparison 11 Time point 2 Child physical health, Outcome 1 Child physical health continuous.
Comparison 11 Time point 2 Child physical health, Outcome 2 Child physical health dichotomous. Six studies reported child physical health at T3. No measure of variance was available for SSP Recipients. Since standard deviations were available for four studies reporting the same outcome, we imputed a standard deviation for SSP Recipients based on the average of those four studies. Forest plot of comparison: 12 Time point 3 Child physical health. Comparison 12 Time point 3 Child physical health, Outcome 1 Child physical health continuous.
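The imputation step described above can be sketched in a few lines. The SD values below are hypothetical stand-ins for the four studies' reported figures, purely to show the mechanics:

```python
from statistics import mean

def impute_missing_sd(reported_sds):
    """Impute a missing standard deviation as the simple average of the
    SDs reported by other studies measuring the same outcome, as done
    for SSP Recipients in the review. A simple average weights each
    study equally, regardless of its sample size.
    """
    return mean(reported_sds)

# Hypothetical SDs from the four studies reporting the same outcome
sd_for_ssp = impute_missing_sd([1.1, 0.9, 1.0, 1.2])  # 1.05
```

A sample-size-weighted (pooled) SD would be an alternative; the unweighted average is the simpler approach the text describes.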
Comparison 12 Time point 3 Child physical health, Outcome 2 Child physical health dichotomous. One individual study reported no effect. At each time point, most evidence on child physical health was of moderate quality. Most of the MFIP sample were not subject to employment mandates and could receive earnings disregards for lower levels of employment participation, providing a plausible hypothesis to explain this heterogeneity.
The effects are shown in the corresponding Analysis. Two studies reported the proportion who had ever worked in the fifth year of the study (New Hope; UK ERA), and one study reported the proportion who had ever worked between years 1 and 5 of the study (NEWWS). Overall, the intervention showed very small to small positive effects on all measures of employment at T1 and T2, ranging from RR 1. All evidence at T1 and T2 was of moderate quality. At T3 the effects on most measures of employment were close to zero, with similar proportions of the intervention and control groups in employment at 49 to 72 months.
Much of the evidence on employment at T3 was of low quality. At T1 and T2, we assessed most evidence on employment as moderate quality, therefore the domain level quality assessment was also moderate. There were a number of differences between these studies that may have contributed to this, including the lack of any earnings supplement or disregard over and above that received by the control group in the NEWWS intervention.
New Hope reported a very small positive effect on intervention group annual earnings (SMD 0. ). However, as this effect was very small and the CI crossed zero, it is unlikely to be important. A possible explanation for this is that earnings disregards had ceased by this point for most CJF respondents. Both MFIP and SSP Recipients were still providing earnings supplements when T2 data were collected, which may account for their stronger positive effects on income.
However, although FTP had also ceased to supplement income, income was higher in the intervention group. We calculated average earnings in year 4 for FTP. None of these effects reached statistical significance (Analysis). The study authors did not calculate statistical significance.
We could not calculate statistical significance. In the two IFIP groups, there were small differences in favour of control (ongoing group) and of intervention (applicant group). Neither reached statistical significance (Analysis). In all five experimental groups in the NEWWS study, the intervention groups earned more than the controls in years 1 to 5 of the study.
We could not calculate an effect size for the other study, which reported slightly higher earnings among five intervention groups; these differences were statistically significant in two of the groups.