Research appendix

The evidence behind the method.

DevKoe's method isn't a hunch about what makes English hard for Japanese engineers. Every one of the three areas and nine challenges we train is grounded in published second-language acquisition research. This page lists that evidence in full, so you can see exactly why the program is built the way it is.

Three areas · Nine challenges · 27 citations

How progress is shown

You hear it yourself.

The honest problem with measuring spoken English is that one coach listening to one person cannot produce a real number. So we do not pretend to. We do something better: we let you hear the change in your own voice.

How it works

Three identical checkpoints. In weeks 2, 6 and 11 you record a 90-second answer to the same question — describing a project you are working on, as if to a new colleague. Same question, no script, no second take.

The question never changes, on purpose. If it did, you would be comparing answers to different questions, and any difference you heard could come from the question rather than from you.

A person listens. Your coach listens to all three, alongside twelve weeks of sessions, and writes you an honest review: what changed, what did not, and what to work on next.

We do not give you a score. We originally planned to: nine metrics, counted and normalised per 100 words. We dropped the idea because a single listener scoring by ear cannot deliver the objectivity a number implies, and a number you cannot trust is worse than no number at all. Your own recordings are harder evidence than any figure we could have printed.

Area A · Phonology

The sounds and rhythm of English.

The deepest, slowest-changing layer — the difficulty begins in perception, before a word is even spoken. Perceptual training precedes production drilling throughout.

A1 Sound substitutions (l / r)

The single Japanese liquid phoneme maps onto both English /l/ and /r/, so the brain never forms separate categories for the two sounds. The distinction is hard to hear before it is hard to produce.

Miyawaki et al. (1975) — Perception & Psychophysics
Japanese speakers could not reliably distinguish /l/ from /r/ in perception tests, performing near chance, while American English speakers were near-perfect. Established the difficulty as rooted in the absence of a phonemic contrast in Japanese — a problem that begins before speech is even attempted, not a motor-production deficit.
Flege (1995) — Speech Learning Model
Proposed that learners fail to acquire a new phoneme when the target is close enough to a native sound that the brain maps it onto the existing category. For Japanese speakers both /l/ and /r/ map onto the single Japanese liquid, making the distinction extremely resistant to change without explicit perceptual training.
Bradlow et al. (1997) — Journal of the Acoustical Society of America
Japanese speakers trained on /l/–/r/ with high-variability phonetic training improved in both perception and production. Confirmed the link between perceptual failure and production error — those who could not hear the difference could not produce it — validating the substitution as a measurable, trainable deficit.

A2 Consonant cluster epenthesis

Japanese phonotactics favour a consonant-vowel syllable. English clusters are broken up with inserted vowels that follow predictable Japanese rules — which is why cluster-heavy technical vocabulary is especially vulnerable.

Dupoux et al. (1999) — Journal of Memory and Language
Japanese speakers systematically "hallucinate" vowels when perceiving consonant clusters, reporting an extra syllable that isn't in the signal. This perceptual illusion directly predicts epenthesis in production: speakers insert vowels because their phonological system represents the word with them.
Vendelin & Peperkamp (2006) — Phonology
Epenthetic vowel insertion is not random but follows rules tied to the mora structure of Japanese; inserted vowels match the quality Japanese phonotactics expect. Confirms the L1 syllable template being imposed onto L2 words.
Kabak & Idsardi (2007) — Language and Speech
Even highly proficient Japanese speakers continue to show epenthesis, particularly at word boundaries and in initial clusters. Its persistence in advanced learners indicates it needs explicit targeted training rather than resolving through exposure alone.

A3 Lexical stress errors

Japanese mora timing gives each mora roughly equal weight; transferred to English, it flattens the strong-weak alternation that defines lexical stress, producing flat or misplaced stress on polysyllabic technical words.

Wenk (1985) — Linguistics
Japanese learners transfer mora-timed rhythm from their L1, producing syllables of roughly equal length and prominence instead of English's alternating pattern — directly causing lexical stress errors because the acoustic cues for stress are flattened.
Zielinski (2008) — Journal of Second Language Pronunciation
Of the features that reduced intelligibility for English listeners, lexical stress errors had a disproportionately large effect on comprehension — larger than vowel or consonant substitutions. Misplaced stress on a specialised term can stop a listener recognising the word at all.
Hirst & Utsugi (2011) — Proceedings of the International Congress of Phonetic Sciences
Compared Japanese and English prosody directly, showing Japanese pitch accent and mora timing create a fundamentally different mental representation of word rhythm. Japanese speakers are operating from a different underlying system, not simply producing stress incorrectly.

Area B · Grammar

What English requires that Japanese doesn't.

Structural features with no equivalent in Japanese. Awareness alone doesn't fix them in spontaneous speech — these are habit-formation targets, built through repetition.

B1 Article omissions (a / an / the)

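Japanese has no grammatical category equivalent to the English article. With no L1 structure to build on, learners omit articles. When they do produce one, they tend to choose it by specificity (whether the speaker has a particular referent in mind) rather than by definiteness, which is the distinction English actually encodes.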

Master (1997) — TESOL Quarterly
Across L1 backgrounds, speakers of article-less languages — including Japanese — showed the slowest and most incomplete acquisition of the English article system, with persistent omission of both definite and indefinite articles even at advanced proficiency.
Ionin & Wexler (2002) — Second Language Research
Learners whose L1 lacks articles fluctuate between definite and indefinite forms by an underlying specificity feature rather than English definiteness — explaining why Japanese speakers sometimes produce the wrong article and why article errors remain systematic.
Snape et al. (2006) — EUROSLA Yearbook
Examined Japanese adult learners specifically: even at upper-intermediate and advanced levels, definite-article omission before unique or previously mentioned nouns remained common. The absence of any article-like category in Japanese means learners acquire an entirely new system from scratch.

B2 Plural morpheme omissions

Japanese nouns don't inflect for number. The plural form is present in monitored production but drops out under the time pressure of spontaneous speech, because the parser doesn't compute number agreement automatically.

Hawkins & Liszka (2003) — The Lexical Basis of Sentence Processing (Benjamins)
Learners whose L1 lacks number marking fail to apply the plural morpheme not because they don't know the rule, but because their parser doesn't compute number agreement during fluent production — so plural -s drops out under time pressure.
Ionin & Wexler (2002) — Second Language Research
Japanese learners showed correlated omission of articles and plural morphemes, pointing to a shared difficulty with nominal functional morphology — those who omitted articles were significantly more likely to also omit plural -s.
Lardiere (2007) — Ultimate Attainment in Second Language Acquisition (Lawrence Erlbaum)
A Japanese-L1 speaker at near-native proficiency, studied over years, still omitted plural and possessive morphemes in spontaneous speech despite perfect rule knowledge in metalinguistic tasks — establishing that knowing a rule and applying it automatically in real time are separable.

B3 Subjectless clauses

Japanese is a canonical pro-drop language — subjects recoverable from context can be dropped. Learners transfer this discourse rule to English, producing subjectless clauses specifically when the referent is clear, so the errors are patterned, not random.

White (1985) — Language Learning
Introduced the pro-drop parameter to SLA, showing learners from pro-drop languages transfer subject omission into English. Predicted Japanese learners would produce subjectless sentences at a significantly higher rate than learners from non-pro-drop backgrounds.
Tsimpli et al. (2004) — Second Language Research
Found strong evidence that adults cannot fully reset the pro-drop parameter for a non-pro-drop L2 like English. Japanese-L1 participants showed persistent subject omission even after years of instruction, especially in informal or time-pressured speech.
Papp (2000) — IRAL
In spontaneous production by Japanese learners, subject omission was most frequent exactly when the referent was recoverable from context — the condition under which Japanese grammar licenses omission. The errors are predictable and patterned.

Area C · Fluency

How smoothly your speech flows.

The most immediately coachable area, trained first for the fastest visible gains. These disfluencies reflect processing load in a second language, not personality or carelessness.

C1 Filler words

Filler rate tracks lexical-retrieval difficulty. In a technical English context, Japanese speakers face compounded load — general L2 retrieval plus domain vocabulary — so high filler rates reflect processing demand, not habit.

Bortfeld et al. (2001) — Journal of Speech, Language, and Hearing Research
Filler rate is directly correlated with lexical-retrieval difficulty. For Japanese speakers in technical English the effect compounds — both general L2 access difficulty and domain vocabulary load — predicting systematically higher filler rates than native speakers on the same tasks.
Watanabe et al. (2008) — Language and Speech
Japanese speakers used significantly more filled pauses in English than in Japanese, and more than native English speakers on equivalent tasks. The increase was largest in informationally dense speech, which is exactly what a technical explanation demands.
Lasagabaster & Sierra (2005) — IRAL
High filler rates significantly reduced listener ratings of fluency, confidence, and expertise even when content was identical — directly relevant to professional settings where perceived competence affects collaboration, promotion, and trust independent of actual ability.
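
For readers who want to see what "filler rate" means as a quantity, the sketch below counts fillers per 100 words of a transcript. It illustrates the kind of measure the studies above rely on, not a score the program assigns. The filler list, the tokenisation, and the choice to include fillers in the word count are all assumptions made for illustration.

```python
import re

# Hypothetical filler inventory; a real study would define its own.
FILLERS = {"um", "uh", "er", "erm", "eto", "ano"}


def filler_rate_per_100_words(transcript: str) -> float:
    """Return the number of filler tokens per 100 tokens of transcript."""
    tokens = re.findall(r"[a-z']+", transcript.lower())
    if not tokens:
        return 0.0
    filler_count = sum(1 for token in tokens if token in FILLERS)
    # Assumption: fillers count toward the total number of words.
    return 100.0 * filler_count / len(tokens)


if __name__ == "__main__":
    sample = "um we deploy the uh service to production on friday"
    print(round(filler_rate_per_100_words(sample), 1))
```

Normalising per 100 words makes recordings of different lengths comparable, though the figure still depends on who transcribes the speech and what they decide counts as a filler.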

C2 Intra-clause silent pauses

Mid-clause pausing reflects planning bottlenecks — syntactic planning, lexical retrieval, and phonological encoding at once in a less automatised system. The more demanding the task, the more such pauses appear.

Goldman-Eisler (1968) — Psycholinguistics: Experiments in Spontaneous Speech (Academic Press)
Established the link between silent pauses and cognitive planning difficulty — pauses cluster before semantically complex, low-predictability words. Japanese speakers in technical English face planning bottlenecks that surface as mid-clause pauses at a measurably higher rate.
Riazantseva (2001) — Studies in Second Language Acquisition
Non-native speakers placed a significantly higher proportion of their pauses within clauses rather than at clause boundaries, something native speakers rarely do. Mid-clause pausing was identified as the most reliable acoustic marker separating fluent from disfluent L2 speech.
Tavakoli (2011) — Applied Linguistics
More cognitively demanding tasks — technical explanations with cause-and-effect reasoning — produced disproportionately more mid-clause pauses in non-native speakers, confirming that a technical speaking task specifically elicits the behaviour this challenge targets.
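
The distinction these studies draw, pauses inside a clause versus pauses at a clause boundary, can be expressed as a simple proportion. The sketch below assumes each pause has already been annotated by hand with its duration and position. The `Pause` type, its field names, and the 0.25-second minimum duration are hypothetical choices for illustration, not values taken from the sources above.

```python
from dataclasses import dataclass


@dataclass
class Pause:
    duration_s: float          # length of the silence in seconds
    at_clause_boundary: bool   # True if the pause falls between clauses


def mid_clause_pause_share(pauses: list[Pause], min_duration_s: float = 0.25) -> float:
    """Share of counted pauses that fall inside a clause rather than at a boundary."""
    # Assumption: silences shorter than the threshold are not treated as pauses.
    counted = [p for p in pauses if p.duration_s >= min_duration_s]
    if not counted:
        return 0.0
    mid_clause = sum(1 for p in counted if not p.at_clause_boundary)
    return mid_clause / len(counted)


if __name__ == "__main__":
    annotated = [
        Pause(0.4, at_clause_boundary=True),
        Pause(0.6, at_clause_boundary=False),
        Pause(0.1, at_clause_boundary=False),  # below threshold, ignored
        Pause(0.3, at_clause_boundary=False),
    ]
    print(round(mid_clause_pause_share(annotated), 2))
```

A share close to zero means pauses mostly land between clauses, while a higher share means more of the planning is happening mid-clause.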

C3 Repetitions and false starts

An internal self-monitor detects a planning failure and interrupts production, causing a cutoff and restart. L2 speakers trigger it far more often — most commonly on the content words that carry technical meaning.

Levelt (1983) — Psychological Review
Developed the self-repair framework: an internal monitor detects an error or planning failure and interrupts, leading to a cutoff and restart. L2 speakers trigger this monitor far more often because they manage planning, retrieval, and encoding in a less automatised system.
Kormos (1999) — Studies in Second Language Acquisition
Applied the self-repair model to L2 speakers: non-natives produced significantly more repetitions and restarts, triggered most often by lexical-search failure. Japanese speakers of English showed a particularly high rate of mid-utterance restarts on content words.
Skehan & Foster (2001) — Applied Linguistics
Tasks requiring speakers to organise and explain information under time pressure produced the highest disfluency rates. This confirms that repetitions and false starts are a valid, task-sensitive indicator of L2 fluency rather than a stylistic trait, which makes them a meaningful thing to listen for when comparing recordings made at different points in time.


Get started

A method built on evidence.

See how the 90-day program puts this research to work — then book a free intro call.
