---
title: "SequencePredict"
language: "en"
type: "Symbol"
summary: "SequencePredict[{seq1, seq2, ...}] generates a SequencePredictorFunction[...] based on the sequences given. SequencePredict[training, seq] attempts to predict the next element in the sequence seq from the training sequences given. SequencePredict[training, {seq1, seq2, ...}] gives predictions for each of the sequences seqi. SequencePredict[name, seq] uses the built-in sequence predictor represented by name. SequencePredict[..., seq, prop] gives the specified property of the prediction associated with seq."
keywords: 
- predict
- sequence
- sequence learning
- markov model
- markov chain
- markov process
- sequence probability
- generate sequence
- random sequence
canonical_url: "https://reference.wolfram.com/language/ref/SequencePredict.html"
source: "Wolfram Language Documentation"
related_guides: 
  - 
    title: "Supervised Machine Learning"
    link: "https://reference.wolfram.com/language/guide/SupervisedMachineLearning.en.md"
  - 
    title: "Machine Learning"
    link: "https://reference.wolfram.com/language/guide/MachineLearning.en.md"
related_functions: 
  - 
    title: "Predict"
    link: "https://reference.wolfram.com/language/ref/Predict.en.md"
  - 
    title: "Classify"
    link: "https://reference.wolfram.com/language/ref/Classify.en.md"
  - 
    title: "SequencePredictorFunction"
    link: "https://reference.wolfram.com/language/ref/SequencePredictorFunction.en.md"
  - 
    title: "TimeSeriesModelFit"
    link: "https://reference.wolfram.com/language/ref/TimeSeriesModelFit.en.md"
  - 
    title: "TimeSeriesForecast"
    link: "https://reference.wolfram.com/language/ref/TimeSeriesForecast.en.md"
  - 
    title: "EstimatedProcess"
    link: "https://reference.wolfram.com/language/ref/EstimatedProcess.en.md"
---
[EXPERIMENTAL]

# SequencePredict

SequencePredict[{seq1, seq2, …}] generates a SequencePredictorFunction[…] based on the sequences given.

SequencePredict[training, seq] attempts to predict the next element in the sequence seq from the training sequences given.

SequencePredict[training, {seq1, seq2, …}] gives predictions for each of the sequences seqi.

SequencePredict["name", seq] uses the built-in sequence predictor represented by "name".

SequencePredict[…, seq, prop] gives the specified property of the prediction associated with seq.

## Details and Options

* The sequences ``seqi`` can be lists of either tokens or strings.

* Sequences ``seqi`` are assumed to be unordered subsequences of an underlying infinite sequence.

* In ``SequencePredict[…, seq, prop]``, properties are as given in ``SequencePredictorFunction[…]``; they include:

|                          |                                                              |
| ------------------------ | ------------------------------------------------------------ |
| "NextElement"            | most likely next element                                     |
| "NextElement" -> n        | individually most likely next n elements                     |
| "NextSequence" -> n       | most likely next length-n sequence of elements               |
| "RandomNextElement"      | random sample from the next-element distribution             |
| "RandomNextElement" -> n  | random sample from the next-sequence distribution            |
| "Probabilities"          | association of probabilities for all possible next elements  |
| "SequenceProbability"    | probability for the predictor to generate the given sequence |
| "SequenceLogProbability" | log probability for the predictor to generate the sequence   |
| "Properties"             | list of all properties available                             |

* Examples of built-in sequence predictors include:

|              |                                          |
| ------------ | ---------------------------------------- |
| "Chinese"    | character-based Chinese-language text    |
| "English"    | character-based English-language text    |
| "French"     | character-based French-language text     |
| "German"     | character-based German-language text     |
| "Portuguese" | character-based Portuguese-language text |
| "Russian"    | character-based Russian-language text    |
| "Spanish"    | character-based Spanish-language text    |

* The following options can be given:

|                   |           |                                                                   |
| ----------------- | --------- | ----------------------------------------------------------------- |
| FeatureExtractor  | Automatic | how to preprocess sequences                                       |
| Method            | Automatic | which prediction algorithm to use                                 |
| PerformanceGoal   | Automatic | aspects of performance to try to optimize                         |
| RandomSeeding     | 1234      | what seeding of pseudorandom generators should be done internally |

* Typical settings for ``FeatureExtractor`` for strings include:

|                       |                                                          |
| --------------------- | -------------------------------------------------------- |
| "SegmentedCharacters" | string interpreted as a sequence of characters (default) |
| "SegmentedWords"      | string interpreted as a sequence of words                |

* Possible settings for ``PerformanceGoal`` include:

|                 |                                                     |
| --------------- | --------------------------------------------------- |
| "Memory"        | minimize storage requirements of the predictor      |
| "Quality"       | maximize accuracy of the predictor                  |
| "Speed"         | maximize speed of the predictor                     |
| "TrainingSpeed" | minimize time spent producing the predictor         |
| Automatic       | automatic tradeoff among speed, accuracy and memory |

* ``PerformanceGoal -> {goal1, goal2, …}`` will automatically combine ``goal1``, ``goal2``, etc.

* Possible settings for ``RandomSeeding`` include:

|           |                                                        |
| --------- | ------------------------------------------------------ |
| Automatic | automatically reseed every time the function is called |
| Inherited | use externally seeded random numbers                   |
| seed      | use an explicit integer or string as a seed            |

* Possible settings for ``Method`` include:

|          |              |
| -------- | ------------ |
| "Markov" | Markov model |

* In ``SequencePredict[…, Method -> {"Markov", "Order" -> order}]``, ``order`` corresponds to the memory size of the Markov process.

* In ``SequencePredict[…, "SequenceProbability"]``, some probability mass is kept for unknown elements.

* In ``SequencePredict[training, {}, prop]``, ``{}`` is interpreted as an empty list of sequences rather than an empty sequence.

## Examples (11)

### Basic Examples (1)

Train a sequence predictor on a set of sequences:

```wl
In[1]:= sp = SequencePredict[{{[image], [image]}, {[image], [image], [image]}, {[image], [image], [image]}, {[image], [image]}}]

Out[1]=
SequencePredictorFunction[Association["Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
    Association["Input" -> Association["f1" -> Association["Type" -> "NominalSequence"]], 
     "Output" -> Association["f1" -> Association["Type" - ... /@ 
             MachineLearning`file163SortedHashAssociation`PackagePrivate`keys$]], 
         "Version" -> {14.3, 0}]]], "RestLogProbabilities" -> 
     Association["1Gram" -> {-1.1047354, 0., -0.5898255, -0.82390875, -0.5898255, -0.5898255}]]]]
```

Predict the next element of a new sequence:

```wl
In[2]:= sp[{[image], [image]}]

Out[2]= [image]
```

Obtain the probabilities of the next element given the sequence:

```wl
In[3]:= sp[{[image], [image]}, "Probabilities"]

Out[3]= <|[image] -> 0.370295, [image] -> 0.422177, [image] -> 0.207528|>
```

Obtain a random next element according to the preceding distribution:

```wl
In[4]:= sp[{[image], [image]}, "RandomNextElement"]

Out[4]= [image]
```

Obtain multiple predictions at a time:

```wl
In[5]:= sp[{{[image], [image]}, {[image], [image]}}]

Out[5]= [image]
```

Predict the most likely next element and reuse this intermediate guess to predict the following element:

```wl
In[6]:= sp[{[image], [image]}, "NextElement" -> 2]

Out[6]= [image]
```

Predict the most likely following sequence:

```wl
In[7]:= sp[{[image], [image]}, "NextSequence" -> 2]

Out[7]= [image]
```

Compare the probabilities for the preceding sequences:

```wl
In[8]:= sp[{{[image], [image], [image], [image]}, {[image], [image], [image], [image]}}, "SequenceProbability"]

Out[8]= {0.0117689, 0.0160178}
```
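
The tokens in the inputs above are images, which do not survive as plain text. As a rough, copy-pasteable sketch of the same workflow, the following uses the hypothetical string tokens `"a"`, `"b"` and `"c"` in their place. The training data is invented for illustration, so the results will not match the outputs shown above:

```wl
(* Hypothetical string tokens stand in for the image tokens used above *)
training = {{"a", "b"}, {"b", "c", "a"}, {"a", "b", "c"}, {"c", "a"}};
sp = SequencePredict[training];

(* Most likely next element after the sequence {"a", "b"} *)
next = sp[{"a", "b"}]

(* Probabilities of every possible next element *)
probs = sp[{"a", "b"}, "Probabilities"]

(* Most likely next length-2 sequence *)
seq = sp[{"a", "b"}, "NextSequence" -> 2]
```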

### Scope (4)

#### Custom Sequence Predictors (3)

Train a sequence predictor on a list of strings:

```wl
In[1]:= sp = SequencePredict[{"the cat is grey", "my cat is fast", "this dog is scary", "the big dog", "what a lovely cat", "this is not a dog"}]

Out[1]=
SequencePredictorFunction[Association["Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
    Association["Input" -> Association["f1" -> Association["Type" -> "Text"]], 
     "Output" -> Association["f1" -> Association["Type" -> "Text", " ...  -1.4288273, 
        -0.7409227, -1.4288273, -1.1051797, -1.1663314, -1.1051797, -1.3227284, -1.6617415, 
        -1.4288273, -1.7789661, -1.7789661, -1.5695266, -1.3227284, -1.7789661, -1.7789661, 
        -1.6617415, -1.7789661, -1.7789661}]]]]
```

Predict the next character following a given string:

```wl
In[2]:= sp["the ca"]

Out[2]= "t"
```

Predict the next four characters:

```wl
In[3]:= sp["the ca", "NextElement" -> 4]

Out[3]= "t is"
```

Obtain the probabilities for each character to follow the given string:

```wl
In[4]:= sp["the ca", "Probabilities"]

Out[4]= <|" " -> 0.177784, "a" -> 0.0311674, "b" -> 0.0134724, "c" -> 0.0178961, "d" -> 0.0134724, "e" -> 0.0223199, "f" -> 0.0134724, "g" -> 0.0223199, "h" -> 0.0178961, "i" -> 0.0223199, "l" -> 0.0178961, "m" -> 0.0134724, "n" -> 0.0134724, "o" -> 0.0223199, "r" -> 0.0823567, "s" -> 0.0867804, "t" -> 0.35347, "v" -> 0.0134724, "w" -> 0.0134724, "y" -> 0.0311674|>
```
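
A predictor trained on strings also supports the sequence-level properties listed under Details. The sketch below retrains on the same toy strings and queries both `"SequenceProbability"` and `"SequenceLogProbability"` for one string. That the second is the natural logarithm of the first is an assumption here, not something stated on this page, so the final line simply shows the two values side by side for comparison:

```wl
(* Same toy training strings as in the example above *)
sp = SequencePredict[{"the cat is grey", "my cat is fast", "this dog is scary",
    "the big dog", "what a lovely cat", "this is not a dog"}];

(* Probability for the predictor to generate the string "the cat" *)
p = sp["the cat", "SequenceProbability"];

(* Log probability of the same string *)
logp = sp["the cat", "SequenceLogProbability"];

(* Assumption: natural log; if so, the two entries should be close *)
{logp, Log[p]}
```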

---

Train a sequence predictor on the list of common English words, each word treated as a sequence of characters:

```wl
In[1]:= sp = SequencePredict[WordList[]]

Out[1]= SequencePredictorFunction[…]
```

Predict the most likely next character from a given sequence:

```wl
In[2]:= sp["ab"]

Out[2]= "l"
```

In the previous example, each word is considered as a subsequence of an infinite sequence. Use the character `` | `` to mark boundaries between words:

```wl
In[3]:= markedWords = "|" <> # <> "|"& /@ WordList[];
```

Build a new sequence predictor aware of word boundaries:

```wl
In[4]:= sp = SequencePredict[markedWords]

Out[4]= SequencePredictorFunction[…]
```

Generate the beginning of an English-like word:

```wl
In[5]:= sp["|", "RandomNextElement" -> 4]

Out[5]= "hi-d"
```

---

Load a book from ``ExampleData`` :

```wl
In[1]:= dq = ExampleData[{"Text", "DonQuixoteIEnglish"}];
```

Train a sequence predictor on this book:

```wl
In[2]:= spchar = SequencePredict[{dq}]

Out[2]= SequencePredictorFunction[…]
```

Sample a random string in the style of the book:

```wl
In[3]:= spchar[{"thou "}, "RandomNextElement" -> 20]

Out[3]= {"wilt what might him "}
```

Train another sequence predictor, interpreting strings as word sequences rather than character sequences:

```wl
In[4]:= spword = SequencePredict[{dq}, FeatureExtractor -> "SegmentedWords"]

Out[4]= SequencePredictorFunction[…]
```

Complete a given string with 10 consecutive words (spaces and punctuation marks count as words):

```wl
In[5]:= spword["I", "RandomNextElement" -> 10]

Out[5]= " will take is at each"
```

#### Built-in Sequence Predictors (1)

Download the ``"English"`` built-in sequence predictor:

```wl
In[1]:= sp = SequencePredict["English"]

Out[1]= SequencePredictorFunction[…]
```

Obtain the log-probability of the given string:

```wl
In[2]:= sp["the cat sat on the hat", "SequenceLogProbability"]

Out[2]= -38.6666
```

### Options (5)

#### FeatureExtractor (2)

Preprocess the training text to predict on words rather than at the character level:

```wl
In[1]:= sp = SequencePredict[{WikipediaData["computer"]}, FeatureExtractor -> "SegmentedWords"]

Out[1]= SequencePredictorFunction[…]
```

Complete a given string with 10 consecutive words (spaces and punctuation marks count as words):

```wl
In[2]:= sp["computer ", "RandomNextElement" -> 10]

Out[2]=
"may beArt
avoidrelay\", "
```

---

Lowercase the training text during preprocessing, so that uppercase and lowercase forms of a letter are counted together and the statistics are based on higher letter counts:

```wl
In[1]:= sp = SequencePredict[{WikipediaData["computer"]}, FeatureExtractor -> "LowerCasedText"]

Out[1]= SequencePredictorFunction[…]
```

#### PerformanceGoal (2)

Train a sequence predictor with an emphasis on the memory footprint of the resulting model:

```wl
In[1]:= sp = SequencePredict[{WikipediaData["usa"]}, PerformanceGoal -> "Memory"]

Out[1]= SequencePredictorFunction[…]

In[2]:= ByteCount[sp]

Out[2]= 2097104
```

Compare with the automatically generated model size:

```wl
In[3]:= ByteCount@SequencePredict[{WikipediaData["usa"]}]

Out[3]= 4130032
```
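
Several goals can also be given as a list, as noted under Details. The following is a minimal sketch on a small hypothetical training set (chosen so that no download is needed); on data this small the two models may well have the same size, so no particular output is implied:

```wl
(* Hypothetical toy training strings, used only for illustration *)
training = {"the cat is grey", "my cat is fast", "this dog is scary", "the big dog"};

(* Ask for a combined tradeoff between memory and speed *)
spCombined = SequencePredict[training, PerformanceGoal -> {"Memory", "Speed"}];

(* Default automatic tradeoff, for comparison *)
spDefault = SequencePredict[training];

(* Compare the storage used by the two predictors *)
sizes = {ByteCount[spCombined], ByteCount[spDefault]}
```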

---

Tune the computation time and precision when exploring the full sequence probability space:

```wl
In[1]:= sp = SequencePredict[{ExampleData[{"Text", "DonQuixoteIEnglish"}]}]

Out[1]= SequencePredictorFunction[…]
```

Favor a fast, approximate exploration:

```wl
In[2]:= {shortertime, quickprediction} = AbsoluteTiming[sp["This is", "NextSequence" -> 50, PerformanceGoal -> "Speed"]]

Out[2]= {0.116392, " there was not to be seen them, and without any fu"}
```

Favor a more in-depth exploration at the cost of longer computation time:

```wl
In[3]:= {longertime, betterprediction} = AbsoluteTiming[sp["This is", "NextSequence" -> 50, PerformanceGoal -> "Quality"]]

Out[3]= {8.72925, " the name of the Rueful Countenance,\" said Don Qui"}
```

Compare the results:

```wl
In[4]:= sp[#, "SequenceProbability"]& /@ {quickprediction, betterprediction}

Out[4]= {4.204024334087384`*^-15, 6.145339356552335`*^-10}
```

#### Method (1)

Specify a memory size of 3 for the Markov process trained on the training subsequences:

```wl
In[1]:= sp = SequencePredict[{WikipediaData["computer"]}, Method -> {"Markov", "Order" -> 3}]

Out[1]= SequencePredictorFunction[«1»]
```
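
To get a feel for the effect of the order, one can train two predictors on the same data with different memory sizes and compare their predictions for the same context. This is a sketch on a small hypothetical training set; whether the two predictions differ depends on the data, and `"Order" -> 1` is assumed to be an accepted setting:

```wl
(* Hypothetical toy training strings, used only for illustration *)
training = {"the cat is grey", "my cat is fast", "this dog is scary", "the big dog"};

(* Markov predictors with memory sizes 1 and 3 *)
sp1 = SequencePredict[training, Method -> {"Markov", "Order" -> 1}];
sp3 = SequencePredict[training, Method -> {"Markov", "Order" -> 3}];

(* Most likely next character after "the ca" under each model *)
next1 = sp1["the ca", "NextElement"];
next3 = sp3["the ca", "NextElement"];
{next1, next3}
```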

### Possible Issues (1)

An empty list is interpreted as a list containing no sequences, so the result is an empty list:

```wl
In[1]:= sp = SequencePredict[{{[image], [image]}, {[image], [image], [image]}, {[image], [image], [image]}, {[image], [image]}}, {}]

Out[1]= {}
```

To obtain the most likely next element completing an empty sequence, nest it in a second list for disambiguation:

```wl
In[2]:= sp = SequencePredict[{{[image], [image]}, {[image], [image], [image]}, {[image], [image], [image]}, {[image], [image]}}, {{}}]

Out[2]= {[image]}
```

## See Also

* [`Predict`](https://reference.wolfram.com/language/ref/Predict.en.md)
* [`Classify`](https://reference.wolfram.com/language/ref/Classify.en.md)
* [`SequencePredictorFunction`](https://reference.wolfram.com/language/ref/SequencePredictorFunction.en.md)
* [`TimeSeriesModelFit`](https://reference.wolfram.com/language/ref/TimeSeriesModelFit.en.md)
* [`TimeSeriesForecast`](https://reference.wolfram.com/language/ref/TimeSeriesForecast.en.md)
* [`EstimatedProcess`](https://reference.wolfram.com/language/ref/EstimatedProcess.en.md)

## Related Guides

* [Supervised Machine Learning](https://reference.wolfram.com/language/guide/SupervisedMachineLearning.en.md)
* [Machine Learning](https://reference.wolfram.com/language/guide/MachineLearning.en.md)

## History

* [Introduced in 2017 (11.1)](https://reference.wolfram.com/language/guide/SummaryOfNewFeaturesIn111.en.md) \| [Updated in 2017 (11.2)](https://reference.wolfram.com/language/guide/SummaryOfNewFeaturesIn112.en.md)