[Codility] Lesson-05.2: GenomicRangeQuery

3 min read · Dec 12, 2020

In this post, we will find the minimal impact factor (that is, the highest-priority nucleotide letter) within several ranges of a DNA sequence.


Task Description

A DNA sequence can be represented as a string consisting of the letters A, C, G and T, which correspond to the types of successive nucleotides in the sequence. Each nucleotide has an impact factor, which is an integer. Nucleotides of types A, C, G and T have impact factors of 1, 2, 3 and 4, respectively. You are going to answer several queries of the form: What is the minimal impact factor of nucleotides contained in a particular part of the given DNA sequence?

The DNA sequence is given as a non-empty string S = S[0]S[1]…S[N-1] consisting of N characters. There are M queries, which are given in non-empty arrays P and Q, each consisting of M integers. The K-th query (0 ≤ K < M) requires you to find the minimal impact factor of nucleotides contained in the DNA sequence between positions P[K] and Q[K] (inclusive).

For example, consider string S = CAGCCTA and arrays P, Q such that:

P[0] = 2 Q[0] = 4

P[1] = 5 Q[1] = 5

P[2] = 0 Q[2] = 6

The answers to these M = 3 queries are as follows:

The part of the DNA between positions 2 and 4 contains nucleotides G and C (twice), whose impact factors are 3 and 2 respectively, so the answer is 2.
The part between positions 5 and 5 contains a single nucleotide T, whose impact factor is 4, so the answer is 4.
The part between positions 0 and 6 (the whole string) contains all nucleotides, in particular nucleotide A whose impact factor is 1, so the answer is 1.

Write a function:

def solution(S, P, Q)

that, given a non-empty string S consisting of N characters and two non-empty arrays P and Q consisting of M integers, returns an array consisting of M integers specifying the consecutive answers to all queries.

Result array should be returned as an array of integers.

For example, given the string S = CAGCCTA and arrays P, Q such that:

P[0] = 2 Q[0] = 4

P[1] = 5 Q[1] = 5

P[2] = 0 Q[2] = 6

the function should return the values [2, 4, 1], as explained above.

Write an efficient algorithm for the following assumptions:

N is an integer within the range [1..100,000];
M is an integer within the range [1..50,000];
each element of arrays P, Q is an integer within the range [0..N − 1];
P[K] ≤ Q[K], where 0 ≤ K < M;
string S consists only of upper-case English letters A, C, G, T.
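
Read literally, the task statement can be sketched as a direct scan of each queried range. This is only a reference for checking answers, not an efficient solution; the names `IMPACT` and `min_impact_naive` are made up for this illustration.

```python
# Impact factors exactly as given in the task description.
IMPACT = {"A": 1, "C": 2, "G": 3, "T": 4}

def min_impact_naive(S, P, Q):
    """Answer each query by scanning S[p..q] (inclusive) character by character."""
    return [min(IMPACT[ch] for ch in S[p:q + 1]) for p, q in zip(P, Q)]

# The example from the task description.
print(min_impact_naive("CAGCCTA", [2, 5, 0], [4, 5, 6]))  # [2, 4, 1]
```

Each query costs time proportional to the length of its range, so this version is useful mainly for small inputs and for cross-checking a faster solution.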

Key Point

Identify the letter with the smallest impact factor in each queried range, remembering that both P[K] and Q[K] are inclusive.
Avoid timeouts. Scanning every queried range character by character takes O(N × M) time in the worst case, which becomes slow when the DNA sequence is long.
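
One common way to avoid the timeout is to precompute prefix counts, so that each query is answered in constant time. The sketch below is an alternative to the solution in this post, not the same method; the function name `solution_prefix` and the `counts` table are assumptions introduced for the example.

```python
def solution_prefix(S, P, Q):
    n = len(S)
    # counts[c][i] = number of occurrences of letter c in S[:i].
    # Only A, C and G are tracked; T is the fallback answer.
    letters = "ACG"
    counts = {c: [0] * (n + 1) for c in letters}
    for i, ch in enumerate(S):
        for c in letters:
            counts[c][i + 1] = counts[c][i] + (ch == c)

    result = []
    for p, q in zip(P, Q):
        # Check letters in ascending impact-factor order (A=1, C=2, G=3).
        for factor, c in enumerate(letters, start=1):
            if counts[c][q + 1] - counts[c][p] > 0:
                result.append(factor)
                break
        else:
            # None of A, C, G occurs in S[p..q], so the range contains only T.
            result.append(4)
    return result

print(solution_prefix("CAGCCTA", [2, 5, 0], [4, 5, 6]))  # [2, 4, 1]
```

Building the table takes O(N) time and each query takes O(1), so the total is O(N + M) at the cost of three extra arrays of length N + 1.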

Solution (using Python)

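def solution(S, P, Q):
    # Letters are listed in ascending impact-factor order.
    letters = {'A': 1, 'C': 2, 'G': 3, 'T': 4}
    result = []
    for p, q in zip(P, Q):
        segment = S[p:q + 1]  # positions p..q, inclusive
        for letter, factor in letters.items():
            # The first letter found has the minimal impact factor.
            if letter in segment:
                result.append(factor)
                break
    return result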

Please treat the solution above as a reference only.
I recommend writing your own code.
