GUPTA MECHANICAL

IN THIS WEBSITE I CAN TELL ALL ABOUT TECH. TIPS AND TRICKS APP REVIEWS AND UNBOXINGS ALSO TECH. NEWS .............

[Solution] Task Scheduling and Data Assignment Codeforces Solution

A. Task Scheduling and Data Assignment
time limit per test
15 seconds
memory limit per test
1024 megabytes
input
standard input
output
standard output

With the rapid development of chip hardware computing capabilities, hardware architectures are evolving towards more complex multi-core and heterogeneous architectures. In this problem, your goal is to design an efficient task scheduling algorithm together with a memory allocation policy for the software-hardware co-optimization problem that exists widely in operating systems, distributed systems, and compilers.

Description

There are $\ell$ tasks that need to be scheduled on $n$ machines with varying processing powers. There are also $m$ disks with varying transmission speeds and storage capacities for storing the data generated by each task after its execution.

Task schedule. A task schedule is an assignment of tasks to machines and disks. More specifically, for each task $i$, a task schedule specifies three parameters $\left({x}_{i},{y}_{i},{z}_{i}\right)$, meaning that $i$ starts at time ${x}_{i}$ on machine ${y}_{i}$ and stores its data to disk ${z}_{i}$. The earliest starting time of any task is 0.

Task Running Process Here are more details about the running process of a task $i$ when the task schedule is fixed.

• The processing power of machine $j$ is $\mathsf{p}\mathsf{o}\mathsf{w}\mathsf{e}\mathsf{r}\left(j\right)$. Each task $i$ has a size of $\mathsf{s}\mathsf{i}\mathsf{z}\mathsf{e}\left(i\right)$, which means it needs $⌈\mathsf{s}\mathsf{i}\mathsf{z}\mathsf{e}\left(i\right)/\mathsf{p}\mathsf{o}\mathsf{w}\mathsf{e}\mathsf{r}\left({y}_{i}\right)⌉$ units of time to execute when it is run on machine ${y}_{i}$. Here, $⌈.⌉$ means rounding up.
• Each disk $k$ has a read/write transmission rate of $\mathsf{s}\mathsf{p}\mathsf{e}\mathsf{e}\mathsf{d}\left(k\right)$. When a task $i$ finishes its execution, it will generate a piece of data of size $\mathsf{d}\mathsf{a}\mathsf{t}\mathsf{a}\left(i\right)$ that needs to be stored on some disk. This storing step will take $⌈\mathsf{d}\mathsf{a}\mathsf{t}\mathsf{a}\left(i\right)/\mathsf{s}\mathsf{p}\mathsf{e}\mathsf{e}\mathsf{d}\left({z}_{i}\right)⌉$ units of time if this data is stored on disk ${z}_{i}$. Here, the value of time also needs to be rounded up.
• Before a task $i$ starts its execution, it needs to read the output data of some other tasks. We let $\mathsf{P}\mathsf{R}\mathsf{E}{\mathsf{D}}_{\mathsf{d}\mathsf{a}\mathsf{t}\mathsf{a}}\left(i\right)$ denote the set of tasks whose output data are needed by task $i$. This will take a total time of $\sum _{j\in \mathsf{P}\mathsf{R}\mathsf{E}{\mathsf{D}}_{\mathsf{d}\mathsf{a}\mathsf{t}\mathsf{a}}\left(i\right)}⌈\mathsf{d}\mathsf{a}\mathsf{t}\mathsf{a}\left(j\right)/\mathsf{s}\mathsf{p}\mathsf{e}\mathsf{e}\mathsf{d}\left({z}_{j}\right)⌉$ and this must be done before task $i$ starts its execution. Note that when calculating the total time, each term needs to be rounded up individually and then summed.

Summarizing the above process, when task $i$ starts, it goes through three phases: reading data from other tasks, executing the task, and storing data. For convenience, we denote the starting time of the three phases as ${a}_{i},{b}_{i},{c}_{i}$ and the finishing time of the last phase as ${d}_{i}$. These values are determined as follows.

• ${a}_{i}={x}_{i}$
• ${b}_{i}={a}_{i}+\sum _{j\in \mathsf{P}\mathsf{R}\mathsf{E}{\mathsf{D}}_{\mathsf{d}\mathsf{a}\mathsf{t}\mathsf{a}}\left(i\right)}⌈\mathsf{d}\mathsf{a}\mathsf{t}\mathsf{a}\left(j\right)/\mathsf{s}\mathsf{p}\mathsf{e}\mathsf{e}\mathsf{d}\left({z}_{j}\right)⌉$.
• ${c}_{i}={b}_{i}+⌈\mathsf{s}\mathsf{i}\mathsf{z}\mathsf{e}\left(i\right)/\mathsf{p}\mathsf{o}\mathsf{w}\mathsf{e}\mathsf{r}\left({y}_{i}\right)⌉$.
• ${d}_{i}={c}_{i}+⌈\mathsf{d}\mathsf{a}\mathsf{t}\mathsf{a}\left(i\right)/\mathsf{s}\mathsf{p}\mathsf{e}\mathsf{e}\mathsf{d}\left({z}_{i}\right)⌉$.

• No preemption: Each machine executes only one task or transmits one data at a time, and no interruption is allowed during its lifecycle. In other words, for each pair of tasks $i$ and $j$ such that ${y}_{i}={y}_{j}$, the two intervals $\left({a}_{i},{d}_{i}\right)$ and $\left({a}_{j},{d}_{j}\right)$ cannot have any overlaps.
• Task dependencies: Each task $i$ has a list of task-dependent tasks, denoted by $\mathsf{P}\mathsf{R}\mathsf{E}{\mathsf{D}}_{\mathsf{t}\mathsf{a}\mathsf{s}\mathsf{k}}\left(i\right)$. For each task $j\in \mathsf{P}\mathsf{R}\mathsf{E}{\mathsf{D}}_{\mathsf{t}\mathsf{a}\mathsf{s}\mathsf{k}}\left(i\right)$, task $i$ cannot start until $j$ finishes its execution. That is, it must satisfy ${a}_{i}\ge {c}_{j}$. Note that if $i$ is scheduled on the same machine with $j$, then it still needs to wait for $j$ to complete storing its data.
• Data dependencies: For each task $j\in \mathsf{P}\mathsf{R}\mathsf{E}{\mathsf{D}}_{\mathsf{d}\mathsf{a}\mathsf{t}\mathsf{a}}\left(i\right)$, task $i$ cannot start until $j$ finishes storing its data. That is, it must satisfy ${a}_{i}\ge {d}_{j}$.
• Affinity of machines: Each task $i$ has a list of affinitive machines. denoted by ${A}_{i}$, that it can be assigned to. Each task must be scheduled on one of its affinitive machines. That is, it must satisfy ${y}_{i}\in {A}_{i}$.
• Disk Capacity: Each disk $k$ has a storage capacity of $\mathsf{c}\mathsf{a}\mathsf{p}\mathsf{a}\mathsf{c}\mathsf{i}\mathsf{t}\mathsf{y}\left(k\right)$. The total size of all data stored on the same disk cannot exceed that disk's capacity. That is, for each disk $k$ it must satisfy $\sum _{i:{z}_{i}=k}\mathsf{d}\mathsf{a}\mathsf{t}\mathsf{a}\left(i\right)\le \mathsf{c}\mathsf{a}\mathsf{p}\mathsf{a}\mathsf{c}\mathsf{i}\mathsf{t}\mathsf{y}\left(k\right)$.

Given a feasible schedule, the makespan of this schedule is the finishing time of the last task. In other words, $\mathsf{m}\mathsf{a}\mathsf{k}\mathsf{e}\mathsf{s}\mathsf{p}\mathsf{a}\mathsf{n}=\underset{1\le i\le \ell }{max}{d}_{i}$

Objectives

Your objective is to find a feasible task schedule that minimizes the makespan.

Input

The first line contains an integer $\ell$ representing the number of tasks. In the following $\ell$ lines, each line represents a task: The first three integers represent task id $i$, task size $\mathsf{s}\mathsf{i}\mathsf{z}\mathsf{e}\left(i\right)$, and the size of its output data $\mathsf{d}\mathsf{a}\mathsf{t}\mathsf{a}\left(i\right)$. All task ids are distinctive integers between $1$ and $\ell$. The fourth integer $k$ is the number of affinitive machines for this task, and the remaining $k$ integers are the IDs of these affinitive machines and they are distinctive as well.

Next, a line contains an integer $n$ representing the number of machines. In the following $n$ lines, each line has two integers representing machine id $j$ and its power $\mathsf{p}\mathsf{o}\mathsf{w}\mathsf{e}\mathsf{r}\left(j\right)$. All machine ids are distinctive integers between $1$ and $n$.

Next again, a line contains an integer $m$ representing the number of disks. In the following $m$ lines, each line has three integers representing disk id $k$, its speed $\mathsf{s}\mathsf{p}\mathsf{e}\mathsf{e}\mathsf{d}\left(k\right)$, and capacity $\mathsf{c}\mathsf{a}\mathsf{p}\mathsf{a}\mathsf{c}\mathsf{i}\mathsf{t}\mathsf{y}\left(k\right)$. All disk ids are distinctive integers between $1$ and $k$.

The remaining part describes the two types of dependencies between tasks.

The first line of this part is an integer $N$ representing the number of data dependencies. In the following $N$ lines, each line contains two integers $i,j$, which means task $j$ is data-dependent on task $i$.

Next, there is a new line with an integer $M$ representing the number of task dependencies. In the following $M$ lines, each line contains two integers $i,j$, which means task $j$ is task-dependent on task $i$.

All numbers on the same line are separated by whitespaces.

Variable constraints

• number of tasks: $6\le l\le 10000$
• number of machines: $1\le n\le 50$
• number of disks: $10\le m\le 30$
• size of task: $10\le \mathsf{s}\mathsf{i}\mathsf{z}\mathsf{e}\left(i\right)\le 600$
• size of outputted data: $1\le \mathsf{d}\mathsf{a}\mathsf{t}\mathsf{a}\left(i\right)\le 20$
• computing power of machines: $1\le \mathsf{p}\mathsf{o}\mathsf{w}\mathsf{e}\mathsf{r}\left(i\right)\le 20$
• capacity of disks: $1\le \mathsf{c}\mathsf{a}\mathsf{p}\mathsf{a}\mathsf{c}\mathsf{i}\mathsf{t}\mathsf{y}\le 1050000$
• rate of disks: $1\le \mathsf{s}\mathsf{p}\mathsf{e}\mathsf{e}\mathsf{d}\le 20$
• all the ids are consequent and starts from one.
• the dependence graph which includes both two types of dependence edge is acyclic.
Output

You should output $\ell$ lines describing the task schedule for all tasks. The $i$th line should contain four integers $i,{x}_{i},{y}_{i},{z}_{i}$, separated by whitespaces, meaning that task $i$ will start at time ${x}_{i}$ on machine ${y}_{i}$ and store its data on disk ${z}_{i}$.

Scoring

For each test, the score is your makespan divided by the lower bound, where the lower bound is computed via

where,

$\mathsf{o}\mathsf{u}\mathsf{t}\mathsf{d}\mathsf{e}\mathsf{g}\mathsf{r}\mathsf{e}\mathsf{e}\left(i\right)$ means the number of edges that go out from task $i$ in DAG (In other words, it is the number of tasks that depends on task $i$).

We guarantee that all the cases have feasible solutions since there is a disk of rate $1$ with a capacity greater or equal to the sum of $\mathsf{s}\mathsf{i}\mathsf{z}\mathsf{e}\left(i\right)$ over all tasks $i$. If you output an invalid solution, you get zero. Here, makespan stands for the latest end-time of all tasks when the earliest start time of all tasks is zero. The final score is the summation of all tests' scores.

Note

At time $0$${J}_{1}$ is scheduled on ${M}_{2}$ and stores its output to ${D}_{2}$, it costs $40/2$ and $6/2$ unit time in executing phase and writing phase, respectively. So, it finishes at time $23$. Note that the time cost in the reading phase is zero since it does not data-dependent on any task.

At time $23$${J}_{2}$ is scheduled on ${M}_{1}$ and reads its data from ${D}_{2}$ and stores its output to ${D}_{1}$. Finally, It costs $6/2+20+6=29$ unit time and it finishes at time $52$.

At the same time, ${J}_{3}$ is scheduled on ${M}_{2}$ and reads its data from ${D}_{2}$. And it costs $6/2+96/2+10/2$ and finishes at $79$. Note though ${J}_{3}$ and ${J}_{1}$ are running on the same machine, ${J}_{3}$ still needs to read the ${J}_{1}$'s outputted data from disk.

At time $52$${J}_{4}$ starts on ${M}_{1}$ and reads data of ${J}_{1}$ and ${J}_{2}$, which costs $6/2+6/1$ unit time. So the final time cost is $\left(3+6\right)+20+6$ and finishes at time $87$.

At time $79$${J}_{5}$ starts on ${M}_{2}$ and costs $10/2+6$ unit time to read data and another $60/2$ unit time to execute, and does not output any data. So, it finishes at time $120$.

At time $87$${J}_{6}$ starts on ${M}_{1}$ and finishes at time $118$.

So, the final makespan is $120$ and it is a feasible solution. Let's verify the feasibility of the above solution.

Dependency constraints: In the solution, ${J}_{2}$,${J}_{3}$ start after ${J}_{1}$ and ${J}_{4}$ starts after ${J}_{1}$,${J}_{2}$, and so on.

The affinity of machines: In this case, the only restriction of affinity is that task 6 can only run in machine 1. And in this solution, it DOES.

No preemption: it is obvious.

The quota of disk: (1) The tasks storing data in disk 1 are ${J}_{2},{J}_{4},{J}_{5}$, their total data size is $6+6+0<30$; (2) The tasks storing data in disk 2 are ${J}_{1},{J}_{3},{J}_{6}$, their total data size is $6+10<17$.