Thursday 1 December 2022

[Solution] Task Scheduling and Data Assignment Codeforces Solution

A. Task Scheduling and Data Assignment
time limit per test
15 seconds
memory limit per test
1024 megabytes
standard input
standard output

With the rapid development of chip hardware computing capabilities, hardware architectures are evolving towards more complex multi-core and heterogeneous architectures. In this problem, your goal is to design an efficient task scheduling algorithm together with a memory allocation policy for the software-hardware co-optimization problem that exists widely in operating systems, distributed systems, and compilers.


There are โ„“ tasks that need to be scheduled on n machines with varying processing powers. There are also m disks with varying transmission speeds and storage capacities for storing the data generated by each task after its execution.

Task schedule. A task schedule is an assignment of tasks to machines and disks. More specifically, for each task i, a task schedule specifies three parameters (xi,yi,zi), meaning that i starts at time xi on machine yi and stores its data to disk zi. The earliest starting time of any task is 0.

Task Running Process Here are more details about the running process of a task i when the task schedule is fixed.

  • The processing power of machine j is power(j). Each task i has a size of size(i), which means it needs size(i)/power(yi) units of time to execute when it is run on machine yi. Here, . means rounding up.
  • Each disk k has a read/write transmission rate of speed(k). When a task i finishes its execution, it will generate a piece of data of size data(i) that needs to be stored on some disk. This storing step will take data(i)/speed(zi) units of time if this data is stored on disk zi. Here, the value of time also needs to be rounded up.
  • Before a task i starts its execution, it needs to read the output data of some other tasks. We let PREDdata(i) denote the set of tasks whose output data are needed by task i. This will take a total time of jPREDdata(i)data(j)/speed(zj) and this must be done before task i starts its execution. Note that when calculating the total time, each term needs to be rounded up individually and then summed.

Summarizing the above process, when task i starts, it goes through three phases: reading data from other tasks, executing the task, and storing data. For convenience, we denote the starting time of the three phases as ai,bi,ci and the finishing time of the last phase as di. These values are determined as follows.

  • ai=xi
  • bi=ai+jPREDdata(i)data(j)/speed(zj).
  • ci=bi+size(i)/power(yi).
  • di=ci+data(i)/speed(zi).

In addition, a feasible task schedule should satisfy the following requirements:

  • No preemption: Each machine executes only one task or transmits one data at a time, and no interruption is allowed during its lifecycle. In other words, for each pair of tasks i and j such that yi=yj, the two intervals (ai,di) and (aj,dj) cannot have any overlaps.
  • Task dependencies: Each task i has a list of task-dependent tasks, denoted by PREDtask(i). For each task jPREDtask(i), task i cannot start until j finishes its execution. That is, it must satisfy aicj. Note that if i is scheduled on the same machine with j, then it still needs to wait for j to complete storing its data.
  • Data dependencies: For each task jPREDdata(i), task i cannot start until j finishes storing its data. That is, it must satisfy aidj.
  • Affinity of machines: Each task i has a list of affinitive machines. denoted by Ai, that it can be assigned to. Each task must be scheduled on one of its affinitive machines. That is, it must satisfy yiAi.
  • Disk Capacity: Each disk k has a storage capacity of capacity(k). The total size of all data stored on the same disk cannot exceed that disk's capacity. That is, for each disk k it must satisfy i:zi=kdata(i)capacity(k).

Given a feasible schedule, the makespan of this schedule is the finishing time of the last task. In other words, makespan=max1iโ„“di


Your objective is to find a feasible task schedule that minimizes the makespan.


The first line contains an integer โ„“ representing the number of tasks. In the following โ„“ lines, each line represents a task: The first three integers represent task id i, task size size(i), and the size of its output data data(i). All task ids are distinctive integers between 1 and โ„“. The fourth integer k is the number of affinitive machines for this task, and the remaining k integers are the IDs of these affinitive machines and they are distinctive as well.

Next, a line contains an integer n representing the number of machines. In the following n lines, each line has two integers representing machine id j and its power power(j). All machine ids are distinctive integers between 1 and n.

Next again, a line contains an integer m representing the number of disks. In the following m lines, each line has three integers representing disk id k, its speed speed(k), and capacity capacity(k). All disk ids are distinctive integers between 1 and k.

The remaining part describes the two types of dependencies between tasks.

The first line of this part is an integer N representing the number of data dependencies. In the following N lines, each line contains two integers i,j, which means task j is data-dependent on task i.

Next, there is a new line with an integer M representing the number of task dependencies. In the following M lines, each line contains two integers i,j, which means task j is task-dependent on task i.

All numbers on the same line are separated by whitespaces.

Variable constraints

  • number of tasks: 6l10000
  • number of machines: 1n50
  • number of disks: 10m30
  • size of task: 10size(i)600
  • size of outputted data: 1data(i)20
  • computing power of machines: 1power(i)20
  • capacity of disks: 1capacity1050000
  • rate of disks: 1speed20
  • all the ids are consequent and starts from one.
  • the dependence graph which includes both two types of dependence edge is acyclic.

You should output โ„“ lines describing the task schedule for all tasks. The ith line should contain four integers i,xi,yi,zi, separated by whitespaces, meaning that task i will start at time xi on machine yi and store its data on disk zi.


For each test, the score is your makespan divided by the lower bound, where the lower bound is computed via


outdegree(i) means the number of edges that go out from task i in DAG (In other words, it is the number of tasks that depends on task i).

We guarantee that all the cases have feasible solutions since there is a disk of rate 1 with a capacity greater or equal to the sum of size(i) over all tasks i. If you output an invalid solution, you get zero. Here, makespan stands for the latest end-time of all tasks when the earliest start time of all tasks is zero. The final score is the summation of all tests' scores.


At time 0J1 is scheduled on M2 and stores its output to D2, it costs 40/2 and 6/2 unit time in executing phase and writing phase, respectively. So, it finishes at time 23. Note that the time cost in the reading phase is zero since it does not data-dependent on any task.

At time 23J2 is scheduled on M1 and reads its data from D2 and stores its output to D1. Finally, It costs 6/2+20+6=29 unit time and it finishes at time 52.

At the same time, J3 is scheduled on M2 and reads its data from D2. And it costs 6/2+96/2+10/2 and finishes at 79. Note though J3 and J1 are running on the same machine, J3 still needs to read the J1's outputted data from disk.

At time 52J4 starts on M1 and reads data of J1 and J2, which costs 6/2+6/1 unit time. So the final time cost is (3+6)+20+6 and finishes at time 87.

At time 79J5 starts on M2 and costs 10/2+6 unit time to read data and another 60/2 unit time to execute, and does not output any data. So, it finishes at time 120.

At time 87J6 starts on M1 and finishes at time 118.

So, the final makespan is 120 and it is a feasible solution. Let's verify the feasibility of the above solution.

Dependency constraints: In the solution, J2,J3 start after J1 and J4 starts after J1,J2, and so on.

The affinity of machines: In this case, the only restriction of affinity is that task 6 can only run in machine 1. And in this solution, it DOES.

No preemption: it is obvious.

The quota of disk: (1) The tasks storing data in disk 1 are J2,J4,J5, their total data size is 6+6+0<30; (2) The tasks storing data in disk 2 are J1,J3,J6, their total data size is 6+10<17.

No comments:

Post a Comment